Visual Odometry


Projective ICP Visual Odometry


ICP gif

Description


This project implements a projective ICP based visual odometry that estimates both the trajectory of a robot and the 3D map of the visual features. The robot is equipped with a monocular camera with known intrinsic and extrinsic parameters.

Data

Set of 120 measurements, where each measurement contains a variable number of (image_point, appearance) pairs: image_point is the 2D position of the detected feature on the image plane, and appearance is the descriptor used to match points across measurements.

Algorithm

Initialization step

The initial pose (the one associated with measurement 0) is set to the identity. Then the pose associated with measurement 1 is estimated with the following steps:

  1. Match the image points of the measurement 0 with the image points of the measurement 1 using the appearance;
  2. Estimate the essential matrix relative to the first two camera poses with the set of matched image points;
  3. Recover the relative pose (rotation and translation, up to scale) between the two cameras using the essential matrix and the set of matched image points.

These steps provide a first estimate of the pose of the camera.

Then a first estimate of the map is computed by triangulating the image points of the first two measurements, using the estimated pose.
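As an illustration only (not the code of this repository), a minimal sketch of this initialization using OpenCV and NumPy could look as follows, assuming `p0` and `p1` are the Nx2 float arrays of matched image points from measurements 0 and 1 and `K` is the camera matrix:

```python
import numpy as np
import cv2

def initialize(p0, p1, K):
    """Estimate the pose of camera 1 and triangulate an initial map
    from two sets of matched image points (Nx2 float arrays)."""
    # Essential matrix from the matched points (RANSAC for robustness)
    E, mask = cv2.findEssentialMat(p0, p1, K, cv2.RANSAC, 0.999, 1.0)
    # Relative rotation and translation (translation is only up to scale)
    _, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=mask)

    # Projection matrices: camera 0 is taken as the world frame
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K @ np.hstack([R, t])

    # Triangulate the matches to obtain a first estimate of the map
    X_hom = cv2.triangulatePoints(P0, P1, p0.T, p1.T)   # 4xN homogeneous
    X = (X_hom[:3] / X_hom[3]).T                        # Nx3 world points

    # Pose of camera 1 in the world frame: invert the world->camera transform
    T_w_c1 = np.eye(4)
    T_w_c1[:3, :3] = R.T
    T_w_c1[:3, 3] = (-R.T @ t).ravel()
    return T_w_c1, X
```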

Update step

The update step takes one measurement at a time and performs the projective ICP algorithm between the current measurement and the current estimated 3D map, to recover the relative pose of the camera with these steps:

  1. Match the image points of the current measurement with the 3D points of the map using the appearance;
  2. Perform one step of the projective ICP algorithm;
  3. Repeat until the maximum number of iterations is reached or the stopping criterion is met.

Using the new camera pose estimated by the projective ICP, a new set of 3D points is triangulated and added to the estimated map.

The update step is repeated for each measurement.
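A possible structure for this loop is sketched below; the helpers `match_by_appearance`, `projective_icp`, and `triangulate_new_points` are hypothetical names standing in for the steps above, not functions of this repository:

```python
def run_visual_odometry(measurements, K, T_w_c, world_points, appearances):
    """Process the remaining measurements one at a time (steps 1-3 above)."""
    trajectory = [T_w_c]
    for meas in measurements[2:]:
        # 1. Data association: match the 2D points of the measurement with
        #    the 3D points of the map through their appearance descriptors.
        matches = match_by_appearance(meas, world_points, appearances)

        # 2. Projective ICP: refine the camera pose against the current map.
        T_w_c = projective_icp(matches, K, T_w_c)

        # 3. Triangulate new points with the refined pose and grow the map.
        world_points, appearances = triangulate_new_points(
            meas, K, T_w_c, world_points, appearances)
        trajectory.append(T_w_c)
    return trajectory, world_points
```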

Projective ICP

A single step of the projective ICP is divided into two parts:

  1. Linearize the problem;
  2. Solve the linearized problem with a least-squares approach.

The linearization part takes as input the reference image points (from the measurement) and the current world points from the estimated map, already matched, and the current pose of the camera with respect to the world \( {}^wT_{c_0}\). Then, it calculates the matrix \(H\) and the vector \(b\) by computing for each pair of matched points \( ({}^rp, {}^wp) \) (reference image points, estimated world point) the error \(e\) and the Jacobian \(J\) in this way:

\[ \begin{align} \text{World point in camera coordinates (hom): } &{}^{c_0}p_{hom} = inv({}^wT_{c_0}) {}^wp_{hom}\\ \text{World point in camera coordinates: } &{}^{c_0}p_{norm} = {}^{c_0}p_{hom}[:3]/{}^{c_0}p_{hom}[3]\\ \text{World point in homogeneous pixel coordinates: } &{}^{c_0}p_{cam}=K {}^{c_0}p_{norm}\\ \text{Predicted image point: } &{}^{c_0}p_{img}={}^{c_0}p_{cam}[0:2]/{}^{c_0}p_{cam}[2] \end{align} \]

\[ \begin{align} e = {}^{c_0}p_{img}-{}^rp \end{align} \]

\[ \begin{align} J &= J_{proj}({}^{c_0}p_{cam}) K J_{icp}\\ J_{icp} &= [I_{3\times3} | ⌊-{}^{c_0}p_{norm}⌋_\times] \\ J_{proj}({}^{c_0}p_{cam}) &= \begin{bmatrix}\frac{1}{z} & 0 & -\frac{x}{z^2} \\ 0 & \frac{1}{z} & -\frac{y}{z^2} \end{bmatrix} \end{align} \]

where \((x, y, z)\) are the components of \({}^{c_0}p_{cam}\).
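A minimal NumPy sketch of these two quantities, following the formulas above (the perturbation is ordered as [translation, rotation]; `skew` is a small helper introduced here, not a function of this repository):

```python
import numpy as np

def skew(v):
    """Skew-symmetric (cross-product) matrix of a 3-vector."""
    return np.array([[    0, -v[2],  v[1]],
                     [ v[2],     0, -v[0]],
                     [-v[1],  v[0],     0]])

def error_and_jacobian(p_ref, p_world, T_w_c, K):
    """Reprojection error (2,) and Jacobian (2x6) of one matched pair."""
    # World point expressed in the camera frame
    T_c_w = np.linalg.inv(T_w_c)
    p_norm = T_c_w[:3, :3] @ p_world + T_c_w[:3, 3]
    # Homogeneous pixel coordinates and predicted image point
    p_cam = K @ p_norm
    x, y, z = p_cam
    p_img = p_cam[:2] / z

    e = p_img - p_ref                        # prediction minus measurement

    # Chain rule: J = J_proj(p_cam) * K * J_icp
    J_icp = np.hstack([np.eye(3), -skew(p_norm)])    # 3x6
    J_proj = np.array([[1/z,   0, -x/z**2],
                       [  0, 1/z, -y/z**2]])         # 2x3
    J = J_proj @ K @ J_icp                           # 2x6
    return e, J
```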

Then the error is used to compute \(\chi = e^T e\), which is compared against the kernel threshold to decide whether the point is an inlier or an outlier.

If the point is an inlier, the error and the Jacobian are used to update \(H\) and \(b\) as:

\[ \begin{align} H &= H + J^T J \\ b &= b + J^T e \end{align} \]

Finally, a 6D vector describing the relative pose of the camera with respect to the previous pose is calculated by solving:

\[ \begin{align} dx &\leftarrow solve_{lstsq}(H dx = -b) \\ {}^{c_0}T_{c_1} &= v2T(dx) \\ {}^wT_{c_1} &= {}^wT_{c_0} {}^{c_0}T_{c_1} \end{align} \]
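Putting the pieces together, one projective ICP iteration could be sketched as below; it reuses `error_and_jacobian` from above, `v2T` assumes an Euler-angle parameterization of the 6D increment, and outliers are simply skipped, which is only one possible inlier policy:

```python
def v2T(dx):
    """6-vector [tx, ty, tz, rx, ry, rz] -> 4x4 homogeneous transform."""
    tx, ty, tz, rx, ry, rz = dx
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rx @ Ry @ Rz
    T[:3, 3] = [tx, ty, tz]
    return T

def picp_step(matches, K, T_w_c, kernel_threshold):
    """One projective ICP iteration: accumulate H, b and update the pose."""
    H, b, chi_tot = np.zeros((6, 6)), np.zeros(6), 0.0
    for p_ref, p_world in matches:
        e, J = error_and_jacobian(p_ref, p_world, T_w_c, K)
        chi = e @ e
        if chi > kernel_threshold:          # treat the point as an outlier
            continue
        chi_tot += chi
        H += J.T @ J
        b += J.T @ e
    dx = np.linalg.lstsq(H, -b, rcond=None)[0]   # solve H dx = -b
    return T_w_c @ v2T(dx), chi_tot              # ^wT_{c1} = ^wT_{c0} ^{c0}T_{c1}
```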

Results


The results below have been obtained using the following parameters:

Moreover, techniques to dynamically adapt the kernel threshold and the damping factor have been implemented:

The maximum number of iterations has been set to 250, but the algorithm stops when the error becomes less than \(10^{-2}\) or when it is stuck for 100 iterations. Then, regardless of the last result obtained by the algorithm, the one with the lowest error is selected to compute the new transformation.
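For illustration, the outer loop with this stopping criterion and best-result selection could look like the sketch below (the dynamic adaptation of the kernel threshold and of the damping factor is omitted; `picp_step` is the helper sketched above, not the repository's actual function):

```python
def projective_icp(matches, K, T_w_c, kernel_threshold,
                   max_iters=250, eps=1e-2, patience=100):
    """Iterate picp_step; return the pose with the lowest error."""
    best_chi, best_T, stuck = np.inf, T_w_c, 0
    for _ in range(max_iters):
        T_w_c, chi = picp_step(matches, K, T_w_c, kernel_threshold)
        if chi < best_chi:                   # improvement: remember this pose
            best_chi, best_T, stuck = chi, T_w_c, 0
        else:
            stuck += 1
        if chi < eps or stuck >= patience:   # stopping criterion
            break
    return best_T                            # lowest-error pose, not the last one
```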

Final results

Visual results

Numerical results

| Rotation Errors | Translation Errors | Translation Error Ratio |
| --- | --- | --- |
| Max rotation error: 0.313 [rad] | Max translation error: 0.343 [m] | Max translation error ratio: 5.987 |
| Min rotation error: 0.000 [rad] | Min translation error: 0.002 [m] | Min translation error ratio: 4.962 |
| Mean rotation error: 0.074 [rad] | Mean translation error: 0.135 [m] | Mean translation error ratio: 4.962 |
| Map Results | |
| --- | --- |
| Number of points of the estimated map | 395 |
| RMSE of the estimated map | 0.194 [m] |
| Scale | 0.202 |

Errors

\[ \begin{align} T_{rel}^{est} &= {}^{r_i}T_{r_{i+1}}^{est} = inv({}^wT_{r_i}^{est}) {}^wT_{r_{i+1}}^{est} \\ T_{rel}^{gt} &= {}^{r_i}T_{r_{i+1}}^{gt} = inv({}^wT_{r_i}^{gt}) {}^wT_{r_{i+1}}^{gt} \\ T_{error} &= {}^{r_{i+1}^{est}}T_{r_{i+1}^{gt}}^{error} = inv(T_{rel}^{est}) T_{rel}^{gt} \\ R_{error} &= T_{error}[0:3,0:3] \\ t^{est}_{r_i} &= {}^wT_{r_i}^{est}[0:3,3] \\ t^{gt}_{r_i} &= {}^wT_{r_i}^{gt}[0:3,3] \\ \end{align} \]

\[ \begin{align} rot\_error \space [rad] &= arccos((trace(R_{error})-1)/2) \\ translation\_ratio &= norm(t^{gt}_{r_i})/norm(t^{est}_{r_i}) \\ translation\_error \space [m] &= norm(translation\_ratio \cdot t^{est}_{r_i}-t^{gt}_{r_i}) \\ \end{align} \]
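A sketch of how these metrics can be evaluated, assuming the estimated and ground-truth poses are given as lists of 4x4 NumPy arrays:

```python
import numpy as np

def trajectory_errors(T_w_est, T_w_gt):
    """Per-frame rotation/translation errors between estimate and ground truth."""
    rot_err, trans_err, ratio = [], [], []
    for i in range(len(T_w_est) - 1):
        # Relative motion between consecutive poses, estimated vs ground truth
        rel_est = np.linalg.inv(T_w_est[i]) @ T_w_est[i + 1]
        rel_gt  = np.linalg.inv(T_w_gt[i])  @ T_w_gt[i + 1]
        R_err = (np.linalg.inv(rel_est) @ rel_gt)[:3, :3]
        cos_a = np.clip((np.trace(R_err) - 1) / 2, -1.0, 1.0)
        rot_err.append(np.arccos(cos_a))

        # Translation error after compensating the unknown monocular scale
        t_est, t_gt = T_w_est[i][:3, 3], T_w_gt[i][:3, 3]
        r = np.linalg.norm(t_gt) / max(np.linalg.norm(t_est), 1e-12)
        ratio.append(r)
        trans_err.append(np.linalg.norm(r * t_est - t_gt))
    return rot_err, trans_err, ratio
```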

Frames results