Visual Odometry


Projective ICP Visual Odometry


ICP gif

Description


This project implements a projective ICP based visual odometry that estimates both the trajectory of a robot and the 3D map of the visual features. The robot is equipped with a monocular camera with known intrinsic and extrinsic parameters.

Data

Set of 120 measurements, where each measurement contains a variable number of (image_point, appearance) pairs: image_point is the 2D position of the detected feature on the image plane, and appearance is the descriptor used to match points across measurements.

Algorithm

Initialization step

The initial pose (the one associated with measurement 0) is set to the identity. Then the pose associated with measurement 1 is estimated with the following steps:

  1. Match the image points of the measurement 0 with the image points of the measurement 1 using the appearance;
  2. Estimate the essential matrix relative to the first two camera poses with the set of matched image points;
  3. Recover the relative pose (rotation and translation, up to scale) between the two cameras using the essential matrix and the set of matched image points.

These steps provide a first estimate of the pose of the camera.

Then a first estimate of the map is computed by triangulating the image points of the first two measurements, using the estimated pose.
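As an illustration only (not the code of this repository), a minimal sketch of this initialization using OpenCV and NumPy could look as follows, assuming `p0` and `p1` are the Nx2 float arrays of matched image points from measurements 0 and 1 and `K` is the camera matrix:

```python
import numpy as np
import cv2

def initialize(p0, p1, K):
    """Estimate the pose of camera 1 and triangulate an initial map
    from two sets of matched image points (Nx2 float arrays)."""
    # Essential matrix from the matched points (RANSAC for robustness)
    E, mask = cv2.findEssentialMat(p0, p1, K, cv2.RANSAC, 0.999, 1.0)
    # Relative rotation and translation (translation is only up to scale)
    _, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=mask)

    # Projection matrices: camera 0 is taken as the world frame
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K @ np.hstack([R, t])

    # Triangulate the matches to obtain a first estimate of the map
    X_hom = cv2.triangulatePoints(P0, P1, p0.T, p1.T)   # 4xN homogeneous
    X = (X_hom[:3] / X_hom[3]).T                        # Nx3 world points

    # Pose of camera 1 in the world frame: invert the world->camera transform
    T_w_c1 = np.eye(4)
    T_w_c1[:3, :3] = R.T
    T_w_c1[:3, 3] = (-R.T @ t).ravel()
    return T_w_c1, X
```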

Update step

The update step takes one measurement at a time and performs the projective ICP algorithm between the current measurement and the current estimated 3D map, to recover the relative pose of the camera with these steps:

  1. Match the image points of the current measurement with the 3D points of the map using the appearance;
  2. Perform one step of the projective ICP algorithm;
  3. Repeat until the maximum number of iterations is reached or the stopping criterion is met.

Using the new camera pose estimated by the projective ICP, a new set of 3D points is triangulated and added to the estimated map.

The update step is repeated for each measurement.
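A possible structure for this loop is sketched below; the helpers `match_by_appearance`, `projective_icp`, and `triangulate_new_points` are hypothetical names standing in for the steps above, not functions of this repository:

```python
def run_visual_odometry(measurements, K, T_w_c, world_points, appearances):
    """Process the remaining measurements one at a time (steps 1-3 above)."""
    trajectory = [T_w_c]
    for meas in measurements[2:]:
        # 1. Data association: match the 2D points of the measurement with
        #    the 3D points of the map through their appearance descriptors.
        matches = match_by_appearance(meas, world_points, appearances)

        # 2. Projective ICP: refine the camera pose against the current map.
        T_w_c = projective_icp(matches, K, T_w_c)

        # 3. Triangulate new points with the refined pose and grow the map.
        world_points, appearances = triangulate_new_points(
            meas, K, T_w_c, world_points, appearances)
        trajectory.append(T_w_c)
    return trajectory, world_points
```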

Projective ICP

A single step of the projective ICP is divided into two parts:

  1. Linearize the problem;
  2. Solve the linearized problem with a least-squares approach.

The linearization part takes as input the reference image points (from the measurement) and the current world points from the estimated map, already matched, and the current pose of the camera with respect to the world \( {}^wT_{c_0}\). Then, it calculates the matrix \(H\) and the vector \(b\) by computing for each pair of matched points \( ({}^rp, {}^wp) \) (reference image points, estimated world point) the error \(e\) and the Jacobian \(J\) in this way:

\[ \begin{align} \text{World point in camera coordinates (hom): } &{}^{c_0}p_{hom} = inv({}^wT_{c_0}) {}^wp_{hom}\\ \text{World point in camera coordinates: } &{}^{c_0}p_{norm} = {}^{c_0}p_{hom}[:3]/{}^{c_0}p_{hom}[3]\\ \text{World point in homogeneous pixel coordinates: } &{}^{c_0}p_{cam}=K {}^{c_0}p_{norm}\\ \text{Predicted image point: } &{}^{c_0}p_{img}={}^{c_0}p_{cam}[0:2]/{}^{c_0}p_{cam}[2] \end{align} \]

\[ \begin{align} e = {}^{c_0}p_{img}-{}^rp \end{align} \]

\[ \begin{align} J &= J_{proj}({}^{c_0}p_{cam}) K J_{icp}\\ J_{icp} &= [I_{3\times3} | ⌊-{}^{c_0}p_{norm}⌋_\times] \\ J_{proj}({}^{c_0}p_{cam}) &= \begin{bmatrix}\frac{1}{z} & 0 & -\frac{x}{z^2} \\ 0 & \frac{1}{z} & -\frac{y}{z^2} \end{bmatrix} \end{align} \]

where \((x, y, z)\) are the components of \({}^{c_0}p_{cam}\).
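A minimal NumPy sketch of these two quantities, following the formulas above (the perturbation is ordered as [translation, rotation]; `skew` is a small helper introduced here, not a function of this repository):

```python
import numpy as np

def skew(v):
    """Skew-symmetric (cross-product) matrix of a 3-vector."""
    return np.array([[    0, -v[2],  v[1]],
                     [ v[2],     0, -v[0]],
                     [-v[1],  v[0],     0]])

def error_and_jacobian(p_ref, p_world, T_w_c, K):
    """Reprojection error (2,) and Jacobian (2x6) of one matched pair."""
    # World point expressed in the camera frame
    T_c_w = np.linalg.inv(T_w_c)
    p_norm = T_c_w[:3, :3] @ p_world + T_c_w[:3, 3]
    # Homogeneous pixel coordinates and predicted image point
    p_cam = K @ p_norm
    x, y, z = p_cam
    p_img = p_cam[:2] / z

    e = p_img - p_ref                        # prediction minus measurement

    # Chain rule: J = J_proj(p_cam) * K * J_icp
    J_icp = np.hstack([np.eye(3), -skew(p_norm)])    # 3x6
    J_proj = np.array([[1/z,   0, -x/z**2],
                       [  0, 1/z, -y/z**2]])         # 2x3
    J = J_proj @ K @ J_icp                           # 2x6
    return e, J
```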

Then the error is used to compute \(\chi = e^T e\), which is compared against the kernel threshold to decide whether the point is an inlier or an outlier.

If the point is an inlier, the error and the Jacobian are used to update \(H\) and \(b\) as:

\[ \begin{align} H &= H + J^T J \\ b &= b + J^T e \end{align} \]

Finally, a 6D vector describing the relative pose of the camera with respect to the previous pose is calculated by solving:

\[ \begin{align} dx &\leftarrow solve_{lstsq}(H dx = -b) \\ {}^{c_0}T_{c_1} &= v2T(dx) \\ {}^wT_{c_1} &= {}^wT_{c_0} {}^{c_0}T_{c_1} \end{align} \]
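Putting the pieces together, one projective ICP iteration could be sketched as below; it reuses `error_and_jacobian` from above, `v2T` assumes an Euler-angle parameterization of the 6D increment, and outliers are simply skipped, which is only one possible inlier policy:

```python
def v2T(dx):
    """6-vector [tx, ty, tz, rx, ry, rz] -> 4x4 homogeneous transform."""
    tx, ty, tz, rx, ry, rz = dx
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rx @ Ry @ Rz
    T[:3, 3] = [tx, ty, tz]
    return T

def picp_step(matches, K, T_w_c, kernel_threshold):
    """One projective ICP iteration: accumulate H, b and update the pose."""
    H, b, chi_tot = np.zeros((6, 6)), np.zeros(6), 0.0
    for p_ref, p_world in matches:
        e, J = error_and_jacobian(p_ref, p_world, T_w_c, K)
        chi = e @ e
        if chi > kernel_threshold:          # treat the point as an outlier
            continue
        chi_tot += chi
        H += J.T @ J
        b += J.T @ e
    dx = np.linalg.lstsq(H, -b, rcond=None)[0]   # solve H dx = -b
    return T_w_c @ v2T(dx), chi_tot              # ^wT_{c1} = ^wT_{c0} ^{c0}T_{c1}
```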

Results


The results below have been obtained using the following parameters:

Moreover, techniques to dynamically adapt the kernel threshold and the damping factor have been implemented:

The maximum number of iterations has been set to 250, but the algorithm stops when the error becomes less than \(10^{-2}\) or when it is stuck for 100 iterations. Then, regardless of the last result obtained by the algorithm, the one with the lowest error is selected to compute the new transformation.
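For illustration, the outer loop with this stopping criterion and best-result selection could look like the sketch below (the dynamic adaptation of the kernel threshold and of the damping factor is omitted; `picp_step` is the helper sketched above, not the repository's actual function):

```python
def projective_icp(matches, K, T_w_c, kernel_threshold,
                   max_iters=250, eps=1e-2, patience=100):
    """Iterate picp_step; return the pose with the lowest error."""
    best_chi, best_T, stuck = np.inf, T_w_c, 0
    for _ in range(max_iters):
        T_w_c, chi = picp_step(matches, K, T_w_c, kernel_threshold)
        if chi < best_chi:                   # improvement: remember this pose
            best_chi, best_T, stuck = chi, T_w_c, 0
        else:
            stuck += 1
        if chi < eps or stuck >= patience:   # stopping criterion
            break
    return best_T                            # lowest-error pose, not the last one
```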

Final results

Visual results

Numerical results

| Rotation Errors | Translation Errors | Translation Error Ratio |
| --- | --- | --- |
| Max rotation error: 0.313 [rad] | Max translation error: 0.343 [m] | Max translation error ratio: 5.987 |
| Min rotation error: 0.000 [rad] | Min translation error: 0.002 [m] | Min translation error ratio: 4.962 |
| Mean rotation error: 0.074 [rad] | Mean translation error: 0.135 [m] | Mean translation error ratio: 4.962 |
| Map Results | |
| --- | --- |
| Number of points of the estimated map | 395 |
| RMSE of the estimated map | 0.194 [m] |
| Scale | 0.202 |

Errors

\[ \begin{align} T_{rel}^{est} &= {}^{r_i}T_{r_{i+1}}^{est} = inv({}^wT_{r_i}^{est}) {}^wT_{r_{i+1}}^{est} \\ T_{rel}^{gt} &= {}^{r_i}T_{r_{i+1}}^{gt} = inv({}^wT_{r_i}^{gt}) {}^wT_{r_{i+1}}^{gt} \\ T_{error} &= {}^{r_{i+1}^{est}}T_{r_{i+1}^{gt}}^{error} = inv(T_{rel}^{est}) T_{rel}^{gt} \\ R_{error} &= T_{error}[0:3,0:3] \\ t^{est}_{r_i} &= {}^wT_{r_i}^{est}[0:3,3] \\ t^{gt}_{r_i} &= {}^wT_{r_i}^{gt}[0:3,3] \\ \end{align} \]

\[ \begin{align} rot\_error \space [rad] &= arccos((trace(R_{error})-1)/2) \\ translation\_ratio &= norm(t^{gt}_{r_i})/norm(t^{est}_{r_i}) \\ translation\_error \space [m] &= norm(translation\_ratio \cdot t^{est}_{r_i}-t^{gt}_{r_i}) \\ \end{align} \]
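A sketch of how these metrics can be evaluated, assuming the estimated and ground-truth poses are given as lists of 4x4 NumPy arrays:

```python
import numpy as np

def trajectory_errors(T_w_est, T_w_gt):
    """Per-frame rotation/translation errors between estimate and ground truth."""
    rot_err, trans_err, ratio = [], [], []
    for i in range(len(T_w_est) - 1):
        # Relative motion between consecutive poses, estimated vs ground truth
        rel_est = np.linalg.inv(T_w_est[i]) @ T_w_est[i + 1]
        rel_gt  = np.linalg.inv(T_w_gt[i])  @ T_w_gt[i + 1]
        R_err = (np.linalg.inv(rel_est) @ rel_gt)[:3, :3]
        cos_a = np.clip((np.trace(R_err) - 1) / 2, -1.0, 1.0)
        rot_err.append(np.arccos(cos_a))

        # Translation error after compensating the unknown monocular scale
        t_est, t_gt = T_w_est[i][:3, 3], T_w_gt[i][:3, 3]
        r = np.linalg.norm(t_gt) / max(np.linalg.norm(t_est), 1e-12)
        ratio.append(r)
        trans_err.append(np.linalg.norm(r * t_est - t_gt))
    return rot_err, trans_err, ratio
```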

Frames results