Direct Visual Odometry

There are different ways to develop a visual odometry pipeline. One way is to construct a sparse VO, which is mainly composed of the following ingredients:

  1. Compute keypoint descriptors in your camera images
  2. Match these keypoints across your cameras
  3. Optimize for the camera poses (elements of $SE(3)$) and the scene structure (points in $\mathbb{R}^3$). For simplicity, in this tutorial we are only going to optimize for the camera poses in $SE(3)$.

Normally, step #3 is accomplished by minimizing the reprojection error of your matched keypoints.

\[\min_{T_j \in SE(3)} \sum_{j} \|x_j(T_j, K) - x_0\|_{2}^2\]

Where $T_j$ is the camera pose (extrinsics), $K$ the camera intrinsics, $x_j(T_j, K)$ the reprojection of a matched keypoint into camera $j$, and $x_0$ its observed location in the base image. In contrast, photometric bundle adjustment minimizes the photometric error, which is a pixel-wise metric of intensity.

\[\min_{T_j \in SE(3)} \sum_{j} \|I_j(T_j, K, x) - I_0(x)\|_{F}^2\]

Where $I$ is the image intensity function, $x$ are the pixel locations where we want to compute the photometric error, and $\|\cdot\|_F$ is the Frobenius norm.

The term inside the norm is called the residual. To compute it, we first need to back-project a set of keypoints from the base camera into $\mathbb{R}^3$, and for this we assume we already know the depth value $z$ of every point. We can do this easily in NumPy or PyTorch. Here is some pseudocode.

# Sample N_kp random keypoints in the base image and read off their depths
kp_base = sample_random_keypoints(N_kp)
depths = sample_image(depth_image, kp_base)

# Back-project the keypoints into 3D using the known depths
points3d_base = get_points3d(intrinsics, kp_base, depths)
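
For concreteness, here is what `get_points3d` might look like. This is a minimal sketch (the helper is not defined in the post), assuming `intrinsics` is a $3 \times 3$ pinhole matrix, `kp` is an $(N, 2)$ float tensor of pixel coordinates, and `depths` is an $(N,)$ tensor:

import torch

def get_points3d(intrinsics, kp, depths):
  # Back-project pixels (u, v) with known depth z: X = z * K^-1 [u, v, 1]^T
  ones = torch.ones(kp.shape[0], 1)
  pixels_h = torch.cat([kp, ones], dim=1)           # (N, 3) homogeneous pixels
  rays = pixels_h @ torch.linalg.inv(intrinsics).T  # (N, 3) rays through the pixels
  return rays * depths.view(-1, 1)                  # scale each ray by its depth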

After this we need an initial guess of the camera transformation matrices $T_j$ to project the points into the other cameras. Here we initialize all of them to the identity matrix.

base_to_j_se3_init = torch.eye(4)

# Project points into the jth camera
kp_j, points3d_j = project_points(points3d_base, intrinsics, base_to_j_se3_init)
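
Here is one possible sketch of `project_points` (again, a helper not defined in the post), assuming a standard pinhole projection and a $4 \times 4$ homogeneous transform:

import torch

def project_points(points3d, intrinsics, base_to_j_se3):
  # Move the points from the base frame into the jth camera frame
  ones = torch.ones(points3d.shape[0], 1)
  points_h = torch.cat([points3d, ones], dim=1)     # (N, 4) homogeneous points
  points3d_j = (points_h @ base_to_j_se3.T)[:, :3]  # (N, 3) in the jth frame
  # Pinhole projection: x = K X, followed by perspective division
  projected = points3d_j @ intrinsics.T             # (N, 3)
  kp_j = projected[:, :2] / projected[:, 2:3]       # (N, 2) pixel coordinates
  return kp_j, points3d_j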

Once the keypoints are computed we can compute the photometric error, but first we need to discard the points that can hurt our optimization problem: the ones that fall outside of the image and the points with negative depth. To get the photometric error we sample our intensity maps at the keypoints' $(x, y)$ positions.

# Sample the intensity maps at the keypoint locations
F_base = sample_image(intensity_base, kp_base)
F_j = sample_image(intensity_j, kp_j)

residuals = F_j - F_base

# Zero out the residuals of invalid points
invalid_mask = ~get_valid_point_mask(points3d_j, kp_j, intensity_base.shape)
residuals[invalid_mask] = 0.0
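
A sketch of what `get_valid_point_mask` could check, under the assumptions above (positive depth in the jth camera frame and keypoints inside the image bounds):

def get_valid_point_mask(points3d_j, kp_j, image_shape):
  # Valid points lie in front of the camera and project inside the image
  h, w = image_shape
  positive_depth = points3d_j[:, 2] > 0
  inside_image = (
    (kp_j[:, 0] >= 0) & (kp_j[:, 0] <= w - 1)
    & (kp_j[:, 1] >= 0) & (kp_j[:, 1] <= h - 1)
  )
  return positive_depth & inside_image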

Solving the optimization problem

The photometric error objective is a non-linear least-squares problem that is commonly solved with the Levenberg-Marquardt algorithm, which is an extension of the Gauss-Newton method. We can reformulate the photometric objective above as the following, more general objective function.

\[\min_{x} \frac{1}{2} \|E(x)\|^2\]

where $x \in \mathbb{R}^n$ and $E = [E_1, \dots, E_m]^T$ is a nonlinear function with $m$ outputs. If we linearize $E(x)$ at a fixed point $x$ using a first-order Taylor approximation, we arrive at the following expression \(E(x + \Delta x) \approx E(x) + J(x)\Delta x\), where $J(x) \in \mathbb{R}^{m \times n}$ is the Jacobian of $E$ at $x$.

Using this approximation instead of the original objective yields a linear least-squares problem in $\Delta x$. A regularization term is also added to control the size of the step $\Delta x$.

\[\min_{\Delta x} \frac{1}{2} \|E(x) + J(x)\Delta x\|^2 + \frac{\lambda}{2} \| D(x) \Delta x \|^2\]

Where $D(x)$ is the square root of the diagonal of the (approximate) Hessian matrix $H(x) = J(x)^T J(x)$. This optimization problem can be solved using the Schur complement trick, but it also has a simple closed-form solution, which involves solving a system of linear equations.

\[\left(H(x) + \lambda D(x)^T D(x)\right) \Delta x = - J(x)^T E(x)\]
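
To see where this system comes from, set the gradient of the damped objective with respect to $\Delta x$ to zero:

\[J(x)^T \left(E(x) + J(x)\Delta x\right) + \lambda D(x)^T D(x) \Delta x = 0\]

Rearranging and substituting $H(x) = J(x)^T J(x)$ gives the linear system above.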

In PyTorch, a single Levenberg-Marquardt step then looks like this:

import torch

def lm_step(x, jacobian, residuals, lamb):
  # Gauss-Newton approximation of the Hessian: H = J^T J
  hessian = jacobian.T @ jacobian
  gradient = jacobian.T @ residuals

  # Damping term: lambda * diag(H), with a small constant for stability
  reg = lamb * (hessian.diagonal() + 1e-4).diag_embed()
  delta_x = torch.linalg.solve(hessian + reg, -gradient)
  return x + delta_x
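
To use `lm_step` in practice, $\lambda$ is typically adapted between iterations: decreased when a step lowers the cost and increased otherwise. A minimal sketch of this loop, assuming a hypothetical `compute_residuals_and_jacobian` function that evaluates the photometric residuals and their Jacobian at the current pose parameters:

x = initial_pose_parameters  # hypothetical initial guess
lamb = 1e-3
for _ in range(max_iters):
  residuals, jacobian = compute_residuals_and_jacobian(x)
  x_new = lm_step(x, jacobian, residuals, lamb)
  new_residuals, _ = compute_residuals_and_jacobian(x_new)
  if new_residuals.square().sum() < residuals.square().sum():
    x, lamb = x_new, lamb * 0.5  # accept the step, reduce damping
  else:
    lamb = lamb * 10.0           # reject the step, increase damping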

Computing the Jacobians