Direct Visual Odometry

There are different ways to develop a visual odometry (VO) pipeline. One way is to construct a sparse VO, which is mainly composed of the following ingredients:

  1. Compute keypoints and descriptors in your images
  2. Match these keypoints across your cameras
  3. Optimize for the camera poses (transforms in $SE(3)$) and your scene structure (points in $\mathbb{R}^3$). For simplicity, in this tutorial we are only going to optimize for the camera poses in $SE(3)$.

Normally, step #3 is accomplished by minimizing the reprojection error of your matched keypoints.

$$\min_{T_j \in SE(3)} \left\lVert x_j(T_j, K) - x_0 \right\rVert_2^2$$

Where $T_j$ is the camera pose (extrinsics) and $K$ the camera intrinsics. In contrast, photometric bundle adjustment minimizes the photometric error, which is a pixel-wise metric of intensity.

$$\min_{T_j \in SE(3)} \left\lVert I_j(T_j, K, x) - I_0(x) \right\rVert_F^2$$

Where $I$ is the image intensity function and $x$ are the pixel locations where we want to compute this photometric error. $\lVert \cdot \rVert_F$ is the Frobenius norm.

The term inside the norm is called the residual. To compute it, we first need to back-project a set of keypoints from the base camera into $\mathbb{R}^3$, and for this we assume we already know the depth value $z$ of every point. We can do this easily in numpy or pytorch. Here is some pseudocode.

kp_base = sample_random_keypoints(N_kp)
depths = sample_image(depth_image, kp_base)

points3d_base = get_points3d(intrinsics, kp_base, depths)
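`get_points3d` above is pseudocode; a minimal sketch of the back-projection with a pinhole camera model could look like this (the tensor shapes are my assumptions):

```python
import torch

def get_points3d(intrinsics, kp, depths):
    # Back-project pixel keypoints (N, 2) with depths (N,) into camera
    # coordinates with a pinhole model: X = z * K^-1 * [u, v, 1]^T
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    x = (kp[:, 0] - cx) / fx * depths
    y = (kp[:, 1] - cy) / fy * depths
    return torch.stack([x, y, depths], dim=-1)  # (N, 3)
```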

After this, we need a guess of the camera transformation matrices $T_j$ to project the points into the other cameras. For this we initialize all of them with the identity matrix.

base_to_j_se3_init = torch.eye(4)

# Project points into jth camera
kp_j, points3d_j = project_points(points3d_base, intrinsics, base_to_j_se3_init)
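`project_points` is also pseudocode; a sketch of it, assuming a 4x4 homogeneous transform and the same pinhole intrinsics as before, could be:

```python
import torch

def project_points(points3d, intrinsics, base_to_j_se3):
    # Transform points (N, 3) from the base frame into camera j,
    # then apply the pinhole projection to get pixel coordinates.
    R = base_to_j_se3[:3, :3]
    t = base_to_j_se3[:3, 3]
    points3d_j = points3d @ R.T + t        # (N, 3) in camera j
    uv_hom = points3d_j @ intrinsics.T     # (N, 3) homogeneous pixels
    kp_j = uv_hom[:, :2] / uv_hom[:, 2:3]  # perspective divide
    return kp_j, points3d_j
```

With the identity initialization, `kp_j` simply reprojects to the base pixel locations, which is the expected starting point of the optimization.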

Once the keypoints are computed we can compute the photometric error, but first we need to get rid of the points that can hurt our optimization problem: the ones that fall outside of the image and the ones with negative depth. To get the photometric error we sample our intensity maps at our keypoints' (x, y) positions.

# Sample intensity maps
F_base = sample_image(intensity_base, kp_base)
F_j = sample_image(intensity_j, kp_j)

residuals = F_j - F_base

# Zero out invalid points so they do not contribute to the error
invalid_mask = ~get_valid_point_mask(points3d_j, kp_j, intensity_base.shape)
residuals[invalid_mask] = 0.0
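The `sample_image` and `get_valid_point_mask` helpers are pseudocode as well. One way to sketch them in PyTorch is with `torch.nn.functional.grid_sample`, which interpolates at sub-pixel locations but expects coordinates normalized to [-1, 1] (the shapes and the single-channel assumption are mine):

```python
import torch
import torch.nn.functional as F

def sample_image(image, kp):
    # image: (H, W) intensity map, kp: (N, 2) pixel coords in (x, y) order.
    # grid_sample wants a (B, H_out, W_out, 2) grid normalized to [-1, 1].
    H, W = image.shape
    grid = kp.clone()
    grid[:, 0] = 2.0 * kp[:, 0] / (W - 1) - 1.0
    grid[:, 1] = 2.0 * kp[:, 1] / (H - 1) - 1.0
    samples = F.grid_sample(image.view(1, 1, H, W), grid.view(1, 1, -1, 2),
                            mode='bilinear', align_corners=True)
    return samples.view(-1)  # (N,) bilinearly interpolated intensities

def get_valid_point_mask(points3d_j, kp_j, image_shape):
    # A point is valid if it lands inside the image bounds
    # and sits in front of camera j (positive depth).
    H, W = image_shape
    in_bounds = ((kp_j[:, 0] >= 0) & (kp_j[:, 0] <= W - 1) &
                 (kp_j[:, 1] >= 0) & (kp_j[:, 1] <= H - 1))
    positive_depth = points3d_j[:, 2] > 0
    return in_bounds & positive_depth
```

The bilinear interpolation matters here: the optimizer moves keypoints by sub-pixel amounts, so the sampled intensity must vary smoothly with the keypoint position.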

Solving the optimization problem

The photometric error objective is a non-linear least-squares problem that is commonly solved with the Levenberg-Marquardt algorithm, which is simply an extension of the Gauss-Newton method. We can re-formulate the photometric objective above as the following more general objective function.

$$\min_{x} \frac{1}{2} \left\lVert E(x) \right\rVert^2$$

where $x \in \mathbb{R}^n$ and $E = [E_1, \dots, E_m]^T$ is a nonlinear function with $m$ outputs. If we linearize $E(x)$ at a fixed point $x$ using a first-order Taylor approximation, we arrive at the following expression:

$$E(x + \Delta x) \approx E(x) + J(x) \Delta x$$
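This approximation is easy to check numerically with autograd; here $E$ is a toy residual function of my choosing, not one from the VO pipeline:

```python
import torch

def E(x):
    # Toy nonlinear residual with m = 2 outputs and n = 2 inputs
    # (this function is an arbitrary example)
    return torch.stack([x[0] ** 2 + x[1], torch.sin(x[0]) * x[1]])

x = torch.tensor([0.5, 1.0])
J = torch.autograd.functional.jacobian(E, x)  # (m, n) Jacobian at x

dx = torch.tensor([1e-3, -1e-3])
exact = E(x + dx)
linear = E(x) + J @ dx  # first-order Taylor approximation
# The gap between `exact` and `linear` shrinks as O(||dx||^2)
```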

Using this approximation instead of the original objective yields a linear least-squares problem. A regularization term is also added to control the size of the step $\Delta x$.

$$\min_{\Delta x} \frac{1}{2} \left\lVert E(x) + J(x) \Delta x \right\rVert^2 + \lambda \left\lVert D(x) \Delta x \right\rVert^2$$

Where $D(x)$ is the square root of the diagonal of the Hessian matrix $H(x) = J(x)^T J(x)$. This optimization problem can be solved using the Schur complement trick (more_info), but it also has a simple closed-form solution, which involves solving the following system of linear equations:

$$\left(H(x) + \lambda D(x)^T D(x)\right) \Delta x = -J(x)^T E(x)$$

def lm_step(x, jacobian, residuals, lamb):
  hessian = jacobian.T @ jacobian  # H = J^T J
  gradient = jacobian.T @ residuals

  reg = lamb * (hessian.diagonal() + 1e-4).diag_embed()
  delta_x = solve(hessian + reg, -gradient)
  return x + delta_x
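A single `lm_step` is then wrapped in the usual Levenberg-Marquardt loop, which grows $\lambda$ when a step increases the cost and shrinks it when the step is accepted. Here is a sketch on a toy 1D curve-fitting problem (the residual function, constants, and acceptance schedule are my choices; `lm_step` is repeated with `torch.linalg.solve` so the block is self-contained):

```python
import torch

def lm_step(x, jacobian, residuals, lamb):
    hessian = jacobian.T @ jacobian
    gradient = jacobian.T @ residuals
    reg = lamb * (hessian.diagonal() + 1e-4).diag_embed()
    return x + torch.linalg.solve(hessian + reg, -gradient)

def residuals_fn(x, t, y):
    # Toy model: fit y = a * exp(b * t) with parameters x = (a, b)
    return x[0] * torch.exp(x[1] * t) - y

t = torch.linspace(0, 1, 20)
y = 2.0 * torch.exp(0.5 * t)  # ground truth: a = 2.0, b = 0.5

x = torch.tensor([1.0, 0.0])
lamb = 1e-2
for _ in range(50):
    r = residuals_fn(x, t, y)
    J = torch.autograd.functional.jacobian(lambda p: residuals_fn(p, t, y), x)
    x_new = lm_step(x, J, r, lamb)
    if residuals_fn(x_new, t, y).square().sum() < r.square().sum():
        x, lamb = x_new, lamb * 0.5  # accept step, trust the model more
    else:
        lamb = lamb * 2.0            # reject step, damp more
# x should approach the true parameters (2.0, 0.5)
```

Note how large $\lambda$ turns the update into a small gradient-descent-like step, while small $\lambda$ recovers the full Gauss-Newton step.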

Computing the Jacobians