Ben's Blog


Affine Transformations

Some notes on affine transformation conventions.

Mar 26, 2024

A perpetual pain point for people working on computer vision is dealing with affine transformations. Usually you have a diagram that looks something like this:

KITTI dataset coordinate frames

Each of the coordinate frames can be represented as a 4x4 matrix in Numpy. For the three sensors in the image above we might have something like this:

T_world_to_lidar # 4x4
T_world_to_camera # 4x4
T_world_to_imu # 4x4

Suppose we have a point p in the world frame. We can transform it to one of the other frames using matrix multiplication:

p = np.array([x, y, z, 1])
p_lidar = T_world_to_lidar @ p  # 4x4 @ 4x1 = 4x1

Here’s a toy example in 2D to illustrate the concept:

p_a_in_world = np.array([1, 1])
p_b_in_world = np.array([2, -1])

T_world_to_lidar_a = np.array([
    [1, 0, 1],
    [0, 1, 2],
    [0, 0, 1],
])

T_world_to_lidar_b = np.array([
    [-1, 0, 2],
    [0, -1, 1],
    [0, 0, 1],
])

p_a_in_lidar_a = T_world_to_lidar_a @ np.append(p_a_in_world, 1)   # [2, 3]
p_b_in_lidar_a = T_world_to_lidar_a @ np.append(p_b_in_world, 1)   # [3, 1]
p_a_in_lidar_b = T_world_to_lidar_b @ np.append(p_a_in_world, 1)   # [1, 0]
p_b_in_lidar_b = T_world_to_lidar_b @ np.append(p_b_in_world, 1)   # [0, 2]

p_o_in_lidar = np.array([0, 0])
p_o_a_in_world = np.linalg.inv(T_world_to_lidar_a) @ np.append(p_o_in_lidar, 1)  # [-1, -2]
p_o_b_in_world = np.linalg.inv(T_world_to_lidar_b) @ np.append(p_o_in_lidar, 1)  # [2, 1]

In words:

Here’s how this looks visually, where the axis has the origin at the world frame:

2D affine transformation