The Camera

Computer Vision CMP-6035B

Dr. David Greenwood

Spring 2022

Contents

  • Camera Model
  • Intrinsic and Extrinsic Parameters
  • Direct Linear Transformation

The Camera

“Sallie Gardner,” owned by Leland Stanford; ridden by G. Domm, running at a 1:40 gait over the Palo Alto track, 19th June 1878.

The Camera

Cameras measure light intensities.

  • the sensor counts photons arriving at the pixel
  • each pixel corresponds to a direction in world space

The Camera

Cameras can also be seen as direction measurement devices.

  • we are often interested in geometric properties of a scene
  • an object reflects light to a specific location on the sensor
  • Which 3D point is mapped to which pixel?

The Camera

How do we get the point observations?

  • keypoints and features
  • SIFT, ORB, etc.
  • locally distinct features

The Camera

Features identify points mapped from the 3D world to the 2D image.

Pinhole Camera Model

Light passing through a pinhole camera.

  • \(f\) : effective focal length
  • \(\textbf{r}_{o} = (x_o, y_o, z_o)\)
  • \(\textbf{r}_{i} = (x_i, y_i, f)\)
Camera at the origin.

Pinhole Camera Model

Using similar triangles, we get the equations of perspective projection.

\[ \frac{\textbf{r}_{i}}{f} = \frac{\textbf{r}_{o}}{z_o} \quad \Rightarrow \quad \frac{x_i}{f} = \frac{x_o}{z_o}, ~\frac{y_i}{f} = \frac{y_o}{z_o} \]
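As a quick sketch (not part of the lecture), the projection equations translate directly into NumPy; the names `project_pinhole`, `r_o`, and `f` are illustrative:

```python
import numpy as np

def project_pinhole(r_o, f):
    """Project a 3D point r_o = (x_o, y_o, z_o) onto the image plane
    at distance f, using similar triangles: r_i / f = r_o / z_o."""
    x_o, y_o, z_o = r_o
    return np.array([f * x_o / z_o, f * y_o / z_o, f])

# A point at depth 2f lands at half its lateral offset:
r_i = project_pinhole(np.array([0.2, 0.4, 2.0]), f=1.0)
# r_i == [0.1, 0.2, 1.0]
```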

Camera Parameters

Describe how a world point is mapped to a pixel coordinate.

Camera Parameters

Describe how a world point is mapped to a pixel coordinate.

point mapping

Camera Parameters

We will describe this mapping in homogeneous coordinates.

\[ \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \]

Aside: Homogeneous Coordinates

\[ \begin{bmatrix} u \\ v \\ w \end{bmatrix} \Rightarrow \begin{bmatrix} u/w \\ v/w \\ 1 \end{bmatrix} \Rightarrow \begin{bmatrix} u/w \\ v/w \end{bmatrix} \Rightarrow \begin{bmatrix} x \\ y \end{bmatrix} \]
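A minimal sketch of this normalisation step (the function name `from_homogeneous` is mine):

```python
import numpy as np

def from_homogeneous(uvw):
    """Normalise a homogeneous 2D point [u, v, w] to Euclidean [x, y]."""
    u, v, w = uvw
    return np.array([u / w, v / w])

# [4, 6, 2] and [2, 3, 1] represent the same Euclidean point (2, 3).
xy = from_homogeneous(np.array([4.0, 6.0, 2.0]))
```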

Coordinate Systems

We have to transform via a number of coordinate systems:

  • The world coordinate system
  • The camera coordinate system
  • The image coordinate system
  • The pixel coordinate system

World to Pixels

The mapping from world to pixel coordinates proceeds in stages:

  • World to Camera coordinates
  • Projection to 2D
  • Convert to Sensor coordinates
  • Lens Distortions

Camera Parameters

How do we work with these parameters?

  • extrinsic parameters: the pose of the camera in the world
  • intrinsic parameters: the properties of the camera

Camera Parameters

Extrinsic Parameters

The pose of the camera.

Extrinsic Parameters

  • Describe the pose of the camera in the world.
  • That is, the position and orientation of the camera.
  • An invertible transformation.

How many parameters do we need?

  • 3 parameters for the position
  • 3 parameters for the orientation
  • There are 6 extrinsic parameters.

Extrinsic Parameters

Point in world coordinates:

\[ \textbf{X}_p = [ X_p, Y_p, Z_p ]^T \]

Origin of camera in world coordinates:

\[ \textbf{X}_o = [ X_o, Y_o, Z_o ]^T \]

Transformation

Translation between origin of world and camera coordinates is:

\[ \textbf{X}_o = [ X_o, Y_o, Z_o ]^T \]

Rotation \(R\) from world to camera coordinates system is:

\[ {}^{k}\textbf{X}_p = R(\textbf{X}_p - \textbf{X}_o) \]

Homogeneous Coordinates

\[ \begin{aligned} \begin{bmatrix} {}^{k}\textbf{X}_p \\ 1 \end{bmatrix} &= \begin{bmatrix} R & \textbf{0} \\ \textbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} I_3 & -\textbf{X}_o \\ \textbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \textbf{X}_p \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} R & -R \textbf{X}_o \\ \textbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \textbf{X}_p \\ 1 \end{bmatrix} \end{aligned} \]

or:

\[ {}^{k}\textbf{X}_p = {}^{k}H \textbf{X}_p, \quad \text{where} \quad {}^{k}H = \begin{bmatrix} R & -R \textbf{X}_o \\ \textbf{0}^T & 1 \end{bmatrix} \]
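A sketch of building this homogeneous transform in NumPy (the name `extrinsic_matrix` is illustrative):

```python
import numpy as np

def extrinsic_matrix(R, X_o):
    """Build kH = [[R, -R X_o], [0^T, 1]], mapping homogeneous world
    coordinates into the camera coordinate system."""
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = -R @ X_o
    return H

# Camera at X_o = (1, 0, 0), identity rotation:
# the world point (1, 0, 5) maps to (0, 0, 5) in camera coordinates.
H = extrinsic_matrix(np.eye(3), np.array([1.0, 0.0, 0.0]))
p_cam = H @ np.array([1.0, 0.0, 5.0, 1.0])
```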

Intrinsic Parameters

Projecting points from the camera to the sensor.

Intrinsic Parameters

  • projection from camera coordinates to sensor coordinates
  • central projection is not invertible
  • image plane to sensor is invertible
  • linear deviations are invertible

Camera Intrinsics

Recall for our pinhole model:

\[ \begin{aligned} {}^{c}x_p &= c \frac{{}^{k}X_p}{{}^{k}Z_p} \\ {}^{c}y_p &= c \frac{{}^{k}Y_p}{{}^{k}Z_p} \end{aligned} \]

where \(c\) is the focal length, or camera constant.

Homogeneous Coordinates

\[ \begin{bmatrix} U \\ V \\ W \\ T \end{bmatrix} = \begin{bmatrix} c & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} {}^{k}X_p \\ {}^{k}Y_p \\ {}^{k}Z_p \\ 1 \end{bmatrix} \]

Drop the 3rd row:

\[ \begin{bmatrix} {}^{c}x_p \\ {}^{c}y_p \\ 1 \end{bmatrix} = \begin{bmatrix} {}^{c}u_p \\ {}^{c}v_p \\ {}^{c}w_p \end{bmatrix} = \begin{bmatrix} c & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} {}^{k}X_p \\ {}^{k}Y_p \\ {}^{k}Z_p \\ 1 \end{bmatrix} \]

Ideal Camera

The mapping for an ideal camera is:

\[ {}^{c}x = {}^{c}P X \]

with:

\[ {}^{c}P = \begin{bmatrix} c & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & -R \textbf{X}_o \\ \textbf{0}^T & 1 \end{bmatrix} \]

Calibration Matrix

We can now define the calibration matrix for an ideal camera.

\[ {}^{c}K = \begin{bmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

The mapping of a point in the world to the image plane is:

\[ {}^{c}P = {}^{c}K R [I_3 | - \textbf{X}_o] \]
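Composing the ideal projection matrix can be sketched as follows (`ideal_projection` is an illustrative name; cK = diag(c, c, 1) as above):

```python
import numpy as np

def ideal_projection(c, R, X_o):
    """cP = cK @ R @ [I_3 | -X_o] for an ideal camera
    (no shear, no scale difference, principal point at origin)."""
    K = np.diag([c, c, 1.0])
    return K @ R @ np.hstack([np.eye(3), -X_o.reshape(3, 1)])

# Camera at the world origin, c = 2: point (1, 2, 4) projects to (0.5, 1.0).
P = ideal_projection(c=2.0, R=np.eye(3), X_o=np.zeros(3))
x = P @ np.array([1.0, 2.0, 4.0, 1.0])  # homogeneous image point
xy = x[:2] / x[2]
```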

Linear Errors

The next step is mapping from the image plane to the sensor.

  • Location of principal point in sensor coordinates.
  • Scale difference in x and y, according to chip design.
  • Shear compensation.

Location of Principal Point

Principal Point

Origin of sensor space is not at the principal point:

\[ {}^{s}H_{c} = \begin{bmatrix} 1 & 0 & x_H \\ 0 & 1 & y_H \\ 0 & 0 & 1 \end{bmatrix} \]

Compensation is a translation.

Scale and Shear

  • Scale difference \(m\) in x and y.
  • Shear compensation \(s\).

We need to add 4 additional parameters to our calibration matrix:

\[ {}^{s}H_{c} = \begin{bmatrix} 1 & s & x_H \\ 0 & 1 + m & y_H \\ 0 & 0 & 1 \end{bmatrix} \]

Calibration Matrix

Normally, we combine these compensations with the ideal calibration matrix:

\[ \begin{aligned} K &= \begin{bmatrix} 1 & s & x_H \\ 0 & 1 + m & y_H \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} c & s & x_H \\ 0 & c(1 + m) & y_H \\ 0 & 0 & 1 \end{bmatrix} \end{aligned} \]

Calibration Matrix

\[ K = \begin{bmatrix} c & s & x_H \\ 0 & c(1 + m) & y_H \\ 0 & 0 & 1 \end{bmatrix} \]

There are 5 intrinsic parameters:

  • camera constant \(c\)
  • scale difference \(m\)
  • principal point offset \(x_H\) and \(y_H\)
  • shear compensation \(s\)
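The five-parameter calibration matrix is straightforward to assemble; a sketch with illustrative values (the function name and the numbers are mine, not from the lecture):

```python
import numpy as np

def calibration_matrix(c, m, s, x_H, y_H):
    """K = [[c, s, x_H], [0, c(1+m), y_H], [0, 0, 1]]:
    camera constant c, scale difference m, shear s,
    principal point offset (x_H, y_H)."""
    return np.array([[c,   s,           x_H],
                     [0.0, c * (1 + m), y_H],
                     [0.0, 0.0,         1.0]])

# Example: c = 800, 1% scale difference, no shear, principal point (320, 240).
K = calibration_matrix(c=800.0, m=0.01, s=0.0, x_H=320.0, y_H=240.0)
```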

Projection Matrix

Finally, we have the \(3 \times 4\) homogeneous projection matrix:

\[ P = K R [I_3 | - \textbf{X}_o] \]

It contains 11 parameters:

  • 6 extrinsic parameters
  • 5 intrinsic parameters

Direct Linear Transformation

point mapping

Control Points

known points in the world

We have control points of known coordinates in the world.

We want to estimate the camera parameters, given these points.

Parameter Estimation

  • Goal: camera parameters, \(P\).
  • Given: control points in the world, \(X\).
  • Observed: coordinates \((x, y)\) in the image.

Mapping

Direct Linear Transformation (DLT) maps a point in the world to a point in the image.

\[ \begin{aligned} x &= K R [I_3 | - \textbf{X}_o] \textbf{X} \\ &= P \textbf{X} \end{aligned} \]

Camera Parameters

\[ x = K R [I_3 | - \textbf{X}_o] \textbf{X} = P \textbf{X} \]

  • Intrinsic parameters \(K\)
  • Extrinsic parameters \(\textbf{X}_o\) and \(R\).
  • Projection matrix \(P\) contains intrinsic and extrinsic parameters.

Direct Linear Transformation

Compute the 11 intrinsic and extrinsic parameters.

How many points are needed?

Homogeneous projection:

\[ \begin{bmatrix} u \\ v \\ w \end{bmatrix} = P \begin{bmatrix} U \\ V \\ W \\ T \end{bmatrix} \]

Normalised homogeneous projection (equality up to a common scale factor):

\[ \begin{bmatrix} u/w \\ v/w \\ 1 \end{bmatrix} = P \begin{bmatrix} U/T \\ V/T \\ W/T \\ 1 \end{bmatrix} \]

Euclidean coordinates:

\[ \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \]

We can expand the multiplication by \(P\) to get the following:

\[ \begin{aligned} x &= \frac{p_{11}X + p_{12}Y + p_{13}Z + p_{14}} {p_{31}X + p_{32}Y + p_{33}Z + p_{34}} \\[10pt] y &= \frac{p_{21}X + p_{22}Y + p_{23}Z + p_{24}} {p_{31}X + p_{32}Y + p_{33}Z + p_{34}} \end{aligned} \]

Each point gives two observation equations, one for each image coordinate.

How many points are needed?

Each point gives two observation equations, one for each image coordinate.

We need at least 6 points to estimate 11 parameters.

Rearrange the DLT Equation

\[ \textbf{x}_i = P \textbf{X}_i = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \textbf{X}_i \]

Define three vectors:

\[ A = \begin{bmatrix} p_{11} \\ p_{12} \\ p_{13} \\ p_{14} \end{bmatrix}, \quad B = \begin{bmatrix} p_{21} \\ p_{22} \\ p_{23} \\ p_{24} \end{bmatrix}, \quad C = \begin{bmatrix} p_{31} \\ p_{32} \\ p_{33} \\ p_{34} \end{bmatrix} \]

Rewrite the equation as:

\[ \begin{bmatrix} u_i \\ v_i \\ w_i \end{bmatrix} \quad = \quad \textbf{x}_i \quad = \quad P \textbf{X}_i \quad = \quad \begin{bmatrix} A^T \\ B^T \\ C^T \end{bmatrix} \textbf{X}_i \quad = \quad \begin{bmatrix} A^T X_i \\ B^T X_i \\ C^T X_i \end{bmatrix} \]

\[ \textbf{x}_i = \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, \quad \begin{bmatrix} u_i \\ v_i \\ w_i \end{bmatrix} = \begin{bmatrix} A^T X_i \\ B^T X_i \\ C^T X_i \end{bmatrix} \]

\[ x_i = \frac{u_i}{w_i} = \frac{A^T X_i}{C^T X_i}, \quad y_i = \frac{v_i}{w_i} = \frac{B^T X_i}{C^T X_i} \]

System of equations

\[ x_i = \frac{A^T X_i}{C^T X_i} \quad \Rightarrow \quad x_i C^T X_i - A^T X_i = 0 \]

\[ y_i = \frac{B^T X_i}{C^T X_i} \quad \Rightarrow \quad y_i C^T X_i - B^T X_i = 0 \]

Leading to a system of linear equations in \(A\), \(B\), and \(C\):

\[ \begin{aligned} - X_{i}^{T} A + x_i X_{i}^{T} C &= 0 \\ - X_{i}^{T} B + y_i X_{i}^{T} C &= 0 \end{aligned} \]

let:

\[ \textbf{p} = \begin{bmatrix} A \\ B \\ C \end{bmatrix} = vec(P^T) = \begin{bmatrix} p_{11} \\ p_{12} \\ p_{13} \\ p_{14} \\ p_{21} \\ p_{22} \\ p_{23} \\ p_{24} \\ p_{31} \\ p_{32} \\ p_{33} \\ p_{34} \end{bmatrix} \]

\[ \begin{aligned} - X_{i}^{T} A + x_i X_{i}^{T} C &= 0 \\ - X_{i}^{T} B + y_i X_{i}^{T} C &= 0 \end{aligned} \]

rewrite as:

\[ a_{x_i}^T \textbf{p} = 0, \quad a_{y_i}^T \textbf{p} = 0 \]

with:

\[ \begin{aligned} \textbf{p} &= vec(P^T) \\ a_{x_i}^T &= (-X_i, -Y_i, -Z_i, -1, 0, 0, 0, 0, x_i X_i, x_i Y_i, x_i Z_i, x_i) \\ a_{y_i}^T &= (0, 0, 0, 0, -X_i, -Y_i, -Z_i, -1, y_i X_i, y_i Y_i, y_i Z_i, y_i) \end{aligned} \]

for each point we have:

\[ a_{x_i}^T \textbf{p} = 0, \quad a_{y_i}^T \textbf{p} = 0 \]

stacking all the points vertically:

\[ \begin{bmatrix} a_{x_1}^T \\ a_{y_1}^T \\ a_{x_2}^T \\ a_{y_2}^T \\ \dots \\ a_{x_n}^T \\ a_{y_n}^T \\ \end{bmatrix} \textbf{p} = M \textbf{p} \overset{!}{=} 0 \]

Where \(M\) is a \(2n \times 12\) matrix.
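As an illustrative sketch (the names `dlt_rows`, `X_w`, and `x_img` are mine), the two rows contributed by a single correspondence can be assembled like this:

```python
import numpy as np

def dlt_rows(X_w, x_img):
    """Return the two DLT rows a_x^T and a_y^T for one correspondence
    between a world point (X, Y, Z) and an image point (x, y)."""
    X, Y, Z = X_w
    x, y = x_img
    a_x = np.array([-X, -Y, -Z, -1, 0, 0, 0, 0, x*X, x*Y, x*Z, x], float)
    a_y = np.array([0, 0, 0, 0, -X, -Y, -Z, -1, y*X, y*Y, y*Z, y], float)
    return a_x, a_y

# Stacking a_x, a_y for n correspondences gives the 2n x 12 matrix M.
```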

Solving the Linear System

Solving a system of linear equations of the form \(Ax = 0\) is equivalent to finding the null space of \(A\).

  • Apply the Singular Value Decomposition (SVD) to solve \(M\textbf{p} = 0\).
  • SVD returns matrices \(U\), \(S\), and \(V\) such that \(M = U S V^T\).
  • Choose \(\textbf{p}\) as the right singular vector belonging to the smallest singular value (exactly \(0\) only for noise-free observations).
  • The solution is the last column of \(V\).
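A minimal sketch of the SVD solution, assuming `M` has been stacked as above (`solve_dlt` is an illustrative name):

```python
import numpy as np

def solve_dlt(M):
    """Solve M p = 0 in the least-squares sense: p is the right singular
    vector for the smallest singular value, i.e. the last row of V^T.
    p = vec(P^T), so a row-major reshape recovers P."""
    U, S, Vt = np.linalg.svd(M)
    return Vt[-1].reshape(3, 4)
```

Note that the result is only defined up to scale (and sign), as expected for a homogeneous matrix.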

Direct Linear Transformation

Does it always work?

Critical Surfaces

There is no unique solution if all control points \(X_i\) lie on a plane (a critical surface).

Decomposing the Projection Matrix

From \(P\) to \(K\), \(R\), \(\textbf{X}_o\)

Decomposing the Projection Matrix

We have \(P\), how do we obtain \(K, R, \textbf{X}_o\)?

Structure of \(P\):

\[ P = [K R | -K R\textbf{X}_o] = [H | \textbf{h}] \]

with:

\[ H = K R, \quad \textbf{h} = -KR\textbf{X}_o \]

Decomposing the Projection Matrix

\[ H = K R, \quad \textbf{h} = -KR\textbf{X}_o \]

We can obtain the projection centre by:

\[ \textbf{X}_o = -H^{-1} \textbf{h} \]

Decomposing the Projection Matrix

\[ H = K R \]

What do we know about these matrices?

Decomposing the Projection Matrix

Exploit the structure of \(H=K R\)

  • \(K\) is a triangular matrix
  • \(R\) is a rotation matrix

There is a standard method to decompose a matrix to a rotation and triangular matrix.

  • QR decomposition

Decomposing the Projection Matrix

We perform a QR decomposition on \(H^{-1}\), given the order of rotation and triangular matrices.

\[ H^{-1} = (K R)^{-1} = R^{-1}K^{-1} = R^{T}K^{-1} \]

Decomposing the Projection Matrix

The matrix \(H = K R\) is homogeneous, and therefore so is \(K\); we must normalise:

\[ K \leftarrow \frac{1}{K_{33}} K \]
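The whole decomposition can be sketched as below (the name `decompose_projection` is mine; the sign fix-up is needed because QR is only unique up to the signs of the diagonal):

```python
import numpy as np

def decompose_projection(P):
    """Recover K, R, X_o from P = [H | h], with H = K R and h = -K R X_o.
    QR on H^{-1} = R^T K^{-1} yields Q = R^T and an upper triangular
    U = K^{-1}; K is then normalised so that K[2, 2] = 1."""
    H, h = P[:, :3], P[:, 3]
    X_o = -np.linalg.solve(H, h)           # projection centre X_o = -H^{-1} h
    Q, U = np.linalg.qr(np.linalg.inv(H))  # Q orthogonal, U upper triangular
    K, R = np.linalg.inv(U), Q.T
    D = np.diag(np.sign(np.diag(K)))       # force positive diagonal of K
    K, R = K @ D, D @ R                    # K R is unchanged since D D = I
    return K / K[2, 2], R, X_o
```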

DLT recap

  1. Build the matrix \(M\).
  2. Solve using SVD; \(M = U \ S \ V^T\), solution is last column of \(V\).
  3. If individual matrices are required, we can use QR decomposition.

Summary

  • Camera Model
  • Intrinsic and Extrinsic Parameters
  • Direct Linear Transformation

reading:

  • Forsyth, Ponce; Computer Vision: A modern approach. Section 1.3
  • Hartley, Zisserman; Multiple View Geometry in Computer Vision