Computer Vision CMP-6035B
Dr. David Greenwood
Spring 2022
Cameras measure light intensities.
Cameras can also be seen as direction measurement devices.
How do we get the point observations?
Features identify points mapped from the 3D world to the 2D image.
Using similar triangles, we get the equations of perspective projection.
\[ \frac{\textbf{r}_{i}}{f} = \frac{\textbf{r}_{o}}{z_o} \quad \Rightarrow \quad \frac{x_i}{f} = \frac{x_o}{z_o}, ~\frac{y_i}{f} = \frac{y_o}{z_o} \]
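The similar-triangles relation can be checked with a few lines of code. This is a minimal sketch: the function name `project_pinhole` and the sample point are illustrative assumptions, not part of the lecture material.

```python
def project_pinhole(point, f):
    """Project a camera-frame point (x_o, y_o, z_o) onto the image plane at distance f."""
    x_o, y_o, z_o = point
    if z_o == 0:
        raise ValueError("point lies in the plane of the pinhole (z_o = 0)")
    # Similar triangles: x_i / f = x_o / z_o and y_i / f = y_o / z_o.
    return (f * x_o / z_o, f * y_o / z_o)


# Hypothetical sample point, 4 units in front of the pinhole.
image_point = project_pinhole((2.0, 1.0, 4.0), f=2.0)
print(image_point)
```

Doubling the depth `z_o` halves both image coordinates, which is the familiar perspective foreshortening.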
Describe how a world point is mapped to a pixel coordinate.
We will describe this mapping in homogeneous coordinates.
\[ \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \]
\[ \begin{bmatrix} u \\ v \\ w \end{bmatrix} \Rightarrow \begin{bmatrix} u/w \\ v/w \\ 1 \end{bmatrix} \Rightarrow \begin{bmatrix} u/w \\ v/w \end{bmatrix} \Rightarrow \begin{bmatrix} x \\ y \end{bmatrix} \]
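The normalisation step from homogeneous to Euclidean coordinates might be sketched as follows. The helper names `to_euclidean` and `to_homogeneous` are assumptions chosen for illustration.

```python
def to_euclidean(h):
    """Divide a homogeneous vector by its last component, then drop that component."""
    *head, w = h
    if w == 0:
        raise ValueError("a point at infinity has no Euclidean representation")
    return [value / w for value in head]


def to_homogeneous(e):
    """Append a 1 to a Euclidean vector."""
    return list(e) + [1.0]


xy = to_euclidean([6.0, 3.0, 2.0])            # (u, v, w) -> (u/w, v/w)
xy_scaled = to_euclidean([12.0, 6.0, 4.0])    # same point, scaled by 2
```

Both calls return the same Euclidean point, which illustrates that homogeneous vectors are only defined up to a non-zero scale factor.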
We have to transform via a number of coordinate systems: world, camera, image plane and sensor.
How do we work with these parameters?
The pose of the camera.
How many parameters do we need?
Point in world coordinates:
\[ \textbf{X}_p = [ X_p, Y_p, Z_p ]^T \]
Origin of camera in world coordinates:
\[ \textbf{X}_o = [ X_o, Y_o, Z_o ]^T \]
Translation between origin of world and camera coordinates is:
\[ \textbf{X}_o = [ X_o, Y_o, Z_o ]^T \]
Rotation \(R\) from the world to the camera coordinate system:
\[ {}^{k}\textbf{X}_p = R(\textbf{X}_p - \textbf{X}_o) \]
\[ \begin{aligned} \begin{bmatrix} {}^{k}\textbf{X}_p \\ 1 \end{bmatrix} &= \begin{bmatrix} R & \textbf{0} \\ \textbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} I_3 & -\textbf{X}_o \\ \textbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \textbf{X}_p \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} R & -R \textbf{X}_o \\ \textbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \textbf{X}_p \\ 1 \end{bmatrix} \end{aligned} \]
or:
\[ {}^{k}\textbf{X}_p = {}^{k}H \textbf{X}_p, \quad \text{where} \quad {}^{k}H = \begin{bmatrix} R & -R \textbf{X}_o \\ \textbf{0}^T & 1 \end{bmatrix} \]
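As a rough numerical check of the extrinsic transform, the sketch below builds \({}^{k}H\) with NumPy and applies it to one point. The pose values and the function name `world_to_camera_matrix` are hypothetical.

```python
import numpy as np


def world_to_camera_matrix(R, X_o):
    """Build the 4x4 matrix kH = [[R, -R X_o], [0^T, 1]]."""
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = -R @ X_o
    return H


# Hypothetical pose: a quarter turn about the Z axis, projection centre at (1, 2, 3).
R = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
X_o = np.array([1.0, 2.0, 3.0])
kH = world_to_camera_matrix(R, X_o)

X_p = np.array([2.0, 2.0, 5.0, 1.0])   # homogeneous world point
X_p_cam = kH @ X_p                     # the same point in camera coordinates
```

The projection centre itself maps to the origin of the camera frame, a quick sanity check for any pose.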
Projecting points from the camera to the sensor.
Recall for our pinhole model:
\[ \begin{aligned} {}^{c}x_p &= c \frac{{}^{k}X_p}{{}^{k}Z_p} \\ {}^{c}y_p &= c \frac{{}^{k}Y_p}{{}^{k}Z_p} \end{aligned} \]
where \(c\) is the focal length, or camera constant.
\[ \begin{bmatrix} U \\ V \\ W \\ T \end{bmatrix} = \begin{bmatrix} c & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} {}^{k}X_p \\ {}^{k}Y_p \\ {}^{k}Z_p \\ 1 \end{bmatrix} \]
Drop the 3rd row:
\[ \begin{bmatrix} {}^{c}x_p \\ {}^{c}y_p \\ 1 \end{bmatrix} = \begin{bmatrix} {}^{c}u_p \\ {}^{c}v_p \\ {}^{c}w_p \end{bmatrix} = \begin{bmatrix} c & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} {}^{k}X_p \\ {}^{k}Y_p \\ {}^{k}Z_p \\ 1 \end{bmatrix} \]
The mapping for an ideal camera is:
\[ {}^{c}x = {}^{c}P X \]
with:
\[ {}^{c}P = \begin{bmatrix} c & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & -R \textbf{X}_o \\ \textbf{0}^T & 1 \end{bmatrix} \]
We can now define the calibration matrix for an ideal camera.
\[ {}^{c}K = \begin{bmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
The mapping of a point in the world to the image plane is:
\[ {}^{c}P = {}^{c}K R [I_3 | - \textbf{X}_o] \]
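The ideal-camera mapping can be sketched in NumPy as below. The pose, the camera constant and the name `ideal_projection_matrix` are assumptions for illustration only.

```python
import numpy as np


def ideal_projection_matrix(c, R, X_o):
    """Return cP = cK R [I_3 | -X_o] for an ideal camera with camera constant c."""
    K_ideal = np.diag([c, c, 1.0])
    I_X = np.hstack([np.eye(3), -X_o.reshape(3, 1)])   # [I_3 | -X_o], shape 3x4
    return K_ideal @ R @ I_X


c = 2.0
R = np.eye(3)                        # camera axes aligned with the world axes
X_o = np.array([0.0, 0.0, -4.0])     # projection centre 4 units behind the world origin
P_ideal = ideal_projection_matrix(c, R, X_o)

X = np.array([2.0, 1.0, 0.0, 1.0])   # homogeneous world point
u, v, w = P_ideal @ X
x, y = u / w, v / w                  # Euclidean image-plane coordinates
```

With this pose the point sits at depth 4 in the camera frame, so the result agrees with the pinhole equations \(x = c X / Z\) and \(y = c Y / Z\).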
The next step is mapping from the image plane to the sensor.
Origin of sensor space is not at the principal point:
\[ {}^{s}H_{c} = \begin{bmatrix} 1 & 0 & x_H \\ 0 & 1 & y_H \\ 0 & 0 & 1 \end{bmatrix} \]
Compensation is a translation.
We need 4 additional parameters in our calibration matrix: the principal point \((x_H, y_H)\), a shear \(s\) and a scale difference \(m\):
\[ {}^{s}H_{c} = \begin{bmatrix} 1 & s & x_H \\ 0 & 1 + m & y_H \\ 0 & 0 & 1 \end{bmatrix} \]
Normally, we combine these compensations with the ideal calibration matrix:
\[ \begin{aligned} K &= \begin{bmatrix} 1 & s & x_H \\ 0 & 1 + m & y_H \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} c & c s & x_H \\ 0 & c(1 + m) & y_H \\ 0 & 0 & 1 \end{bmatrix} \end{aligned} \]
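There are 5 intrinsic parameters: the camera constant \(c\), the principal point \((x_H, y_H)\), the shear \(s\) and the scale difference \(m\).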
Finally, we have the \(3 \times 4\) homogeneous projection matrix:
\[ P = K R [I_3 | - \textbf{X}_o] \]
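It contains 11 parameters: 5 intrinsic (in \(K\)) and 6 extrinsic (3 for the rotation \(R\) and 3 for the projection centre \(\textbf{X}_o\)).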
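A minimal sketch of assembling the full projection matrix and projecting one point follows. The numeric values (an 800-unit camera constant, a 640 by 480 style principal point) and the function names are hypothetical.

```python
import numpy as np


def calibration_matrix(c, x_H, y_H, s=0.0, m=0.0):
    """Compose K as the product of the sensor compensation sH_c and the ideal cK."""
    H_sc = np.array([[1.0, s, x_H],
                     [0.0, 1.0 + m, y_H],
                     [0.0, 0.0, 1.0]])
    K_ideal = np.diag([c, c, 1.0])
    return H_sc @ K_ideal


def projection_matrix(K, R, X_o):
    """Return the 3x4 matrix P = K R [I_3 | -X_o]."""
    return K @ R @ np.hstack([np.eye(3), -np.reshape(X_o, (3, 1))])


K = calibration_matrix(c=800.0, x_H=320.0, y_H=240.0, s=0.0, m=0.01)
P = projection_matrix(K, np.eye(3), np.array([0.0, 0.0, -5.0]))

u, v, w = P @ np.array([0.5, -0.25, 0.0, 1.0])   # homogeneous world point
pixel = (u / w, v / w)                           # sensor coordinates
```

Building \(K\) as an explicit matrix product keeps the code close to the derivation, at the cost of one extra multiplication.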
We have control points of known coordinates in the world.
We want to estimate the camera parameters, given these points.
The Direct Linear Transformation (DLT) estimates the mapping from a point in the world to a point in the image.
\[ \begin{aligned} x &= K R [I_3 | - \textbf{X}_o] \textbf{X} \\ &= P \textbf{X} \end{aligned} \]
Compute the 11 intrinsic and extrinsic parameters.
Homogeneous projection:
\[ \begin{bmatrix} u \\ v \\ w \end{bmatrix} = P \begin{bmatrix} U \\ V \\ W \\ T \end{bmatrix} \]
Normalised homogeneous projection:
\[ \begin{bmatrix} u/w \\ v/w \\ 1 \end{bmatrix} = P \begin{bmatrix} U/T \\ V/T \\ W/T \\ 1 \end{bmatrix} \]
Euclidean coordinates:
\[ \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \]
We can expand the multiplication by \(P\) to get the following:
\[ \begin{aligned} x &= \frac{p_{11}X + p_{12}Y + p_{13}Z + p_{14}} {p_{31}X + p_{32}Y + p_{33}Z + p_{34}} \\[10pt] y &= \frac{p_{21}X + p_{22}Y + p_{23}Z + p_{24}} {p_{31}X + p_{32}Y + p_{33}Z + p_{34}} \end{aligned} \]
Each point gives two observation equations, one for each image coordinate.
With 2 equations per point and 11 unknown parameters, we need at least 6 points.
\[ \textbf{x}_i = P \textbf{X}_i \]
\[ \textbf{x}_i = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \textbf{X}_i \]
\[ \textbf{x}_i = P \textbf{X}_i = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \textbf{X}_i \]
Define three vectors:
\[ A = \begin{bmatrix} p_{11} \\ p_{12} \\ p_{13} \\ p_{14} \end{bmatrix}, \quad B = \begin{bmatrix} p_{21} \\ p_{22} \\ p_{23} \\ p_{24} \end{bmatrix}, \quad C = \begin{bmatrix} p_{31} \\ p_{32} \\ p_{33} \\ p_{34} \end{bmatrix} \]
Rewrite the equation as:
\[ \textbf{x}_i = P \textbf{X}_i = \begin{bmatrix} A^T \\ B^T \\ C^T \end{bmatrix} \textbf{X}_i \]
\[ \begin{bmatrix} u_i \\ v_i \\ w_i \end{bmatrix} = \textbf{x}_i = P \textbf{X}_i = \begin{bmatrix} A^T \\ B^T \\ C^T \end{bmatrix} \textbf{X}_i = \begin{bmatrix} A^T \textbf{X}_i \\ B^T \textbf{X}_i \\ C^T \textbf{X}_i \end{bmatrix} \]
\[ \textbf{x}_i = \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, \quad \begin{bmatrix} u_i \\ v_i \\ w_i \end{bmatrix} = \begin{bmatrix} A^T \textbf{X}_i \\ B^T \textbf{X}_i \\ C^T \textbf{X}_i \end{bmatrix} \]
\[ x_i = \frac{u_i}{w_i} = \frac{A^T \textbf{X}_i}{C^T \textbf{X}_i}, \quad y_i = \frac{v_i}{w_i} = \frac{B^T \textbf{X}_i}{C^T \textbf{X}_i} \]
\[ x_i = \frac{A^T \textbf{X}_i}{C^T \textbf{X}_i} \quad \Rightarrow \quad x_i C^T \textbf{X}_i - A^T \textbf{X}_i = 0 \]
\[ y_i = \frac{B^T \textbf{X}_i}{C^T \textbf{X}_i} \quad \Rightarrow \quad y_i C^T \textbf{X}_i - B^T \textbf{X}_i = 0 \]
Leading to a system of linear equations in \(A\), \(B\), and \(C\):
\[ \begin{aligned} - \textbf{X}_{i}^{T} A + x_i \textbf{X}_{i}^{T} C &= 0 \\ - \textbf{X}_{i}^{T} B + y_i \textbf{X}_{i}^{T} C &= 0 \end{aligned} \]
let:
\[ \textbf{p} = \begin{bmatrix} A \\ B \\ C \end{bmatrix} = vec(P^T) = \begin{bmatrix} p_{11} \\ p_{12} \\ p_{13} \\ p_{14} \\ p_{21} \\ p_{22} \\ p_{23} \\ p_{24} \\ p_{31} \\ p_{32} \\ p_{33} \\ p_{34} \end{bmatrix} \]
\[ \begin{aligned} - \textbf{X}_{i}^{T} A + x_i \textbf{X}_{i}^{T} C &= 0 \\ - \textbf{X}_{i}^{T} B + y_i \textbf{X}_{i}^{T} C &= 0 \end{aligned} \]
rewrite as:
\[ a_{x_i}^T \textbf{p} = 0, \quad a_{y_i}^T \textbf{p} = 0 \]
with:
\[ \begin{aligned} \textbf{p} &= vec(P^T) \\ a_{x_i}^T &= (-X_i, -Y_i, -Z_i, -1, 0, 0, 0, 0, x_i X_i, x_i Y_i, x_i Z_i, x_i) \\ a_{y_i}^T &= (0, 0, 0, 0, -X_i, -Y_i, -Z_i, -1, y_i X_i, y_i Y_i, y_i Z_i, y_i) \end{aligned} \]
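The two coefficient rows for a single correspondence might be built as in the sketch below. The function name `dlt_rows` and the small projection matrix used to generate a consistent observation are assumptions.

```python
import numpy as np


def dlt_rows(X_world, x_image):
    """Return the rows a_x^T and a_y^T for one world/image correspondence."""
    X, Y, Z = X_world
    x, y = x_image
    a_x = np.array([-X, -Y, -Z, -1, 0, 0, 0, 0, x * X, x * Y, x * Z, x], dtype=float)
    a_y = np.array([0, 0, 0, 0, -X, -Y, -Z, -1, y * X, y * Y, y * Z, y], dtype=float)
    return a_x, a_y


# Hypothetical projection matrix, used only to generate a noise-free observation.
P = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 4.0]])
X_world = np.array([2.0, 1.0, 0.0])
u, v, w = P @ np.append(X_world, 1.0)
a_x, a_y = dlt_rows(X_world, (u / w, v / w))

p = P.reshape(-1)                  # vec(P^T): rows of P stacked into a 12-vector
residuals = (a_x @ p, a_y @ p)     # both vanish for a noise-free observation
```

Flattening `P` row by row gives exactly the ordering \(p_{11}, p_{12}, \dots, p_{34}\) used for \(\textbf{p}\).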
for each point we have:
\[ a_{x_i}^T \textbf{p} = 0, \quad a_{y_i}^T \textbf{p} = 0 \]
stacking all the points vertically:
\[ \begin{bmatrix} a_{x_1}^T \\ a_{y_1}^T \\ a_{x_2}^T \\ a_{y_2}^T \\ \vdots \\ a_{x_n}^T \\ a_{y_n}^T \end{bmatrix} \textbf{p} = M \textbf{p} \overset{!}{=} 0 \]
where \(M\) is a \(2n \times 12\) matrix.
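Solving the homogeneous system \(M \textbf{p} = 0\) is equivalent to finding the null space of \(M\). With noisy observations the null space is trivial, so we take the right singular vector of \(M\) belonging to the smallest singular value, which minimises \(\lVert M \textbf{p} \rVert\) subject to \(\lVert \textbf{p} \rVert = 1\).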
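One common way to find that null space numerically is the singular value decomposition. The sketch below assumes noise-free synthetic data from a hypothetical camera. The function name `dlt` is illustrative, and no coordinate normalisation is applied.

```python
import numpy as np


def dlt(X_world, x_image):
    """Estimate P (up to scale) from n >= 6 correspondences via SVD."""
    X_world = np.asarray(X_world, dtype=float)
    x_image = np.asarray(x_image, dtype=float)
    n = len(X_world)
    if n < 6:
        raise ValueError("at least 6 points are needed")
    Xh = np.hstack([X_world, np.ones((n, 1))])       # homogeneous world points
    M = np.zeros((2 * n, 12))
    M[0::2, 0:4] = -Xh                               # rows a_x
    M[0::2, 8:12] = x_image[:, [0]] * Xh
    M[1::2, 4:8] = -Xh                               # rows a_y
    M[1::2, 8:12] = x_image[:, [1]] * Xh
    # The right singular vector of the smallest singular value spans the null space.
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1].reshape(3, 4)


# Synthetic test data from a hypothetical camera (assumed values).
rng = np.random.default_rng(0)
X_world = rng.uniform(-1.0, 1.0, size=(10, 3))
K = np.array([[2.0, 0.0, 0.1], [0.0, 2.0, -0.2], [0.0, 0.0, 1.0]])
X_o = np.array([0.0, 0.0, -5.0])
P_true = K @ np.hstack([np.eye(3), -X_o.reshape(3, 1)])
proj = np.hstack([X_world, np.ones((10, 1))]) @ P_true.T
x_image = proj[:, :2] / proj[:, [2]]

P_est = dlt(X_world, x_image)
P_est = P_est * (P_true[2, 3] / P_est[2, 3])         # fix the arbitrary scale
```

Because \(P\) is homogeneous, the estimate is compared with the ground truth only after rescaling. With noisy measurements the same code returns a least-squares estimate rather than an exact null vector.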
Does it always work?
No unique solution exists if all points \(\textbf{X}_i\) lie on a plane.
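The planar failure case can be illustrated by comparing the rank of \(M\) for general and coplanar control points. This is a sketch with assumed synthetic data. The helper names and the rank tolerance of `1e-8` are choices made for this example.

```python
import numpy as np


def design_matrix(X_world, x_image):
    """Stack the rows a_x^T and a_y^T of all points into the 2n x 12 matrix M."""
    n = len(X_world)
    Xh = np.hstack([X_world, np.ones((n, 1))])
    M = np.zeros((2 * n, 12))
    M[0::2, 0:4] = -Xh
    M[0::2, 8:12] = x_image[:, [0]] * Xh
    M[1::2, 4:8] = -Xh
    M[1::2, 8:12] = x_image[:, [1]] * Xh
    return M


def project(P, X_world):
    """Project Euclidean world points with P and normalise."""
    proj = np.hstack([X_world, np.ones((len(X_world), 1))]) @ P.T
    return proj[:, :2] / proj[:, [2]]


# Hypothetical projection matrix and control points.
P = np.array([[2.0, 0.0, 0.1, 0.5],
              [0.0, 2.0, -0.2, -1.0],
              [0.0, 0.0, 1.0, 5.0]])
rng = np.random.default_rng(1)
general = rng.uniform(-1.0, 1.0, size=(10, 3))
planar = general.copy()
planar[:, 2] = 0.0                       # force all points onto the plane Z = 0

rank_general = np.linalg.matrix_rank(design_matrix(general, project(P, general)), tol=1e-8)
rank_planar = np.linalg.matrix_rank(design_matrix(planar, project(P, planar)), tol=1e-8)
```

For points in general position the rank is 11, leaving a one-dimensional null space. For coplanar points the rank drops further, so the null space no longer identifies a single \(\textbf{p}\).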
From \(P\) to \(K\), \(R\), \(\textbf{X}_o\)
We have \(P\), how do we obtain \(K, R, \textbf{X}_o\)?
Structure of \(P\):
\[ P = [K R | -K R\textbf{X}_o] = [H | \textbf{h}] \]
with:
\[ H = K R, \quad \textbf{h} = -KR\textbf{X}_o \]
We can obtain the projection centre by:
\[ \textbf{X}_o = -H^{-1} \textbf{h} \]
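Recovering the projection centre can be sketched as follows. The camera values and the name `projection_centre` are hypothetical. The linear system is solved directly rather than by forming \(H^{-1}\).

```python
import numpy as np


def projection_centre(P):
    """Return X_o = -H^{-1} h for P = [H | h]."""
    H, h = P[:, :3], P[:, 3]
    # Solving H X_o = -h avoids computing the inverse explicitly.
    return np.linalg.solve(H, -h)


# Hypothetical camera used to build a test matrix.
a = 0.3
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a), np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
K = np.array([[800.0, 0.5, 320.0], [0.0, 808.0, 240.0], [0.0, 0.0, 1.0]])
X_o_true = np.array([1.0, -2.0, 0.5])
P = K @ R @ np.hstack([np.eye(3), -X_o_true.reshape(3, 1)])

X_o = projection_centre(P)
X_o_scaled = projection_centre(-3.0 * P)   # the scale of P does not matter
```

Any non-zero scaling of \(P\) cancels between \(H\) and \(\textbf{h}\), so the recovered centre is unaffected by the homogeneous scale.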
\[ H = K R \]
What do we know about these matrices?
Exploit the structure of \(H=K R\)
There is a standard method to decompose a matrix to a rotation and triangular matrix.
QR decomposition yields a rotation followed by a triangular matrix, the reverse of the order in \(H = K R\), so we perform it on \(H^{-1}\):
\[ H^{-1} = (K R)^{-1} = R^{-1}K^{-1} = R^{T}K^{-1} \]
The matrix \(H = K R\) is homogeneous, and therefore so is \(K\), so we must normalise:
\[ K \leftarrow \frac{1}{K_{33}} K \]
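The whole decomposition might look like the sketch below, using NumPy's QR routine. It assumes the diagonal of \(K\) is positive, which is a sign convention chosen here rather than one stated in the slides. The function name `decompose_projection` and the test camera are hypothetical.

```python
import numpy as np


def decompose_projection(P):
    """Split P = K R [I_3 | -X_o] into K, R and X_o.

    Assumes K has a positive diagonal (c > 0 and 1 + m > 0).
    """
    H, h = P[:, :3], P[:, 3]
    X_o = np.linalg.solve(H, -h)

    # H^{-1} = R^T K^{-1}: an orthogonal factor times an upper-triangular factor.
    Q, U = np.linalg.qr(np.linalg.inv(H))

    # QR is unique only up to signs; force a positive diagonal on U.
    D = np.diag(np.sign(np.diag(U)))
    Q, U = Q @ D, D @ U

    R = Q.T
    K = np.linalg.inv(U)
    K = K / K[2, 2]                 # remove the arbitrary homogeneous scale

    if np.linalg.det(R) < 0:        # P carried a negative scale factor
        R = -R
    return K, R, X_o


# Hypothetical camera used to build a test matrix.
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
K_true = np.array([[800.0, 0.5, 320.0], [0.0, 808.0, 240.0], [0.0, 0.0, 1.0]])
X_o_true = np.array([1.0, -2.0, 0.5])
P = K_true @ R_true @ np.hstack([np.eye(3), -X_o_true.reshape(3, 1)])

K, R, X_o = decompose_projection(-2.5 * P)   # an arbitrary scale is tolerated
```

The sign handling matters in practice: without it, the QR routine may return factors with negative diagonal entries, which would give a calibration matrix with a flipped axis.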
reading: