Mathematical background and projective geometry

The concepts presented in this document are mostly based on the theory developed in Criminisi's thesis and other publications, and rest on a solid mathematical foundation: projective geometry.

In particular, 2D-2D homographic transformations and more general 3D-2D projectivities are used. The algorithms presented in the later sections require no knowledge of the camera's internal parameters (focal length, aspect ratio, principal point) or external ones (position and orientation). Camera calibration is replaced by the use of scene constraints such as planarity of points and parallelism of lines and planes.

In this section we describe the basic equations and mathematical structures for making measurements on the image. They are used in the algorithms presented in section 4.

Camera model

The camera model we use is the pinhole camera (central projection). Every point in 3D space is projected onto the image plane along a straight visual ray that connects it with the camera center, which is also the center of projection (figure 2).

Figure 2: Projection of points in 3D space onto the image plane, through central projection
\includegraphics[width=8cm,height=6cm]{images/projectionSample.eps}

The projection can be expressed with a $3\times4$ matrix, called the projection matrix and denoted $\Pi$. Given the projection matrix, we can map a world point$^3$ $\mathbf{X} = [ \begin{array}{cccc} x & y & z & w \end{array} ] $ to an image point $\mathbf{x} = [ \begin{array}{ccc} x' & y' & w' \end{array} ] $ using equation (1).

\begin{displaymath}
\mathbf{x}^T = \Pi \mathbf{X}^T
\end{displaymath} (1)

The camera model is defined if we know the matrix $\Pi$.
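As a minimal sketch of equation (1), the following Python fragment applies a projection matrix to a homogeneous world point and then divides by the last coordinate to obtain image coordinates. The matrix values and the helper names `project` and `to_inhomogeneous` are assumptions made for illustration; they do not describe any particular camera.

```python
def project(Pi, X):
    """Apply a 3x4 projection matrix Pi (list of rows) to a homogeneous
    world point X = (x, y, z, w); returns the homogeneous image point."""
    return [sum(Pi[r][c] * X[c] for c in range(4)) for r in range(3)]


def to_inhomogeneous(x):
    """Convert a homogeneous image point (x', y', w') to (x'/w', y'/w')."""
    return (x[0] / x[2], x[1] / x[2])


# Hypothetical projection matrix (illustrative values only).
Pi = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

X = [1.0, 2.0, 4.0, 1.0]            # world point (1, 2, 4), homogeneous form
x = project(Pi, X)                  # homogeneous image point
image_point = to_inhomogeneous(x)

# The same world point written with a different homogeneous scale.
X_scaled = [3.0 * c for c in X]
image_point_scaled = to_inhomogeneous(project(Pi, X_scaled))
```

Scaling $\mathbf{X}$ by any non-zero factor leaves the image position unchanged, which is why only the ratios of the homogeneous coordinates matter.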

Plane to plane mapping

In single view reconstruction we use a specialization of the world-to-image mapping described above. Given a known plane in the world, we can map each point $\mathbf{X} = [ \begin{array}{ccc} x & y & w\end{array} ] $ on this plane to the corresponding point $\mathbf{x} = [ \begin{array}{ccc} x' & y' & w' \end{array} ] $ on the image. Note that the $(x,y)$ coordinates of the world point $\mathbf{X}$ are relative to the mapped plane and not to the world coordinate system. This mapping is called a planar homography and is carried out by a $3\times3$ matrix called the homography matrix, denoted $\mathrm{H}$:


\begin{displaymath}
\mathbf{x}^T = \mathrm{H}\mathbf{X}^T
\end{displaymath} (2)

This mapping is useful in our case because the world contains a plane with known attributes: the ground plane. Suppose the coordinate system is chosen so that the vertical direction is the $\mathrm{z}$ axis. In a three-dimensional scene the ground plane can then be defined as the plane whose points all have the same $\mathrm{z}$ coordinate$^4$.

The homography matrix can be computed from the relative position of the two planes and the camera center, together with the camera's internal parameters [5, p. 33]. It can also be computed from at least four image-to-world point correspondences [5, p. 48]. Since in most cases of single view reconstruction (SVR) the camera parameters are unknown, and image-to-world correspondences cannot always be acquired (e.g. when reconstructing a painting), these techniques are not easily applicable. Fortunately, the $\mathrm{H}$ matrix can be computed by another method.

Section 2.2 introduced the concepts of vanishing points and vanishing lines. These geometric cues convey a lot of information about the direction of lines and the orientation of planes. Given the vanishing line of the ground plane of a scene, as well as the vertical vanishing point, we can obtain an up-to-scale version of the homography matrix [8]:


\begin{displaymath}
\mathrm{H} = [\begin{array}{ccc} \mathrm{a}\mathbf{v}_x & \mathrm{b}\mathbf{v}_y & \mathbf{l}\end{array}]
\end{displaymath} (3)

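where $\mathbf{v}_x$ and $\mathbf{v}_y$ are the vanishing points of the two axis directions on the ground plane, $\mathbf{l}$ is the normalized vanishing line ( $\mathbf{l} = \frac{\mathbf{Vl}}{\Vert\mathbf{Vl}\Vert} $),
and $\mathrm{a}$, $\mathrm{b}$ are scale factors that can be computed from known lengths on the image.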
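The construction of equation (3) can be sketched in a few lines of Python. The vanishing points below are made-up pixel coordinates, the vanishing line is assumed to be the line joining the two ground-plane vanishing points (computed as their cross product), and the scale factors default to 1 as placeholders because no reference lengths are given here. The function names are illustrative.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors; for homogeneous points it gives
    the line through them."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def normalize(v):
    """Scale a 3-vector to unit Euclidean norm."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def ground_homography(v_x, v_y, vl, a=1.0, b=1.0):
    """Assemble H = [a*v_x  b*v_y  l] with l = vl / ||vl|| (equation 3).
    The result is a list of rows."""
    l = normalize(vl)
    columns = [[a * c for c in v_x], [b * c for c in v_y], l]
    return [[columns[c][r] for c in range(3)] for r in range(3)]


# Hypothetical vanishing points in homogeneous pixel coordinates.
v_x = [400.0, 100.0, 1.0]
v_y = [-300.0, 100.0, 1.0]

# Assumption: the ground-plane vanishing line joins v_x and v_y.
vl = cross(v_x, v_y)

H = ground_homography(v_x, v_y, vl)
third_column = [H[r][2] for r in range(3)]
```

With these values the vanishing line is the horizontal image line $y = 100$, on which both vanishing points lie.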

Once the $\mathrm{H}$ matrix is known, we can map any ground point on the image back to the real world by computing $\mathrm{H}^{-1}$ and applying equation (4):

\begin{displaymath}
\mathbf{X}^T = \mathrm{H}^{-1}\mathbf{x}^T
\end{displaymath} (4)


Projective geometry invariants

In the later sections we make use of another property of perspective images: invariance. The formulation of invariants is one of the most significant contributions of projective geometry [14, p. 485]. A wide variety of invariants is available for sets of points, lines, and curves.

In this work we make use of the cross-ratio invariant for lines. Given four points on a line, their cross-ratio is preserved under projective transformation, though the ratio of distances is not. The cross-ratio is defined by

\begin{displaymath}
\mathbf{C}(P_1,P_2,P_3,P_4) = \frac{\displaystyle (X^3 - X^1)(X^4 - X^2)}
{\displaystyle (X^3 - X^2)(X^4 - X^1)}
\end{displaymath} (5)

where $ \{ X^1,X^2,X^3,X^4 \} $ are the positions of the four points along the line, e.g. $ ( X^3 - X^1 ) $ is the distance between points $P_1$ and $P_3$.
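The invariance can be checked numerically. The sketch below evaluates equation (5) for four positions on a line before and after a one-dimensional projective map $t \mapsto (at+b)/(ct+d)$. The map coefficients and the point positions are arbitrary illustrative choices.

```python
def cross_ratio(x1, x2, x3, x4):
    """Cross-ratio of four collinear points given by their positions
    along the line (equation 5)."""
    return ((x3 - x1) * (x4 - x2)) / ((x3 - x2) * (x4 - x1))


def projective_1d(t, a=2.0, b=1.0, c=0.5, d=3.0):
    """Hypothetical 1D projective map t -> (a*t + b) / (c*t + d),
    with a*d - b*c != 0."""
    return (a * t + b) / (c * t + d)


positions = [0.0, 1.0, 3.0, 6.0]
mapped = [projective_1d(t) for t in positions]

cr_before = cross_ratio(*positions)
cr_after = cross_ratio(*mapped)

# A plain ratio of distances, for comparison.
ratio_before = (positions[1] - positions[0]) / (positions[2] - positions[1])
ratio_after = (mapped[1] - mapped[0]) / (mapped[2] - mapped[1])
```

Here the cross-ratio is the same for both point sets up to rounding error, while the plain distance ratio changes from 0.5 to about 0.75.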


Planar measurements

On an image with sufficient perspective information, we can compute the three dominant vanishing points. From these we immediately obtain an up-to-scale version of the homography matrix (eq. 3). Together, these are enough to make various measurements on the image.

Height

If we know a point $\mathbf{x}$ on the ground and a point $\mathbf{x'}$ on the line through $\mathbf{x}$ and $\mathbf{v_z}$, we can measure the scene distance $\mathrm{Z}$ between $\mathbf{x}$ and $\mathbf{x'}$ using the cross-ratio invariance described in section 3.3:
\begin{displaymath}
\frac{\mathrm{Z}}{\mathrm{Z_c}} = 1 - \frac{d(\mathbf{x'},\mathbf{c})d(\mathbf{x},\mathbf{v_z})}
{d(\mathbf{x},\mathbf{c})d(\mathbf{x'},\mathbf{v_z})}
\end{displaymath} (6)

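where $\mathbf{c}$ is the intersection of the line $\mathbf{xv_z}$ with the vanishing line $\mathbf{vl}$, and $d(\cdot,\cdot)$ is the image distance between two points. $\mathrm{Z_c}$ is the distance of the camera from the ground plane; it can be computed if we know a reference length.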
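A small numerical sketch of equation (6) follows. The four image points are synthetic and chosen to be collinear, and the camera distance `Z_c` is a hypothetical value standing in for one recovered from a reference length. The function names are illustrative.

```python
import math


def dist(p, q):
    """Euclidean distance between two image points given as (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def height_ratio(x, x_top, c, v_z):
    """Return Z / Z_c for four collinear image points:
    1 - d(x', c) d(x, v_z) / (d(x, c) d(x', v_z))."""
    return 1.0 - (dist(x_top, c) * dist(x, v_z)) / (dist(x, c) * dist(x_top, v_z))


# Synthetic collinear image points (pixel coordinates, y pointing down).
x = (100.0, 400.0)       # base point on the ground
x_top = (100.0, 200.0)   # top point x'
c = (100.0, -200.0)      # intersection with the vanishing line
v_z = (100.0, 1000.0)    # vertical vanishing point

ratio = height_ratio(x, x_top, c, v_z)

Z_c = 1.6                # hypothetical camera distance from the ground plane
Z = ratio * Z_c          # height of x' above the ground, same units as Z_c
```

For these points the ratio is 0.5, so the measured height is half the assumed camera distance from the ground plane.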

Length

The distance between two objects in the scene can be defined as the distance between two points on the ground plane. To calculate the distance between $\mathbf{p_1}$ and $\mathbf{p_2}$, we must first map them back to their real-world coordinates on the ground:


$\displaystyle \mathbf{P_1} = \mathbf{H^{-1}}\mathbf{p_1}$      
$\displaystyle \mathbf{P_2} = \mathbf{H^{-1}}\mathbf{p_2}$     (7)

The two points $\mathbf{P_1}$ and $\mathbf{P_2}$ correspond to points in the real scene and, after division by the homogeneous coordinate, are of the form $[\begin{array}{lll} X & Y & 0 \end{array}]$.

The length of the segment $\mathbf{P_1P_2}$ is


\begin{displaymath}
d(\mathbf{P_1},\mathbf{P_2}) = \sqrt{(X_2-X_1)^2 + (Y_2-Y_1)^2 + (0-0)^2}
\end{displaymath} (8)
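The steps of equations (7) and (8) can be sketched as follows. The homography values are hypothetical, and the two image points are generated by projecting the known ground points $(0,0)$ and $(3,4)$, so that the recovered length can be checked against the expected value of 5. The helper names are illustrative.

```python
import math


def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]


def inv3(M):
    """Invert a 3x3 matrix via the adjugate; raises on a singular matrix."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


def to_ground(H_inv, p):
    """Map a homogeneous image point to ground coordinates (X, Y),
    dividing by the homogeneous coordinate."""
    X, Y, W = mat_vec(H_inv, p)
    return (X / W, Y / W)


# Hypothetical ground-plane homography (illustrative values only).
H = [
    [1.0, 0.2, 100.0],
    [0.0, 0.5, 300.0],
    [0.0, 0.001, 1.0],
]
H_inv = inv3(H)

# Synthetic image points: projections of ground points (0, 0) and (3, 4).
p1 = mat_vec(H, [0.0, 0.0, 1.0])
p2 = mat_vec(H, [3.0, 4.0, 1.0])

P1 = to_ground(H_inv, p1)
P2 = to_ground(H_inv, p2)
length = math.hypot(P2[0] - P1[0], P2[1] - P1[1])
```

Because the homography is only known up to scale in practice, a length computed this way is metric only once the scale factors have been fixed from known lengths.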

Homography matrix of a plane parallel to the ground

To compare points that lie on a plane parallel to the ground plane, we must first calculate the correct homography matrix. If we know the distance $\mathbf{Z_p}$ between the two planes, the matrix $\mathrm{H'}$ is given by eq. 9:


\begin{displaymath}
\mathrm{H'} = [\begin{array}{ccc} \mathrm{a}\mathbf{v}_x & \mathrm{b}\mathbf{v}_y & \mathbf{Z_pv_z}+\mathbf{l}\end{array}]
\end{displaymath} (9)
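As a sketch of equation (9), the function below assembles the homography of a plane at distance `Z_p` from the ground. All vectors are hypothetical values: `l` is taken as an already normalized vanishing line, and the scale factors default to 1 as placeholders. Setting `Z_p` to 0 reproduces the ground-plane matrix of equation (3).

```python
def plane_homography(v_x, v_y, v_z, l, Z_p, a=1.0, b=1.0):
    """Assemble H' = [a*v_x  b*v_y  Z_p*v_z + l] (equation 9) as a list of rows."""
    col1 = [a * c for c in v_x]
    col2 = [b * c for c in v_y]
    col3 = [Z_p * vz + li for vz, li in zip(v_z, l)]
    return [[col1[r], col2[r], col3[r]] for r in range(3)]


# Hypothetical homogeneous vanishing points and unit-norm vanishing line.
v_x = [6.0, 4.0, 3.0]
v_y = [-9.0, 4.0, 3.0]
v_z = [0.0, -5.0, 1.0]
l = [0.0, -0.6, 0.8]

H_ground = plane_homography(v_x, v_y, v_z, l, Z_p=0.0)
H_parallel = plane_homography(v_x, v_y, v_z, l, Z_p=2.0)

third_ground = [row[2] for row in H_ground]
third_parallel = [row[2] for row in H_parallel]
```

Only the third column depends on `Z_p`; the first two columns are shared by all planes parallel to the ground.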



Footnotes

3. The points are in homogeneous coordinates.
4. The value of z on the ground plane is usually considered to be $0$.