The pinhole camera as a 3×4 matrix. Calibration matrix K, decomposition P = K[R|t], finite cameras, projective cameras, and cameras at infinity.
So far we have studied the geometry of 2D and 3D projective spaces in isolation. Now we connect them: a camera is a map from 3D to 2D that takes a 3D point X and produces a 2D image point x. In homogeneous coordinates, this map is a 3×4 matrix P:
$$\mathbf{x} = P\,\mathbf{X}$$
This single equation is the heart of the book. Everything else — epipolar geometry, reconstruction, calibration — follows from it. But where does P come from? What are its parts? What does each component mean physically?
[Interactive figure: a 3D cube projected through the camera centre onto the image plane, with an adjustable focal length.]
The most basic camera model: every ray of light passes through a single point (the camera centre C) and strikes a flat image plane. A 3D point (X, Y, Z) maps to image coordinates (fX/Z, fY/Z).
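In homogeneous coordinates, this map is linear. With the camera centre at the origin and the image plane at Z = f, the projection is:
$$\begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$
Dividing by the third coordinate Z recovers (fX/Z, fY/Z).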
The simplest camera matrix is P = diag(f, f, 1)[I | 0], which places the camera at the origin looking along the Z-axis. Real cameras need rotation, translation, and internal parameter adjustments.
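
As a quick check of the basic pinhole model, the following NumPy sketch builds P = diag(f, f, 1)[I | 0] and projects one point. The helper names `pinhole_matrix` and `project` and the numeric values are illustrative assumptions, not part of the text.

```python
import numpy as np

def pinhole_matrix(f):
    """Basic pinhole camera P = diag(f, f, 1) [I | 0]."""
    I0 = np.hstack([np.eye(3), np.zeros((3, 1))])  # [I | 0], shape 3x4
    return np.diag([f, f, 1.0]) @ I0

def project(P, X):
    """Project an inhomogeneous 3D point X; return inhomogeneous image coordinates."""
    x = P @ np.append(X, 1.0)  # homogeneous image point (fX, fY, Z)
    return x[:2] / x[2]        # divide by the third coordinate

P = pinhole_matrix(f=2.0)
X = np.array([1.0, 2.0, 4.0])
x = project(P, X)
print(x)  # (f*X/Z, f*Y/Z) = (0.5, 1.0)
```

Scaling X by any nonzero factor moves it along the same ray through the camera centre, so the image point does not change.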
The calibration matrix K is an upper-triangular 3×3 matrix encoding the camera's internal parameters (also called intrinsics):
$$K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$

| Parameter | Symbol | Meaning |
|---|---|---|
| Focal length (x) | αx = f · mx | Focal length in pixels along x (mx = pixels per unit length in x) |
| Focal length (y) | αy = f · my | Focal length in pixels along y (my = pixels per unit length in y) |
| Skew | s | Non-orthogonality of pixel axes (usually 0) |
| Principal point | (x0, y0) | Where the principal axis meets the image plane, in pixels |
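
To make the table concrete, here is a small sketch that assembles K from its parameters and maps a point given in the camera frame to pixel coordinates. The function name `calibration_matrix` and all numbers (a 50 mm lens, 20000 pixels per metre, a 640×480 image) are made-up illustration values.

```python
import numpy as np

def calibration_matrix(f, mx, my, x0, y0, s=0.0):
    """Upper-triangular K with alpha_x = f*mx and alpha_y = f*my."""
    alpha_x, alpha_y = f * mx, f * my
    return np.array([[alpha_x, s,       x0],
                     [0.0,     alpha_y, y0],
                     [0.0,     0.0,     1.0]])

# Hypothetical values: f in metres, mx/my in pixels per metre, (x0, y0) in pixels.
K = calibration_matrix(f=0.05, mx=20000.0, my=20000.0, x0=320.0, y0=240.0)

X_cam = np.array([0.1, -0.05, 2.0])  # point in the camera frame (metres)
x = K @ X_cam                        # homogeneous pixel coordinates
pixel = x[:2] / x[2]
print(pixel)  # (alpha_x*X/Z + x0, alpha_y*Y/Z + y0) = (370, 215)
```
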
The full camera matrix for a finite projective camera decomposes as:
$$P = K\,[R \mid \mathbf{t}]$$
where K is the calibration matrix (internal), R is a 3×3 rotation matrix (camera orientation), and t is a 3-vector (camera translation). The camera centre C in world coordinates satisfies PC = 0 (with C written in homogeneous coordinates), giving C = −Rᵀt, or equivalently t = −RC.
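
The composition can be sketched in a few lines of NumPy: pick a centre C and a rotation R, form t = −RC, and check that the homogeneous centre is mapped to zero. The helper names and the specific K, R, and C are assumptions chosen for illustration.

```python
import numpy as np

def rotation_about_y(theta):
    """Rotation by theta (radians) about the Y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,   0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s,  0.0, c]])

def camera_matrix(K, R, C):
    """Return P = K [R | t] and t = -R C for a camera centred at C."""
    t = -R @ C
    return K @ np.hstack([R, t.reshape(3, 1)]), t

K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rotation_about_y(np.deg2rad(30.0))
C = np.array([1.0, 0.5, -4.0])

P, t = camera_matrix(K, R, C)
C_recovered = -R.T @ t             # C = -R^T t
residual = P @ np.append(C, 1.0)   # P C should vanish
print(C_recovered, residual)
```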
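Given a camera matrix P, we can recover K and R by RQ decomposition of the left 3×3 submatrix M = KR, choosing signs so that the diagonal of K is positive. The camera centre C is the right null vector of P (PC = 0); for a finite camera this is C = −M⁻¹p4, where p4 is the last column of P.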
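
A minimal sketch of this decomposition follows. NumPy has no RQ routine, so the helper `rq3` (a name invented here) derives one from `np.linalg.qr` by reversing rows; SciPy users could call `scipy.linalg.rq` instead. The sign and scale normalisation is one reasonable convention, and the code assumes a finite camera (M invertible).

```python
import numpy as np

def rq3(M):
    """RQ decomposition M = K R of a 3x3 matrix, built from NumPy's QR."""
    J = np.flipud(np.eye(3))           # exchange matrix, J @ J = I
    Q_, R_ = np.linalg.qr((J @ M).T)   # (J M)^T = Q_ R_
    return J @ R_.T @ J, J @ Q_.T      # upper-triangular factor, orthogonal factor

def decompose_camera(P):
    """Recover K, R and the centre C from a finite camera P (any nonzero scale)."""
    K, R = rq3(P[:, :3])
    D = np.diag(np.sign(np.diag(K)))   # flip signs so that diag(K) > 0
    K, R = K @ D, D @ R
    if np.linalg.det(R) < 0:           # P is only defined up to sign
        R = -R
    K = K / K[2, 2]                    # fix the scale so that K[2, 2] = 1
    _, _, Vt = np.linalg.svd(P)
    C_h = Vt[-1]                       # right null vector of P
    return K, R, C_h[:3] / C_h[3]

# Hypothetical ground truth used to exercise the function.
K_true = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
C_true = np.array([1.0, 0.5, -4.0])
P = 2.5 * K_true @ np.hstack([R_true, (-R_true @ C_true).reshape(3, 1)])

K, R, C = decompose_camera(P)
print(np.round(K, 6), np.round(R, 6), np.round(C, 6), sep="\n")
```

Because P is homogeneous, the arbitrary factor 2.5 (or a negated P) should not affect the recovered K, R, or C.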
Several important geometric entities are directly readable from P:
| Entity | Computation | Geometric Meaning |
|---|---|---|
| Camera centre C | PC = 0 (null space) | Point in 3D where all rays converge |
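| Principal axis | m3, the third row of M (use det(M) · m3 so that it points forward) | Direction the camera is pointing |
| Principal point | x0 = P p∞ = M m3, with p∞ = (m3ᵀ, 0)ᵀ | Where the principal axis meets the image plane |
| Image of a point at infinity | x = M d for X = (dᵀ, 0)ᵀ | Vanishing point of the direction d |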
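
The entries of the table can be read off numerically. The sketch below uses the forward-pointing form det(M) · m3 for the principal axis and M m3 for the principal point; the function names and the example camera are illustrative assumptions.

```python
import numpy as np

def principal_axis(P):
    """Unit vector along det(M) * m3, pointing towards the front of the camera."""
    M = P[:, :3]
    v = np.linalg.det(M) * M[2]
    return v / np.linalg.norm(v)

def principal_point(P):
    """Principal point x0 = M m3, returned in inhomogeneous coordinates."""
    M = P[:, :3]
    x = M @ M[2]
    return x[:2] / x[2]

def vanishing_point(P, d):
    """Image of the point at infinity (d, 0); undefined if d is parallel to the image plane."""
    x = P[:, :3] @ d
    return x[:2] / x[2]

# Hypothetical camera P = K [R | -R C].
K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
C = np.array([1.0, 0.5, -4.0])
P = K @ np.hstack([R, (-R @ C).reshape(3, 1)])

axis = principal_axis(P)    # equals the third row of R for this P
pp = principal_point(P)     # equals (x0, y0) from K
vp = vanishing_point(P, np.array([0.0, 0.0, 1.0]))  # vanishing point of the world Z direction
print(axis, pp, vp)
```
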
A general projective camera is any 3×4 matrix P of rank 3. Not every such matrix corresponds to a physically realizable camera. But mathematically, it is the most general model: any rank-3 matrix P defines a valid projection from ℙ³ to ℙ².
For any invertible 4×4 matrix H, the camera P viewing points X and the camera PH⁻¹ viewing the transformed points HX produce exactly the same images, since (PH⁻¹)(HX) = PX. This is the projective ambiguity: from images alone, we can only recover cameras and structure up to a common projective transformation.
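
The ambiguity is easy to verify numerically through the identity PX = (PH⁻¹)(HX). In this sketch the matrix H is an arbitrary invertible example and `dehomogenize` is a helper name chosen here.

```python
import numpy as np

def dehomogenize(x):
    """Divide a homogeneous vector by its last coordinate."""
    return x[:-1] / x[-1]

K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

# An arbitrary invertible projective transformation (triangular, nonzero diagonal).
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.5, 0.0, 1.0, 0.0],
              [0.1, 0.2, 0.3, 1.0]])
X = np.array([1.0, 2.0, 5.0, 1.0])

P_new = P @ np.linalg.inv(H)   # transformed camera
X_new = H @ X                  # transformed scene point

x_old = dehomogenize(P @ X)
x_new = dehomogenize(P_new @ X_new)
print(x_old, x_new)  # the two image points agree
```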
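When we compute x = PX, the third component of x (before de-homogenization) is related to the depth of X from the camera. Specifically, for P = [M | p4] and X = (X, Y, Z, T)ᵀ:
$$\operatorname{depth}(\mathbf{X}; P) = \frac{\operatorname{sign}(\det M)\, w}{T\, \lVert \mathbf{m}_3 \rVert}$$
where w is the third component of PX, m3 is the third row of M, and T is the last component of X. Positive depth means the point is in front of the camera. The formula does not depend on the homogeneous scale of P or of X.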
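
A short sketch of the depth computation, using the standard form depth = sign(det M) · w / (T ‖m3‖). The function name `depth` and the example camera and points are illustrative assumptions.

```python
import numpy as np

def depth(P, X_h):
    """Signed depth of the homogeneous point X_h in front of the camera P = [M | p4]."""
    M = P[:, :3]
    w = (P @ X_h)[2]   # third component of P X
    T = X_h[3]         # last component of X
    return np.sign(np.linalg.det(M)) * w / (T * np.linalg.norm(M[2]))

# Hypothetical camera P = K [R | -R C].
K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
C = np.array([1.0, 0.5, -4.0])
P = K @ np.hstack([R, (-R @ C).reshape(3, 1)])

X = np.array([0.5, -1.0, 3.0])
Z_cam = (R @ (X - C))[2]                  # depth measured directly in the camera frame
d_front = depth(P, np.append(X, 1.0))
d_behind = depth(P, np.append(C - 2.0 * R[2], 1.0))  # 2 units behind the centre
print(Z_cam, d_front, d_behind)
```

Rescaling P (including by a negative factor) or rescaling the homogeneous point leaves the result unchanged, which is why the sign(det M), T, and ‖m3‖ terms appear.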
When the camera is infinitely far from the scene (or equivalently, the scene depth variation is negligible compared to the distance), projection becomes parallel rather than perspective. The camera centre is a point at infinity.
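A camera at infinity is one whose left 3×3 block M is singular, so that the camera centre (the null vector of P) has last coordinate 0. The most important case is the affine camera, whose last row is (0, 0, 0, 1): the third row of M is then zero, M has rank 2, and there is no perspective foreshortening.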
Cameras at infinity form their own hierarchy, paralleling the 3D transformation hierarchy:
| Camera Type | Form | Properties |
|---|---|---|
| General affine | P = [M t; 0 0 0 1], with M a 2×3 block of rank 2 | Parallel projection, general direction |
| Weak perspective | P with M = sR, where R is the first two rows of a rotation | Scaled orthographic projection |
| Orthographic | P with s = 1 | True parallel projection, no scaling |
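
The sketch below builds the last two rows of the table and checks the defining property of parallel projection: moving a point along the viewing direction does not change its image. The constructor `affine_camera` and the numeric values are assumptions for illustration, and the translation t is left unscaled.

```python
import numpy as np

def affine_camera(R, t, s=1.0):
    """Affine camera with M = s * (first two rows of R); s = 1 gives orthographic."""
    top = np.hstack([s * R[:2], np.reshape(t, (2, 1))])
    return np.vstack([top, [0.0, 0.0, 0.0, 1.0]])

c, sn = np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))
R = np.array([[c, 0.0, sn], [0.0, 1.0, 0.0], [-sn, 0.0, c]])
t = np.array([0.2, -0.1])

P_orth = affine_camera(R, t)          # orthographic
P_weak = affine_camera(R, t, s=0.5)   # scaled orthographic

X = np.array([1.0, 2.0, 3.0, 1.0])
X_far = X + 10.0 * np.append(R[2], 0.0)   # same point pushed along the viewing direction

x_near = P_orth @ X
x_far = P_orth @ X_far                    # identical image: no foreshortening
centre = P_weak @ np.append(R[2], 0.0)    # the point at infinity (r3, 0) maps to zero
print(x_near, x_far, centre)
```

The third homogeneous coordinate of the image is always 1 here, so no division by depth takes place, and the null vector (r3, 0) confirms that the camera centre lies at infinity.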
The camera matrix P is the central object of the book. Everything that follows is about estimating P, exploiting relationships between multiple P's, and recovering 3D structure from them.
| Later chapter | Uses P for |
|---|---|
| Ch 7: Camera Computation | Estimating P from 3D↔2D correspondences |
| Ch 8: Single View Geometry | What P tells us about a single image |
| Ch 9: Epipolar Geometry | Relationship between two P's → fundamental matrix F |
| Ch 19: Auto-Calibration | Recovering K from multiple P's |