Hartley & Zisserman, Chapter 6

Camera Models

The pinhole camera as a 3×4 matrix. Calibration matrix K, decomposition P = K[R|t], finite cameras, projective cameras, and cameras at infinity.

Prerequisites: Chapter 3 (Projective 3D) + Chapter 4 (Estimation).

Chapter 0: Why Camera Models?

So far we have studied the geometry of 2D and 3D projective spaces in isolation. Now we connect them: a camera is the map from 3D to 2D. It takes a 3D point and produces a 2D image point. In homogeneous coordinates, this is a 3×4 matrix P:

x = P X   (3-vector = 3×4 matrix · 4-vector)

This single equation is the heart of the book. Everything else — epipolar geometry, reconstruction, calibration — follows from it. But where does P come from? What are its parts? What does each component mean physically?

The camera matrix P encodes everything: where the camera is (via the translation t), which way it is pointing (rotation R), and its internal optics (focal length, principal point and pixel shape, all packed into K). One matrix, 11 degrees of freedom.
Figure (Camera Projection): a 3D cube projected through the camera centre onto the image plane.

Chapter 1: The Pinhole Camera

The most basic camera model: every ray of light passes through a single point (the camera centre C) and strikes a flat image plane. A 3D point (X, Y, Z) maps to image coordinates (fX/Z, fY/Z).

In homogeneous coordinates this map is linear. With the camera centre at the origin and the image plane at Z = f, the projection is:

[fX, fY, Z]ᵀ = [f 0 0 0 ; 0 f 0 0 ; 0 0 1 0] · [X, Y, Z, 1]ᵀ
Key simplification: In homogeneous coordinates, the division by Z is implicit. The projection x = PX is a matrix multiplication — purely linear. All the nonlinearity is hidden in the homogeneous representation.

The simplest camera matrix is P = diag(f, f, 1)[I | 0], which places the camera at the origin looking along the Z-axis. Real cameras need rotation, translation, and internal parameter adjustments.
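The basic pinhole camera can be checked numerically. This is a minimal sketch in plain Python (lists of lists instead of NumPy, an assumption made only to keep the example dependency-free); the helper names `mat_vec`, `pinhole_matrix` and `dehomogenize` are illustrative, not from any library. It shows that the linear product gives (fX, fY, Z) and that the division by Z happens only at dehomogenization.

```python
# Minimal sketch of the basic pinhole camera P = diag(f, f, 1)[I | 0].

def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def pinhole_matrix(f):
    """3x4 camera matrix for a camera at the origin looking along +Z."""
    return [[f, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]

def dehomogenize(x):
    """Map a homogeneous image point (x1, x2, x3) to (x1/x3, x2/x3)."""
    return (x[0] / x[2], x[1] / x[2])

f = 2.0
X = [3.0, 1.0, 4.0, 1.0]             # scene point (X, Y, Z) in homogeneous form
x_h = mat_vec(pinhole_matrix(f), X)  # linear step: gives (fX, fY, Z)
print(x_h)                           # [6.0, 2.0, 4.0]
print(dehomogenize(x_h))             # (1.5, 0.5), i.e. (fX/Z, fY/Z)
```

Scaling X by any non-zero factor changes x_h by the same factor but leaves the dehomogenized point unchanged.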

Chapter 2: The Calibration Matrix K

The calibration matrix K is an upper-triangular 3×3 matrix encoding the camera's internal parameters (also called intrinsics):

K = [αx   s   x0 ; 0   αy   y0 ; 0   0   1]
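Parameter | Symbol | Meaning
Focal length (x) | αx = f · mx | Focal length in pixels along x (mx = pixels per unit length in x)
Focal length (y) | αy = f · my | Focal length in pixels along y (my = pixels per unit length in y)
Skew | s | Non-orthogonality of the pixel axes (usually 0)
Principal point | (x0, y0) | Where the optical axis meets the image plane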
5 intrinsic parameters: K has 5 DOF (it's upper-triangular 3×3 with 6 entries, but the bottom-right is fixed to 1). For modern cameras, s ≈ 0 and αx ≈ αy (square pixels), so often only 3 parameters matter: f, x0, y0.
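To see what each entry of K does, the sketch below assembles K from hypothetical values (αx = αy = 800, principal point (320, 240), zero skew unless stated) and maps points given in camera coordinates to pixels. The function names are made up for this example.

```python
def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def calibration_matrix(ax, ay, x0, y0, s=0.0):
    """Upper-triangular K = [ax s x0; 0 ay y0; 0 0 1] (5 free parameters)."""
    return [[ax, s, x0],
            [0.0, ay, y0],
            [0.0, 0.0, 1.0]]

def to_pixels(K, Xc):
    """Project a camera-frame point Xc = (X, Y, Z) to pixel coordinates via K."""
    u, v, w = mat_vec(K, Xc)
    return (u / w, v / w)

# Hypothetical intrinsics, chosen only for illustration.
K = calibration_matrix(ax=800.0, ay=800.0, x0=320.0, y0=240.0)
print(to_pixels(K, [0.0, 0.0, 5.0]))  # (320.0, 240.0): on the optical axis
print(to_pixels(K, [1.0, 0.5, 5.0]))  # (480.0, 320.0)
```

A point on the optical axis always lands on the principal point, whatever its depth; a non-zero skew s shifts u in proportion to Y/Z.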

Chapter 3: Decomposition P = K[R|t]

The full camera matrix for a finite projective camera decomposes as:

P = K [R | t]   (3×4 = 3×3 · 3×4)

where K is the calibration matrix (internal), R is a 3×3 rotation matrix (camera orientation), and t is a 3-vector (the translation, t = −RC̃). The camera centre C = (C̃ᵀ, 1)ᵀ satisfies PC = 0, which gives C̃ = −Rᵀt in world coordinates.

Extrinsic parameters (6 DOF): R has 3 DOF (rotation angles) and t has 3 DOF (position). Combined with 5 intrinsic DOF from K, the total is 11 DOF for P. This matches: a 3×4 matrix has 12 entries minus 1 for scale = 11.
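The composition P = K[R|t] and the centre formula C̃ = −Rᵀt can be checked by verifying that PC vanishes. A sketch under assumptions: K, the rotation (0.3 rad about the Y-axis) and t are hypothetical values, and the helper names are illustrative.

```python
import math

def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rotation_y(theta):
    """Rotation by angle theta (radians) about the Y-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def camera_matrix(K, R, t):
    """P = K [R | t] as a 3x4 list of rows."""
    Rt = [R[i] + [t[i]] for i in range(3)]          # 3x4 block [R | t]
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]

def camera_centre(R, t):
    """Inhomogeneous centre C~ = -R^T t in world coordinates."""
    return [-c for c in mat_vec(transpose(R), t)]

K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = rotation_y(0.3)
t = [0.5, -0.2, 4.0]
P = camera_matrix(K, R, t)
C = camera_centre(R, t)
residual = mat_vec(P, C + [1.0])   # P C with C = (C~, 1); should be ~0
print(residual)
dof = 5 + 3 + 3                    # K + rotation + translation
print(dof, 3 * 4 - 1)              # 11 11
```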

Given a camera matrix P = [M | p4], RQ decomposition of the left 3×3 submatrix M = KR recovers K (upper-triangular) and R (orthogonal), after which t = K⁻¹p4. The camera centre spans the one-dimensional right null space of P.
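A dependency-free way to carry out the RQ step is Gram-Schmidt on the rows of M, working from the bottom row up. This works because K is upper-triangular: m3 = k33 r3, m2 = k22 r2 + k23 r3, and m1 = k11 r1 + k12 r2 + k13 r3. The sketch assumes a finite camera (M non-singular), negates P when det M < 0 so that R is a proper rotation (P and −P are the same camera), picks the positive-diagonal K, and finally scales K so its bottom-right entry is 1. Gram-Schmidt is used for transparency only; a practical implementation would more likely call a library RQ routine such as SciPy's `scipy.linalg.rq`. The names `decompose_camera` and `compose` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def compose(K, R, t, scale=1.0):
    """Hypothetical helper: scale * K [R | t], used only to build a test P."""
    Rt = [R[i] + [t[i]] for i in range(3)]
    return [[scale * sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]

def decompose_camera(P):
    """Split a finite camera P into K (with K[2][2] = 1), R (det +1) and t."""
    if det3([row[:3] for row in P]) < 0:
        P = [[-v for v in row] for row in P]      # -P is the same camera
    m1, m2, m3 = [row[:3] for row in P]
    p4 = [row[3] for row in P]
    # Gram-Schmidt from the bottom row up: M = K_raw R, K_raw upper-triangular.
    k33 = math.sqrt(dot(m3, m3))
    r3 = [v / k33 for v in m3]
    k23 = dot(m2, r3)
    u2 = [a - k23 * b for a, b in zip(m2, r3)]
    k22 = math.sqrt(dot(u2, u2))
    r2 = [v / k22 for v in u2]
    k12, k13 = dot(m1, r2), dot(m1, r3)
    u1 = [a - k12 * b - k13 * c for a, b, c in zip(m1, r2, r3)]
    k11 = math.sqrt(dot(u1, u1))
    r1 = [v / k11 for v in u1]
    # t solves K_raw t = p4 (back-substitution on an upper-triangular system).
    t3 = p4[2] / k33
    t2 = (p4[1] - k23 * t3) / k22
    t1 = (p4[0] - k12 * t2 - k13 * t3) / k11
    # Remove the arbitrary overall scale so that K[2][2] = 1.
    K = [[k11 / k33, k12 / k33, k13 / k33],
         [0.0, k22 / k33, k23 / k33],
         [0.0, 0.0, 1.0]]
    return K, [r1, r2, r3], [t1, t2, t3]

# Hypothetical ground truth, scaled by an arbitrary negative factor.
c, s = math.cos(0.3), math.sin(0.3)
K_true = [[800.0, 2.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]]
R_true = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t_true = [0.5, -0.2, 4.0]
K, R, t = decompose_camera(compose(K_true, R_true, t_true, scale=-2.5))
print(K)   # matches K_true up to rounding
print(t)   # matches t_true up to rounding
```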

Chapter 4: Camera Anatomy

Several important geometric entities are directly readable from P:

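Entity | Computation | Geometric meaning
Camera centre C | PC = 0 (right null space of P); for a finite camera C̃ = −M⁻¹p4 | Point in 3D where all rays converge
Principal axis | v = det(M) · m3, where m3 is the third row of M | Direction the camera is pointing (towards the front)
Principal point | x0 = M m3 | Where the optical axis hits the image
Image of a point at infinity | x = M d | How a direction d, i.e. the point (dᵀ, 0)ᵀ, projects to the image (its vanishing point)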
The action of P on points: for a finite point X = (X̃ᵀ, 1)ᵀ, the image is x = M X̃ + p4, where M is the left 3×3 block of P and p4 is its last column. The depth of X relative to the camera is encoded in the third coordinate of x before dehomogenization.
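The entries of this table can be computed directly from P. The sketch below, in plain Python with illustrative helper names, builds a hypothetical camera P = K[R|t] and recovers its centre, unit principal axis and principal point. For this construction the centre should equal −Rᵀt, the axis should equal the third row of R, and the principal point should equal (x0, y0) from K. Cramer's rule stands in for a proper linear solver, an assumption that is fine only for a small, well-conditioned example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(A, b):
    """Solve A x = b for a 3x3 A by Cramer's rule."""
    D = det3(A)
    return [det3([[b[i] if k == j else A[i][k] for k in range(3)] for i in range(3)]) / D
            for j in range(3)]

def anatomy(P):
    """Centre, unit principal axis and principal point of a finite camera P."""
    M = [row[:3] for row in P]
    p4 = [row[3] for row in P]
    m3 = M[2]                                 # third row of M
    centre = [-c for c in solve3(M, p4)]      # C~ = -M^{-1} p4
    v = [det3(M) * a for a in m3]             # v = det(M) m3 points forwards
    norm = math.sqrt(dot(v, v))
    axis = [a / norm for a in v]
    x0 = [dot(row, m3) for row in M]          # principal point x0 = M m3
    return centre, axis, (x0[0] / x0[2], x0[1] / x0[2])

def vanishing_point(P, d):
    """Image of the point at infinity (d, 0): x = M d."""
    x = [dot(row[:3], d) for row in P]
    return (x[0] / x[2], x[1] / x[2])

# Hypothetical test camera P = K [R | t].
c, s = math.cos(0.3), math.sin(0.3)
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t = [0.5, -0.2, 4.0]
P = [[sum(K[i][k] * (R[k] + [t[k]])[j] for k in range(3)) for j in range(4)]
     for i in range(3)]

centre, axis, pp = anatomy(P)
print(centre)                     # equals -R^T t
print(axis)                       # equals the third row of R
print(pp)                         # (x0, y0) = (320, 240) up to rounding
print(vanishing_point(P, axis))   # the principal axis vanishes at the principal point
```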

Chapter 5: Projective Cameras

A general projective camera is any 3×4 matrix P of rank 3. Not every such matrix corresponds to a physically realizable camera. But mathematically, it is the most general model: any rank-3 matrix P defines a valid projection from ℙ³ to ℙ².

For any invertible 4×4 matrix H, the camera P imaging the point X and the camera PH⁻¹ imaging the point HX produce exactly the same image point, because (PH⁻¹)(HX) = PX. This is the projective ambiguity: from images alone, we can only recover cameras and structure up to a common projective transformation.

Projective ambiguity: If (P, P′, {Xᵢ}) is a valid camera-structure configuration, so is (PH⁻¹, P′H⁻¹, {HXᵢ}) for any invertible 4×4 H. This is the fundamental ambiguity of uncalibrated reconstruction.
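The ambiguity is easy to confirm numerically: for an invertible H, the transformed camera PH⁻¹ applied to the transformed point HX reproduces PX. The matrices below are hypothetical (H has determinant 6 and last row (0, 1, 0, 1), so it is a genuine projective transformation rather than an affine one), and `inverse` is a small Gauss-Jordan routine written for this sketch.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

H = [[2.0, 0.0, 0.0, 1.0],
     [0.0, 3.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]          # hypothetical invertible H (det = 6)
H_inv = inverse(H)
P1 = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, -1.0], [0.0, 0.0, 1.0, 5.0]]
P2 = [[0.9, 0.1, 0.0, 1.0], [0.0, 1.0, 0.2, 0.0], [0.1, 0.0, 1.0, 3.0]]
points = [[1.0, 2.0, 3.0, 1.0], [-1.0, 0.5, 2.0, 1.0], [0.0, 0.0, 4.0, 1.0]]

worst = 0.0
for P in (P1, P2):
    P_new = mat_mul(P, H_inv)                     # transformed camera P H^-1
    for X in points:
        x_old = mat_vec(P, X)                     # original image point
        x_new = mat_vec(P_new, mat_vec(H, X))     # image of HX in P H^-1
        worst = max(worst, max(abs(a - b) for a, b in zip(x_old, x_new)))
print(worst)   # rounding-level: the images are identical
```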

Chapter 6: Depth of Points

When we compute x = PX, the third component of x (before de-homogenization) is related to the depth of X from the camera. Specifically, for P = [M | p4]:

depth(X; P) = sign(det M) · w / (||m3|| · T)

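where X = (X, Y, Z, T)ᵀ, w is the third component of PX, m3 is the third row of M, and T is the last component of X. Positive depth means the point is in front of the camera. The formula gives the same value if P or X is multiplied by any non-zero scalar, including a negative one.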

Chirality: In reconstruction, we want all scene points to have positive depth in all cameras. This is the chirality constraint. When extracting cameras from the essential matrix, exactly one of four possible solutions places all points in front of both cameras.
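The depth formula translates almost line for line into code. A sketch under assumptions: the camera values are hypothetical, and the test points are built a known distance in front of or behind the camera along the third row of R (the principal axis for this camera). The last lines check that rescaling P or X, even by negative factors, does not change the result.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def depth(X, P):
    """depth(X; P) = sign(det M) * w / (T * ||m3||) for X = (X, Y, Z, T)."""
    M = [row[:3] for row in P]
    m3 = M[2]
    w = dot(P[2], X)                   # third component of P X
    T = X[3]
    sign = 1.0 if det3(M) > 0 else -1.0
    return sign * w / (T * math.sqrt(dot(m3, m3)))

# Hypothetical camera P = K [R | t] with centre C~ = -R^T t.
c, s = math.cos(0.3), math.sin(0.3)
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t = [0.5, -0.2, 4.0]
P = [[sum(K[i][k] * (R[k] + [t[k]])[j] for k in range(3)) for j in range(4)]
     for i in range(3)]
C = [-sum(R[k][j] * t[k] for k in range(3)) for j in range(3)]

front = [C[j] + 3.0 * R[2][j] + 0.5 * R[0][j] for j in range(3)] + [1.0]
behind = [C[j] - 2.0 * R[2][j] for j in range(3)] + [1.0]
print(depth(front, P))    # 3.0 up to rounding: in front of the camera
print(depth(behind, P))   # -2.0 up to rounding: behind the camera

# Rescaling P or X, even by negative factors, leaves the depth unchanged.
P_scaled = [[-4.0 * v for v in row] for row in P]
front_scaled = [-2.0 * v for v in front]
print(depth(front_scaled, P_scaled))   # still 3.0 up to rounding
```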

Chapter 7: Cameras at Infinity

When the camera is infinitely far from the scene (or, as a good approximation, when the depth variation within the scene is negligible compared with its distance from the camera), projection becomes parallel rather than perspective. The camera centre is a point at infinity.

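A camera at infinity is one whose left 3×3 block M is singular, so its centre (the null vector of P) lies on the plane at infinity. The most important class are the affine cameras, whose last row is (0, 0, 0, 1): P = [M2×3 t; 0 0 0 1], so M has rank 2. There is no perspective foreshortening.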

When is this useful? Satellite imagery, telephoto photography, microscopy: any situation where perspective effects are negligible. Affine cameras are easier to work with. For example, when every point is visible in every view, affine structure and motion can be recovered by factorization with a single SVD, without iterative nonlinear optimization.
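For an affine camera the centre can be written down explicitly: it is (dᵀ, 0)ᵀ, where d spans the null space of the 2×3 block, for example the cross product of its two rows. The sketch uses a hypothetical M2×3 and t, and also shows that sliding a point along d leaves its image unchanged and that the third homogeneous coordinate is always 1, so no perspective division is needed.

```python
def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def affine_camera(M23, t):
    """P = [M23 t; 0 0 0 1] for a 2x3 block M23 and a 2-vector t."""
    return [M23[0] + [t[0]], M23[1] + [t[1]], [0.0, 0.0, 0.0, 1.0]]

M23 = [[1.0, 0.2, 0.1], [0.0, 0.9, 0.3]]   # hypothetical rank-2 block
P = affine_camera(M23, [5.0, -2.0])
d = cross(M23[0], M23[1])                  # direction with M23 d = 0
C = d + [0.0]                              # centre (d, 0): a point at infinity
print(mat_vec(P, C))                       # zero vector up to rounding

X = [1.0, 2.0, 3.0, 1.0]
X_shifted = [X[j] + 7.0 * d[j] for j in range(3)] + [1.0]   # slide X along d
x1, x2 = mat_vec(P, X), mat_vec(P, X_shifted)
print(x1, x2)   # same image; third coordinate is 1, so no perspective division
```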

Chapter 8: Affine Camera Hierarchy

Cameras at infinity form their own hierarchy, paralleling the 3D transformation hierarchy:

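Camera type | Form | Properties
General affine | P = [M2×3 t; 0 0 0 1] | Parallel projection along a general direction
Weak perspective | M2×3 = diag(αx, αy) R2×3, with R2×3 the first two rows of a rotation | Orthographic projection, then scaling in x and y (scaled orthographic when αx = αy = s)
Orthographic | M2×3 = R2×3 (αx = αy = 1) | True parallel projection, no scaling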
Weak perspective: The most commonly used affine camera model. It first projects orthographically, then scales by f / Z̄, where Z̄ is the average depth of the scene. It is accurate when depth variation is small relative to average depth, and is used extensively in face modeling, object recognition, and structure-from-motion initialization.
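The accuracy claim can be made concrete by comparing full perspective (fX/Z, fY/Z) with the weak-perspective approximation (fX/Z̄, fY/Z̄) on two hypothetical cubes of the same size, one far from the camera and one close to it. For simplicity the sketch uses a single scale for both axes (the scaled orthographic case); that choice is an assumption, not a requirement of the model.

```python
def perspective(points, f):
    """Full perspective: (fX/Z, fY/Z) for each point."""
    return [(f * X / Z, f * Y / Z) for X, Y, Z in points]

def weak_perspective(points, f):
    """Orthographic projection, then uniform scaling by f / (average depth)."""
    z_avg = sum(Z for _, _, Z in points) / len(points)
    return [(f * X / z_avg, f * Y / z_avg) for X, Y, _ in points]

def max_error(a, b):
    """Largest coordinate difference between two lists of image points."""
    return max(max(abs(p[0] - q[0]), abs(p[1] - q[1])) for p, q in zip(a, b))

def cube(depth, half):
    """Corners of an axis-aligned cube centred at (0, 0, depth)."""
    return [(sx * half, sy * half, depth + sz * half)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

f = 1.0
far = cube(100.0, 1.0)    # depth range 99..101: relief is 2% of distance
near = cube(3.0, 1.0)     # depth range 2..4: relief comparable to distance
err_far = max_error(perspective(far, f), weak_perspective(far, f))
err_near = max_error(perspective(near, f), weak_perspective(near, f))
print(err_far)    # about 1e-4
print(err_near)   # about 0.17
```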

Chapter 9: Connections

The camera matrix P is the central object of the book. Everything that follows is about estimating P, exploiting relationships between multiple P's, and recovering 3D structure from them.

Next chapter | Uses P for
Ch 7: Camera Computation | Estimating P from 3D↔2D correspondences
Ch 8: Single View Geometry | What P tells us about a single image
Ch 9: Epipolar Geometry | Relationship between two P's → fundamental matrix F
Ch 19: Auto-Calibration | Recovering K from multiple P's
"The camera matrix P encapsulates the mapping from 3-space to an image. It is a model of the central projection imaging process."
— Hartley & Zisserman, Chapter 6