Hartley & Zisserman, Chapter 6

Camera Models

The pinhole camera as a 3×4 matrix. Calibration matrix K, decomposition P = K[R|t], finite cameras, projective cameras, and cameras at infinity.

Prerequisites: Chapter 3 (Projective 3D) + Chapter 4 (Estimation).

Chapter 0: Why Camera Models?

So far we have studied the geometry of 2D and 3D projective spaces in isolation. Now we connect them: a camera is the map from 3D to 2D. It takes a 3D point and produces a 2D image point. In homogeneous coordinates, this is a 3×4 matrix P:

x = P X (3-vector = 3×4 matrix · 4-vector)

This single equation is the heart of the book. Everything else — epipolar geometry, reconstruction, calibration — follows from it. But where does P come from? What are its parts? What does each component mean physically?

The camera matrix P encodes everything: where the camera is and which way it is pointing (translation t and rotation R together), and its internal optics (focal length, principal point, pixel shape, all packed into K). One matrix, 11 degrees of freedom.

[Interactive simulation: Camera Projection. A 3D cube is projected through the camera centre onto the image plane; a focal-length slider changes the perspective.]

What size is the camera matrix P?

3×4 (maps homogeneous 4-vectors for 3D points to homogeneous 3-vectors for image points)
3×3 (same as a homography)
4×4 (a full projective transformation)

Chapter 1: The Pinhole Camera

The most basic camera model: every ray of light passes through a single point (the camera centre C) and strikes a flat image plane. A 3D point (X, Y, Z) maps to image coordinates (fX/Z, fY/Z).

In homogeneous coordinates, this is linear. With the camera centre at the origin and the image plane at Z = f, the projection is:

[fX, fY, Z]^T = [f 0 0 0 ; 0 f 0 0 ; 0 0 1 0] · [X, Y, Z, 1]^T

Key simplification: In homogeneous coordinates, the division by Z is implicit. The projection x = PX is a matrix multiplication — purely linear. All the nonlinearity is hidden in the homogeneous representation.

The simplest camera matrix is P = diag(f, f, 1)[I | 0], which places the camera at the origin looking along the Z-axis. Real cameras need rotation, translation, and internal parameter adjustments.
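As a minimal sketch of the projection above, the following NumPy snippet builds P = diag(f, f, 1)[I | 0] and applies it to one point. The helper names `pinhole_matrix` and `project`, and the numeric values, are illustrative assumptions rather than anything defined in the book.

```python
import numpy as np

def pinhole_matrix(f):
    """Return P = diag(f, f, 1) [I | 0]: camera at the origin, looking along +Z."""
    return np.diag([f, f, 1.0]) @ np.hstack([np.eye(3), np.zeros((3, 1))])

def project(P, X):
    """Project an inhomogeneous 3D point X and de-homogenize the result."""
    x = P @ np.append(X, 1.0)   # homogeneous image point (fX, fY, Z)
    return x[:2] / x[2]         # the division by Z happens only here

P = pinhole_matrix(f=2.0)           # hypothetical focal length
X = np.array([1.0, 2.0, 4.0])       # hypothetical scene point
x = project(P, X)                   # equals (f*X/Z, f*Y/Z) = (0.5, 1.0)
```

The matrix product itself is linear; the only nonlinear step is the final division, which is exactly what the homogeneous representation postpones.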

In a pinhole camera, all rays of light pass through what?

A single point: the camera centre
A lens aperture
The image plane

Chapter 2: The Calibration Matrix K

The calibration matrix K is an upper-triangular 3×3 matrix encoding the camera's internal parameters (also called intrinsics):

K = [α_x s x₀ ; 0 α_y y₀ ; 0 0 1]

Parameter	Symbol	Meaning
Focal length (x)	α_x = f · m_x	Focal length in pixel units along x (m_x = pixels per unit length in x)
Focal length (y)	α_y = f · m_y	Focal length in pixel units along y (m_y = pixels per unit length in y)
Skew	s	Non-orthogonality of pixel axes (usually 0)
Principal point	(x₀, y₀)	Where optical axis hits the image

5 intrinsic parameters: K has 5 DOF (it's upper-triangular 3×3 with 6 entries, but the bottom-right is fixed to 1). For modern cameras, s ≈ 0 and α_x ≈ α_y (square pixels), so often only 3 parameters matter: f, x₀, y₀.
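The table can be turned into a short sketch that assembles K and maps a point in camera coordinates to pixel coordinates. The function name `calibration_matrix` and all numeric values (a 4 mm lens with 250 pixels per mm) are assumptions chosen only for illustration.

```python
import numpy as np

def calibration_matrix(f, m_x, m_y, x0, y0, s=0.0):
    """Build K from focal length f, pixel densities m_x and m_y,
    principal point (x0, y0) in pixels, and skew s."""
    return np.array([[f * m_x, s,       x0],
                     [0.0,     f * m_y, y0],
                     [0.0,     0.0,     1.0]])

# Hypothetical camera: f = 4 mm, 250 pixels/mm, principal point at (320, 240).
K = calibration_matrix(f=4.0, m_x=250.0, m_y=250.0, x0=320.0, y0=240.0)

X_cam = np.array([0.1, -0.05, 2.0])   # point expressed in the camera frame
x = K @ X_cam                         # homogeneous pixel coordinates
pixel = x[:2] / x[2]                  # (370.0, 215.0)
```

With zero skew and equal pixel densities, only the three values α = f · m, x₀ and y₀ influence the result, which is the square-pixel case mentioned above.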

How many independent internal parameters does a general pinhole camera have?

5 (α_x, α_y, s, x₀, y₀)
3
11

Chapter 3: Decomposition P = K[R|t]

The full camera matrix for a finite projective camera decomposes as:

P = K [R | t] (3×4 = 3×3 · 3×4)

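where K is the calibration matrix (internal), R is a 3×3 rotation matrix (camera orientation), and t is a 3-vector (camera translation). If C̃ denotes the inhomogeneous camera centre in world coordinates, then t = −RC̃. The homogeneous centre C = (C̃, 1)^T satisfies PC = 0, because K(RC̃ + t) = 0, which gives C̃ = −R^Tt.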

Extrinsic parameters (6 DOF): R has 3 DOF (rotation angles) and t has 3 DOF (position). Combined with 5 intrinsic DOF from K, the total is 11 DOF for P. This matches: a 3×4 matrix has 12 entries minus 1 for scale = 11.
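The composition can be checked numerically. The sketch below assumes a hypothetical K, a rotation about the Y-axis and a chosen camera centre; `rotation_y` and `camera_matrix` are illustrative helper names.

```python
import numpy as np

def rotation_y(theta):
    """Rotation by angle theta (radians) about the Y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def camera_matrix(K, R, C):
    """Compose P = K [R | t] with t = -R C, for a camera centre C in world coordinates."""
    t = -R @ C
    return K @ np.hstack([R, t.reshape(3, 1)])

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])          # hypothetical intrinsics
R = rotation_y(np.deg2rad(30.0))
C = np.array([1.0, 2.0, -5.0])           # hypothetical camera centre

P = camera_matrix(K, R, C)
t = -R @ C
C_recovered = -R.T @ t                   # C = -R^T t
residual = P @ np.append(C, 1.0)         # should vanish: P C = 0
```

Here `residual` is zero up to rounding, confirming that the homogeneous camera centre spans the null space of P.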

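Given a camera matrix P, we can recover K and R by RQ decomposition of the left 3×3 submatrix M = KR, which factors M into an upper-triangular matrix times an orthogonal one; requiring the diagonal of K to be positive makes the factorization unique. The camera centre is the right null vector of P.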
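One possible way to carry out this recovery is sketched below. NumPy has no RQ routine, so the sketch derives one from `numpy.linalg.qr` by reversing rows and columns (SciPy's `scipy.linalg.rq` is an alternative). The function names `rq` and `decompose` and the test camera are assumptions; the sign handling is one convention among several.

```python
import numpy as np

def rq(M):
    """RQ decomposition M = K R (K upper-triangular, R orthogonal), built from QR."""
    J = np.flipud(np.eye(3))                 # row-reversal permutation, J @ J = I
    Q_t, R_t = np.linalg.qr((J @ M).T)       # (J M)^T = Q_t R_t
    K = J @ R_t.T @ J                        # upper-triangular factor
    R = J @ Q_t.T                            # orthogonal factor
    S = np.diag(np.sign(np.diag(K)))         # flip signs so that diag(K) > 0
    return K @ S, S @ R

def decompose(P):
    """Recover K, R and the inhomogeneous centre from a finite camera P."""
    if np.linalg.det(P[:, :3]) < 0:          # P is defined up to scale; fix its sign
        P = -P
    K, R = rq(P[:, :3])
    K = K / K[2, 2]                          # normalize so that K[2, 2] = 1
    C_h = np.linalg.svd(P)[2][-1]            # right null vector of P
    return K, R, C_h[:3] / C_h[3]

# Hypothetical ground truth, used only to check the round trip.
K_true = np.array([[900.0, 2.0, 310.0], [0.0, 850.0, 250.0], [0.0, 0.0, 1.0]])
a, b = 0.3, -0.4
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
R_true = Rz @ Rx
C_true = np.array([0.5, -1.0, 3.0])
P = -2.5 * K_true @ np.hstack([R_true, (-R_true @ C_true).reshape(3, 1)])

K_est, R_est, C_est = decompose(P)
```

The arbitrary scale −2.5 applied to P does not affect the recovered K, R or centre, which illustrates why P has 11 rather than 12 degrees of freedom.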

What is the total number of DOF in a camera matrix P = K[R|t]?

11 (5 intrinsic + 3 rotation + 3 translation)
12
8

Chapter 4: Camera Anatomy

Several important geometric entities are directly readable from P:

Entity	Computation	Geometric Meaning
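Camera centre C	PC = 0 (right null vector of P)	Point in 3D where all rays converge
Principal axis	v = det(M) · m³, with m³ the third row of M	Direction the camera is pointing (the factor det(M) makes the sign independent of the scale of P)
Principal point	x₀ = M m³	Where the principal axis meets the image plane
Image of a point at infinity (d, 0)^T	x = M d	How a direction d projects to the image (its vanishing point)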

The action of P on points: For a finite point X = (X̃, 1)^T, the image is x = MX̃ + p₄, where M is the left 3×3 block of P and p₄ is its last column. The third coordinate of x, before de-homogenization, encodes the depth of X relative to the camera.
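The entities in the table can be read off numerically. The sketch below uses the standard forms v = det(M) · m³ for the principal axis and x₀ = M m³ for the principal point, where m³ is the third row of M; the function names and the example camera are illustrative assumptions.

```python
import numpy as np

def principal_axis(P):
    """Unit vector along the viewing direction: det(M) times the third row of M."""
    M = P[:, :3]
    v = np.linalg.det(M) * M[2]
    return v / np.linalg.norm(v)

def principal_point(P):
    """Image of the point at infinity along the principal axis: x0 = M m3."""
    M = P[:, :3]
    x = M @ M[2]
    return x[:2] / x[2]

def vanishing_point(P, d):
    """Image of the point at infinity (d, 0): x = M d, de-homogenized."""
    x = P[:, :3] @ d
    return x[:2] / x[2]

# Hypothetical camera: K with principal point (320, 240), rotated 30 degrees about Y.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
th = np.deg2rad(30.0)
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([0.2, -0.1, 4.0])
P = K @ np.hstack([R, t.reshape(3, 1)])

axis = principal_axis(P)                                 # equals the third row of R
pp = principal_point(P)                                  # equals (320, 240)
vp = vanishing_point(P, np.array([0.0, 0.0, 1.0]))       # image of the world Z direction
```

Multiplying P by −1 leaves `axis` and `pp` unchanged, which is the purpose of the det(M) factor.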

How do you find the camera centre from the camera matrix P?

Compute the null space of P: the point C such that PC = 0
Take the last column of P
Invert P

Chapter 5: Projective Cameras

A general projective camera is any 3×4 matrix P of rank 3. Not every such matrix corresponds to a physically realizable camera, but mathematically it is the most general model: any rank-3 matrix P defines a projection from ℙ³ to ℙ² that is defined for every point except the camera centre, the null vector of P.

Two configurations, camera P with scene points X and camera PH⁻¹ with scene points HX (for any invertible 4×4 H), produce the same images, because (PH⁻¹)(HX) = PX. This is the projective ambiguity: from images alone, we can only recover cameras and structure up to a common projective transformation.

Projective ambiguity: If (P, P', {X_i}) is a valid camera-structure configuration, so is (PH⁻¹, P'H⁻¹, {HX_i}) for any 4×4 H. This is the fundamental ambiguity of uncalibrated reconstruction.
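The ambiguity is easy to confirm numerically. In this sketch the two cameras and the points are random, and H is an arbitrary, hand-picked matrix that is strictly diagonally dominant and therefore invertible; all of these values are assumptions made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
P1 = rng.standard_normal((3, 4))        # first camera (random, for illustration)
P2 = rng.standard_normal((3, 4))        # second camera
X = rng.standard_normal((4, 5))         # five homogeneous scene points as columns

# Arbitrary invertible 4x4 projective transformation (strictly diagonally dominant).
H = np.array([[2.0, 0.2, 0.0, 0.5],
              [0.0, 2.0, 0.3, -1.0],
              [0.1, 0.0, 3.0, 2.0],
              [0.05, -0.02, 0.01, 1.0]])
H_inv = np.linalg.inv(H)

# Transformed configuration (P H^-1, P' H^-1, {H X_i}).
P1_new, P2_new, X_new = P1 @ H_inv, P2 @ H_inv, H @ X

same_view_1 = np.allclose(P1 @ X, P1_new @ X_new)   # True
same_view_2 = np.allclose(P2 @ X, P2_new @ X_new)   # True
```

Both views are unchanged even though the cameras and the points are different, so image measurements alone cannot distinguish the two configurations.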

What is the rank of a valid camera matrix P?

3
4
2

Chapter 6: Depth of Points

When we compute x = PX, the third component of x (before de-homogenization) is related to the depth of X from the camera. Specifically, for P = [M | p₄]:

depth(X; P) = sign(det M) · w / (||m³|| · T)

where w is the third component of PX, m³ is the third row of M, and T is the last component of X. Positive depth means the point is in front of the camera.
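A direct transcription of the depth formula is sketched below. The function name `depth` and the example camera are assumptions; the checks show that the result does not depend on the arbitrary scales of P and X.

```python
import numpy as np

def depth(P, X):
    """depth(X; P) = sign(det M) * w / (T * ||m3||) for a homogeneous point X = (X, Y, Z, T)."""
    M = P[:, :3]
    w = (P @ X)[2]                      # third component of PX
    T = X[3]                            # last component of X
    return np.sign(np.linalg.det(M)) * w / (T * np.linalg.norm(M[2]))

# Hypothetical camera at the origin looking along +Z: P = K [I | 0].
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

X_front = np.array([1.0, 2.0, 5.0, 1.0])
X_behind = np.array([0.0, 0.0, -3.0, 1.0])

d_front = depth(P, X_front)                      # 5.0, in front of the camera
d_behind = depth(P, X_behind)                    # -3.0, behind the camera
d_rescaled = depth(-2.0 * P, 3.0 * X_front)      # still 5.0
```

The sign(det M) factor and the division by T and ||m³|| are what make the value independent of how P and X happen to be scaled.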

Chirality: In reconstruction, we want all scene points to have positive depth in all cameras. This is the chirality constraint. When extracting cameras from the essential matrix, exactly one of four possible solutions places all points in front of both cameras.

What does "positive depth" mean for a 3D point relative to a camera?

The point is in front of the camera
The point is far away
The point has a large Z coordinate

Chapter 7: Cameras at Infinity

When the camera is infinitely far from the scene (or equivalently, the scene depth variation is negligible compared to the distance), projection becomes parallel rather than perspective. The camera centre is a point at infinity.

The camera matrix of a camera at infinity has the form P_∞ = [M | t] with M singular, so the centre is a point at infinity. In the affine case considered here, the last row of P is (0, 0, 0, 1), M has rank 2, and there is no perspective foreshortening.
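The following sketch builds an affine camera with last row (0, 0, 0, 1) and checks two of its properties: the centre is a point at infinity, and moving a point along the projection direction does not change its image. The names `affine_camera` and `project` and the numeric entries are illustrative assumptions.

```python
import numpy as np

def affine_camera(M23, t2):
    """Assemble the 3x4 affine camera [M23 t2; 0 0 0 1]."""
    P = np.zeros((3, 4))
    P[:2, :3] = M23
    P[:2, 3] = t2
    P[2, 3] = 1.0
    return P

def project(P, X):
    """Project an inhomogeneous 3D point and de-homogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]          # for an affine camera x[2] is always 1

M23 = np.array([[2.0, 0.0, 1.0],
                [0.0, 2.0, 1.0]])        # hypothetical rank-2 block
P = affine_camera(M23, t2=np.array([10.0, 20.0]))

d = np.cross(M23[0], M23[1])             # direction of projection, M23 @ d = 0
centre = np.append(d, 0.0)               # camera centre: a point at infinity

X = np.array([1.0, 2.0, 3.0])
x_near = project(P, X)                   # (15.0, 27.0)
x_far = project(P, X + 100.0 * d)        # same image point
rank_M = np.linalg.matrix_rank(P[:, :3]) # 2
```

Because the third homogeneous coordinate is always 1, the image point is an affine function of X, with no division by depth.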

When is this useful? Satellite imagery, telephoto photography, microscopy: any situation where perspective effects are negligible. Affine cameras are easier to work with because projection is linear in inhomogeneous coordinates, so affine reconstruction can be computed with linear methods such as factorization rather than nonlinear optimization.

What type of projection does a camera at infinity produce?

Parallel (affine) projection, with no perspective foreshortening
Fisheye projection
Spherical projection

Chapter 8: Affine Camera Hierarchy

Cameras at infinity form their own hierarchy, paralleling the 3D transformation hierarchy:

Camera Type	Form	Properties
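General affine	P = [M_2×3 t ; 0 0 0 1], with M_2×3 of rank 2	Parallel projection, general direction
Weak perspective	P with M_2×3 = sR_2×3 (first two rows of a rotation, scaled by s)	Scaled orthographic projection; H&Z reserve "weak perspective" for the variant with separate scales α_x, α_y
Orthographic	P with s = 1	Parallel projection along the principal axis, no scaling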

Weak perspective: The most commonly used affine camera model. It first projects orthographically, then scales the result by a single factor, the focal length divided by the average scene depth. It is accurate when depth variation is small relative to average depth, and is used extensively in face modeling, object recognition, and structure-from-motion initialization.
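To make the accuracy condition concrete, the sketch below compares full perspective projection with the weak perspective approximation for a cube of side 2 placed far from and near to the camera. The camera sits at the origin looking along +Z; the function names, the focal length and the distances are assumptions chosen for illustration.

```python
import itertools
import numpy as np

def perspective(f, X):
    """Full perspective projection: each point is divided by its own depth."""
    return f * X[:, :2] / X[:, 2:3]

def weak_perspective(f, X):
    """Orthographic projection followed by one scale factor f / (average depth)."""
    z_avg = X[:, 2].mean()
    return (f / z_avg) * X[:, :2]

def max_error(f, X):
    """Largest coordinate difference between the two models, in image units."""
    return np.abs(perspective(f, X) - weak_perspective(f, X)).max()

# Eight corners of a cube of side 2 centred on the origin.
corners = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))

f = 500.0                                           # hypothetical focal length
far_cube = corners + np.array([0.0, 0.0, 100.0])    # depth variation 2% of distance
near_cube = corners + np.array([0.0, 0.0, 3.0])     # depth variation 67% of distance

err_far = max_error(f, far_cube)      # about 0.05
err_near = max_error(f, near_cube)    # about 83.3
```

The error grows from a small fraction of a pixel to tens of pixels as the cube approaches the camera, matching the rule that the model is appropriate only when depth variation is small relative to the average depth.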

When is a weak perspective camera model appropriate?

When depth variation in the scene is small relative to the average distance to the camera
When the focal length is very small (wide-angle lens)
When the image has high resolution

Chapter 9: Connections

The camera matrix P is the central object of the book. Everything that follows is about estimating P, exploiting relationships between multiple P's, and recovering 3D structure from them.

Next Chapter	Uses P for
Ch 7: Camera Computation	Estimating P from 3D↔2D correspondences
Ch 8: Single View Geometry	What P tells us about a single image
Ch 9: Epipolar Geometry	Relationship between two P's → fundamental matrix F
Ch 19: Auto-Calibration	Recovering K from multiple P's

"The camera matrix P encapsulates the mapping from 3-space to an image. It is a model of the central projection imaging process."

— Hartley & Zisserman, Chapter 6

What is the projective ambiguity of camera matrices?

P with points X and PH⁻¹ with points HX give the same images: cameras and structure can only be recovered up to a 4×4 projective transformation
Camera matrices are unique
The ambiguity is only in scale
