Hartley & Zisserman, Chapter 10

3D Reconstruction of Cameras & Structure

How to recover 3D scene geometry from two uncalibrated views. Projective reconstruction, the stratified upgrade to affine and metric, and the fundamental theorem.

Prerequisites: Chapter 9 (Epipolar Geometry).

Chapter 0: Why Reconstruction?

Given two photographs of a scene, can we recover the 3D geometry — the shape of buildings, the positions of objects, the layout of a room? Yes, but with an important caveat: how much of the 3D structure we can recover depends on what we know about the cameras.

The hierarchy of reconstruction:
• Projective (from F alone): Shape is recovered up to a projective transformation. Straight lines stay straight, but angles and distances are distorted.
• Affine (+ knowing the plane at infinity): Parallel lines are parallel. Midpoints and ratios of lengths along lines are correct.
• Metric/Euclidean (+ knowing the absolute conic): Angles and distance ratios are correct. The true 3D shape, up to a global scale, rotation, and translation.

This chapter shows how to achieve each level, and what information is needed to upgrade from one to the next.

What is the best level of reconstruction achievable from two uncalibrated views with no scene knowledge?

• Projective reconstruction (up to a 4×4 projective transformation)
• Metric reconstruction
• No reconstruction is possible

Chapter 1: The Reconstruction Method

The basic approach to two-view reconstruction has three steps:

Step	Action	Result
1	Compute F from point correspondences x_i ↔ x'_i	The fundamental matrix
2	Compute camera matrices P, P' from F	Camera pair (up to projective ambiguity)
3	For each correspondence, triangulate to find X_i	3D point cloud
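Step 2 of the table can be sketched with the canonical camera pair P = [I | 0], P' = [[e']×F | e'] from Chapter 9, where e' is the epipole satisfying e'^TF = 0. This is a minimal sketch assuming NumPy; the function names `skew` and `cameras_from_F` and the numeric values are illustrative, not taken from the source.

```python
import numpy as np

def skew(v):
    """3x3 matrix [v]_x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    """Canonical pair P = [I | 0], P' = [[e']_x F | e'] for a rank-2 F."""
    # The epipole e' is the left null vector of F, i.e. the null vector of F^T.
    _, _, Vt = np.linalg.svd(F.T)
    e2 = Vt[-1]
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])
    return P1, P2

# Synthetic check with made-up numbers: F for P = [I | 0], P' = [M | m] is [m]_x M.
M = np.array([[1.0, 0.1, 0.0], [-0.1, 1.0, 0.2], [0.0, -0.2, 1.0]])
m = np.array([1.0, 0.5, 0.2])
F = skew(m) @ M
P1, P2 = cameras_from_F(F)
F_again = skew(P2[:, 3]) @ P2[:, :3]   # F implied by the recovered pair
```

The recovered pair is generally not the pair that generated F, but it yields the same F up to scale, which is all that projective reconstruction requires.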

Triangulation: Given P, P' and a correspondence x ↔ x' satisfying x'^TFx = 0, the rays back-projected from x and x' are coplanar and intersect in a 3D point X. This is triangulation — the subject of Chapter 12.

Points on the baseline (the line joining the two camera centres) cannot be triangulated, as both back-projected rays are the baseline itself. These points project to the epipoles in both images.
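For noise-free data, the ray intersection can be computed with a simple linear (homogeneous) method: each image point contributes two linear equations in X, and the solution is the null vector of the stacked system. The sketch below assumes NumPy and uses made-up cameras and a made-up point; it is one simple option, not necessarily the method Chapter 12 recommends for noisy data.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation: homogeneous X with x1 ~ P1 X and x2 ~ P2 X.

    x1 and x2 are inhomogeneous image points (x, y).
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The null vector of A is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

def project(P, X):
    """Project a homogeneous 3D point and return inhomogeneous image coordinates."""
    x = P @ X
    return x[:2] / x[2]

# Two made-up cameras separated by a translation along the x axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, -0.2, 4.0, 1.0])

X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
X_est = X_est / X_est[3]   # normalise the homogeneous scale
```

For a point on the baseline, the four equations no longer pin down a single point, which matches the degeneracy described above.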

Which 3D points cannot be uniquely triangulated from two views?

• Points on the baseline between the two camera centres
• Points at infinity
• Points behind the cameras

Chapter 2: Reconstruction Ambiguity

Even with perfect data, reconstruction from images alone is inherently ambiguous. The level of ambiguity depends on what is known about the cameras.

Camera knowledge	Ambiguity	DOF
Nothing (uncalibrated)	Projective (4×4 matrix H)	15
Calibrated (K known)	Similarity (rotation + translation + scale)	7
Calibrated + known scale	Euclidean (rotation + translation)	6

Why is scale ambiguous? Consider a corridor photo. Is the corridor 2 metres wide or 10 centimetres (a doll's house)? Both produce identical images if the camera positions are scaled by the same factor. Overall scale cannot be determined from images alone; you need at least one known distance in the scene.

Mathematically: if (P, P', {X_i}) is a valid reconstruction, then (PH⁻¹, P'H⁻¹, {HX_i}) gives the same images for any invertible 4×4 H. For calibrated cameras, H is restricted to a similarity transformation.
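The identity (PH⁻¹)(HX) = PX can be checked numerically. The following is a small sketch assuming NumPy, with randomly generated cameras, points, and H; all values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
P1 = rng.normal(size=(3, 4))          # two arbitrary 3x4 cameras
P2 = rng.normal(size=(3, 4))
X = rng.normal(size=(4, 10))          # ten homogeneous 3D points, one per column
H = rng.normal(size=(4, 4))           # random 4x4 matrix, invertible with probability 1
H_inv = np.linalg.inv(H)

# Homogeneous image points of the original reconstruction.
x1, x2 = P1 @ X, P2 @ X

# Homogeneous image points of the transformed reconstruction (P H^-1, P' H^-1, H X).
y1, y2 = (P1 @ H_inv) @ (H @ X), (P2 @ H_inv) @ (H @ X)

same_images = np.allclose(x1, y1, atol=1e-6) and np.allclose(x2, y2, atol=1e-6)
```

Because the image points agree for every choice of H, nothing in the images can distinguish the two reconstructions.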

For calibrated cameras, reconstruction is ambiguous up to what type of transformation?

• A similarity transformation (rotation + translation + scale)
• A projective transformation
• An affine transformation

Chapter 3: The Projective Reconstruction Theorem

This is one of the central results of the book:

Theorem (Projective Reconstruction): If a set of point correspondences x_i ↔ x'_i uniquely determines the fundamental matrix F, then the scene and cameras may be reconstructed from these correspondences alone. Any two such reconstructions are related by a 4×4 projective transformation H.

This means: from two uncalibrated photographs and enough point matches, you can recover 3D structure up to a projective warp. No camera calibration, no scene knowledge needed.

The proof relies on two facts: (1) F uniquely determines the pair of camera matrices up to a projective transformation (Chapter 9, Result 9.10), and (2) given the cameras, each point correspondence determines a unique 3D point (by triangulation), except for points on the baseline.

What determines whether projective reconstruction is achievable?

• The correspondences must uniquely determine the fundamental matrix F
• The cameras must be calibrated
• The scene must contain at least one plane

Chapter 4: Upgrading to Affine

An affine reconstruction is one where the plane at infinity is correctly located. To upgrade from projective to affine, we need to identify the plane at infinity π_∞ in the projective reconstruction.

Given π_∞ as a 4-vector in the projective frame, the upgrading homography is:

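H = [ I | 0 ; π_∞^T ]

Here I is the 3×3 identity, 0 is a zero 3-vector, and the bottom row is π_∞^T, so H is 4×4. Points transform as X → HX and planes as π → H⁻ᵀπ, so H moves π_∞ to its canonical position (0, 0, 0, 1)^T.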
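As a quick numerical check of this construction, the sketch below (assuming NumPy, with a made-up plane vector and the illustrative name `affine_upgrade`) builds H and confirms that the chosen plane is sent to (0, 0, 0, 1)^T and that a point on it lands at infinity.

```python
import numpy as np

def affine_upgrade(pi):
    """4x4 H with [I | 0] as its top three rows and pi^T as its bottom row.

    Requires pi[3] != 0 so that H is invertible.
    """
    H = np.eye(4)
    H[3] = pi
    return H

pi = np.array([0.2, -0.1, 0.3, 1.0])     # made-up plane at infinity in the projective frame
H = affine_upgrade(pi)

# Points map as X -> H X, so planes map as pi -> H^{-T} pi.
pi_mapped = np.linalg.inv(H).T @ pi

# A point lying on the plane (pi . X = 0) should get a zero last coordinate.
X_on_pi = np.array([1.0, 1.0, 1.0, -0.4])
X_mapped = H @ X_on_pi
```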

Ways to identify π_∞:

Information	How it identifies π_∞
Translational motion (no rotation)	With unchanged internal parameters, points at infinity have the same image in both views, so any "invented" match x_i ↔ x'_i with x'_i = x_i is a correspondence for a point on π_∞
3 sets of parallel lines	Each set intersects at a point on π_∞; three such points determine π_∞
Known distance ratios on lines	Vanishing points can be computed from ratio constraints

What affine reconstruction gives you: Parallel lines are parallel in the reconstruction. Midpoints and length ratios along lines are preserved. But angles between non-parallel lines and absolute distances are still unknown.

What must be identified to upgrade from projective to affine reconstruction?

• The plane at infinity π_∞
• The camera calibration K
• The absolute conic Ω_∞

Chapter 5: Upgrading to Metric

A metric (or Euclidean/similarity) reconstruction preserves angles and distance ratios. To upgrade from affine to metric, we need to identify the absolute conic Ω_∞ on the plane at infinity.

The absolute conic is identified via the image of the absolute conic (IAC) ω = (KK^T)⁻¹. If K is known (calibrated cameras), ω is known directly. If K is partially known (e.g., zero skew, square pixels), constraints on ω can be accumulated across views.
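To make the constraints concrete: with K upper triangular, zero skew forces the (1,2) entry of ω to vanish, and square pixels additionally make the (1,1) and (2,2) entries equal. The sketch below assumes NumPy and a made-up calibration matrix; it computes ω as K⁻ᵀK⁻¹, which equals (KK^T)⁻¹.

```python
import numpy as np

def iac(K):
    """Image of the absolute conic, omega = (K K^T)^-1 = K^-T K^-1."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ K_inv

# Made-up calibration with zero skew and square pixels (equal focal lengths).
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])
omega = iac(K)

zero_skew_residual = omega[0, 1]                    # vanishes when skew is zero
square_pixel_residual = omega[0, 0] - omega[1, 1]   # vanishes for square pixels
```

Each such relation is linear in the entries of ω, which is why partial knowledge of K from several views can be combined.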

What metric reconstruction gives you: True 3D shape, up to a global scale, rotation, and translation. You can measure angles between lines, check orthogonality, and compute ratios of distances between pairs of points. This is the strongest reconstruction obtainable from images alone.

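The upgrading homography from affine to metric has the form H = [ A⁻¹ 0 ; 0^T 1 ], where A is a 3×3 matrix acting on the affine 3D coordinates. If a camera in the affine reconstruction is P = [M | m] and ω is the image of the absolute conic in that view, A is obtained by Cholesky factorization from AA^T = (M^TωM)⁻¹. After the upgrade, the absolute conic takes its canonical form on π_∞.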
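The affine-to-metric step can be sketched numerically using the relation from Hartley & Zisserman's Result 10.5: for an affine-frame camera P = [M | m] with IAC ω, solve AA^T = (M^TωM)⁻¹ by Cholesky factorization and apply H = [A⁻¹ 0 ; 0^T 1]. The example assumes NumPy; the function name `metric_upgrade` and all numeric values are made up for illustration.

```python
import numpy as np

def metric_upgrade(P_affine, omega):
    """4x4 H = [[A^-1, 0], [0, 1]] with A A^T = (M^T omega M)^-1, P_affine = [M | m]."""
    M = P_affine[:, :3]
    A = np.linalg.cholesky(np.linalg.inv(M.T @ omega @ M))   # lower-triangular factor
    H = np.eye(4)
    H[:3, :3] = np.linalg.inv(A)
    return H

# Synthetic setup: a metric camera K[R | t] seen through an unknown affine distortion B.
K = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, 0.3], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
B = np.array([[1.0, 0.2, 0.0], [0.1, 1.5, 0.3], [0.0, -0.2, 0.8]])
P_affine = np.hstack([K @ R @ B, np.array([[0.1], [-0.2], [1.0]])])
omega = np.linalg.inv(K @ K.T)

H = metric_upgrade(P_affine, omega)
P_metric = P_affine @ np.linalg.inv(H)      # cameras transform as P -> P H^-1
Q = np.linalg.inv(K) @ P_metric[:, :3]      # should be an orthogonal matrix
```

The upgraded camera has the form K[Q | t'] with Q orthogonal, which is what a metric frame requires. Q need not equal the original R, since a metric reconstruction is only defined up to a similarity.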

What must be identified to upgrade from affine to metric reconstruction?

• The absolute conic Ω_∞ on the plane at infinity
• The plane at infinity π_∞
• The fundamental matrix F

Chapter 6: Parallel Lines and Scene Constraints

Parallel lines are the most common source of affine information in man-made scenes. Three sets of parallel lines whose directions are not all parallel to a common plane give three non-collinear points on π_∞, which determine it uniquely.

Practical procedure: Identify parallel lines in both images. The imaged intersections (vanishing points) correspond to points at infinity. Reconstruct these 3D points via triangulation. Three non-collinear such points determine the plane at infinity.
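The last step of this procedure, fitting a plane to three reconstructed points, amounts to finding the null vector of the 3×4 matrix whose rows are the points. A minimal sketch assuming NumPy follows; the three points are made-up stand-ins for triangulated vanishing points and `plane_through` is an illustrative name.

```python
import numpy as np

def plane_through(X1, X2, X3):
    """Plane pi (4-vector) with pi . Xi = 0 for three homogeneous 3D points."""
    _, _, Vt = np.linalg.svd(np.vstack([X1, X2, X3]))
    return Vt[-1]    # null vector of the 3x4 stack

# Made-up triangulated points in the projective frame.
X1 = np.array([1.0, 0.0, 0.0, -0.2])
X2 = np.array([0.0, 1.0, 0.0, 0.1])
X3 = np.array([0.0, 0.0, 1.0, -0.3])

pi = plane_through(X1, X2, X3)
pi = pi / pi[3]      # fix the homogeneous scale so the last entry is 1
```

If the three points were collinear, the stacked matrix would have rank 2 and the plane would not be unique.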

It is not necessary to find vanishing points in both images. If you find a vanishing point v in one image and a corresponding line l' in the other, you can compute v' = l' ∩ Fv (the intersection of l' with the epipolar line of v).
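In homogeneous coordinates the intersection of two lines is their cross product, so v' = l' × (Fv). The sketch below assumes NumPy and made-up values. The F used here is the skew-symmetric matrix of a pure translation, for which a vanishing point has the same image in both views, so the result should come back as v up to scale.

```python
import numpy as np

def transfer_vanishing_point(F, v, l2):
    """v' = l' x (F v): intersection of line l' with the epipolar line of v."""
    return np.cross(l2, F @ v)

# Made-up F = [e']_x for a pure translation with epipole e' = (0.5, 0.2, 1).
F = np.array([[0.0, -1.0, 0.2],
              [1.0, 0.0, -0.5],
              [-0.2, 0.5, 0.0]])
v = np.array([3.0, 1.0, 1.0])       # vanishing point in the first image
l2 = np.array([1.0, -2.0, -1.0])    # line in the second image through the true v'

v2 = transfer_vanishing_point(F, v, l2)
v2 = v2 / v2[2]                     # normalise the homogeneous scale
```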

Once the plane at infinity is located, the infinite homography H_∞ — the 2D homography induced by the plane at infinity — is also determined. This homography maps image points independently of scene depth and depends only on rotation and calibration.
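For cameras P = K[I | 0] and P' = K'[R | t], the infinite homography has the closed form H_∞ = K'RK⁻¹. The sketch below, assuming NumPy and made-up calibration and rotation values, checks that it transfers the image of a point at infinity regardless of the translation t.

```python
import numpy as np

def infinite_homography(K1, K2, R):
    """H_inf = K' R K^-1 for cameras P = K[I | 0], P' = K'[R | t]."""
    return K2 @ R @ np.linalg.inv(K1)

K1 = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, 0.3], [0.0, 0.0, 1.0]])
K2 = np.array([[2.5, 0.0, 0.4], [0.0, 2.5, 0.2], [0.0, 0.0, 1.0]])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

X_inf = np.array([0.3, -0.1, 1.0, 0.0])    # point at infinity with direction (0.3, -0.1, 1)
P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
H_inf = infinite_homography(K1, K2, R)

transferred = H_inf @ (P1 @ X_inf)          # image in view 1 mapped by H_inf
direct = []
for t in ([1.0, 0.0, 0.0], [0.0, -2.0, 5.0]):   # two different translations
    P2 = K2 @ np.hstack([R, np.reshape(t, (3, 1))])
    direct.append(P2 @ X_inf)               # image in view 2, computed directly
```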

How many sets of parallel lines (each with a different direction) are needed to determine the plane at infinity?

• 3 sets (giving 3 points on π_∞)
• 2 sets
• 5 sets

Chapter 7: Direct Reconstruction with Ground Truth

If some 3D coordinates are known in advance (ground truth), we can skip the stratified approach and directly compute the metric reconstruction.

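Given a projective reconstruction (P, P', {X_i}) and a set of known 3D points X̄_i in Euclidean coordinates, the upgrading homography H satisfies X̄_i = H X_i for the known points, as homogeneous vectors and therefore up to scale. H has 15 degrees of freedom and each known point supplies 3 independent constraints, so at least 5 known 3D points in general position (no four coplanar) determine H uniquely.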
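One way to solve for H is a linear method in the style of the DLT: each correspondence gives three homogeneous linear equations in the 16 entries of H, and H is the null vector of the stacked system. The sketch below assumes NumPy and noise-free synthetic data; the function name `homography_3d` and the numbers are illustrative, and this is not claimed to be the procedure the book prescribes.

```python
import numpy as np

def homography_3d(X, X_bar):
    """Estimate the 4x4 H (up to scale) with X_bar[i] ~ H X[i] from n >= 5 points.

    X, X_bar: arrays of shape (n, 4), homogeneous; X_bar[:, 3] must be nonzero.
    """
    rows = []
    for Xi, Yi in zip(X, X_bar):
        for j in range(3):
            # Y_4 (h_j . X) - Y_j (h_4 . X) = 0, with h_j the j-th row of H.
            row = np.zeros(16)
            row[4 * j: 4 * j + 4] = Yi[3] * Xi
            row[12:16] = -Yi[j] * Xi
            rows.append(row)
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(4, 4)

# Synthetic ground truth: five Euclidean points, no four of them coplanar.
H_true = np.array([[1.0, 0.1, 0.0, 0.2],
                   [0.0, 1.2, 0.1, -0.3],
                   [0.1, 0.0, 0.9, 0.5],
                   [0.05, -0.02, 0.03, 1.0]])
X_bar = np.array([[0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1],
                  [0, 0, 1, 1], [1, 1, 1, 1]], dtype=float)
X = np.linalg.solve(H_true, X_bar.T).T        # projective-frame points with X_bar = H X

H_est = homography_3d(X, X_bar)
H_est = H_est / H_est[3, 3] * H_true[3, 3]    # fix the free scale for comparison
```

With noisy measurements one would typically use more than five points and refine the linear estimate, but that is beyond this sketch.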

When is ground truth available? Surveyed control points in aerial photography. Measured calibration targets. GPS coordinates of landmarks. Known lengths of objects. Any of these can provide the constraints needed for a direct metric upgrade.

How many known 3D points are needed to directly compute the metric upgrading homography H?

• 5 points in general position
• 3 points
• 8 points

Chapter 8: The Reconstruction Hierarchy

The stratified approach is a powerful organizing principle: start with the weakest reconstruction and progressively strengthen it.

Level	Information needed	Invariants preserved
Projective	F (from correspondences alone)	Incidence, cross-ratios, collinearity
Affine	+ plane at infinity	+ parallelism, midpoints, volume ratios
Metric	+ absolute conic	+ angles, distance ratios
Euclidean	+ absolute scale	+ absolute distances

Each level is sufficient for different applications:
• Projective: line-plane intersections, image-to-image transfer
• Affine: checking parallelism, computing centroids
• Metric: 3D modeling, measurement, augmented reality
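The difference between the projective and affine levels can be seen on a single line segment: both kinds of transformation keep the three points collinear, but only the affine one keeps the midpoint in the middle. A small sketch assuming NumPy, with made-up transformations:

```python
import numpy as np

def apply(H, X):
    """Apply a 4x4 transformation to an inhomogeneous 3D point."""
    Y = H @ np.append(X, 1.0)
    return Y[:3] / Y[3]

a = np.array([0.0, 0.0, 0.0])
b = np.array([2.0, 4.0, 6.0])
mid = (a + b) / 2

# An affine transformation has last row (0, 0, 0, 1); a general projective one does not.
H_affine = np.array([[1.0, 0.5, 0.0, 1.0],
                     [0.0, 2.0, 0.3, -1.0],
                     [0.2, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])
H_proj = H_affine.copy()
H_proj[3] = [0.1, 0.0, 0.05, 1.0]

def midpoint_preserved(H):
    return bool(np.allclose(apply(H, mid), (apply(H, a) + apply(H, b)) / 2))

def collinear_after(H):
    pa, pm, pb = apply(H, a), apply(H, mid), apply(H, b)
    return bool(np.allclose(np.cross(pm - pa, pb - pa), 0.0, atol=1e-9))
```

Here `midpoint_preserved(H_affine)` holds while `midpoint_preserved(H_proj)` does not, and `collinear_after` holds for both.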

What geometric property is preserved by affine reconstruction but NOT by projective reconstruction?

• Parallelism (parallel lines remain parallel)
• Collinearity (points on a line remain collinear)
• Angles between lines

Chapter 9: Connections

Chapter	Connection
Ch 11: Computing F	Practical algorithms for the first step of reconstruction
Ch 12: Triangulation	Robust methods for the third step
Ch 13: Homographies	Plane-induced homographies help identify π_∞
Ch 18: N-View Methods	Bundle adjustment refines reconstructions over many views
Ch 19: Auto-Calibration	Recovering K from multiple views enables metric upgrade without a calibration target

"Any two reconstructions from the same correspondences are projectively equivalent."

— Hartley & Zisserman, Theorem 10.1

What is the key insight of the projective reconstruction theorem?

• 3D structure can be recovered from uncalibrated images, up to a projective transformation, using only point correspondences
• Camera calibration is always needed for reconstruction
• Reconstruction requires at least 3 views
