Hartley & Zisserman, Chapter 10

3D Reconstruction of Cameras & Structure

How to recover 3D scene geometry from two uncalibrated views. Projective reconstruction, the stratified upgrade to affine and metric, and the fundamental theorem.

Prerequisites: Chapter 9 (Epipolar Geometry).

Chapter 0: Why Reconstruction?

Given two photographs of a scene, can we recover the 3D geometry — the shape of buildings, the positions of objects, the layout of a room? Yes, but with an important caveat: how much of the 3D structure we can recover depends on what we know about the cameras.

The hierarchy of reconstruction:
Projective (from F alone): Shape is recovered up to a projective transformation. Straight lines stay straight, but angles and distances are distorted.
Affine (+ knowing the plane at infinity): Parallel lines are parallel. Midpoints and ratios of lengths along lines are correct.
Metric/Euclidean (+ knowing the absolute conic): Angles and distance ratios are correct. The true 3D shape is recovered up to a similarity: a global scale, rotation, and translation.

This chapter shows how to achieve each level, and what information is needed to upgrade from one to the next.

What is the best level of reconstruction achievable from two uncalibrated views with no scene knowledge?

Chapter 1: The Reconstruction Method

The basic approach to two-view reconstruction has three steps:

Step | Action | Result
1 | Compute F from point correspondences xᵢ ↔ x'ᵢ | The fundamental matrix
2 | Compute camera matrices P, P' from F | Camera pair (up to projective ambiguity)
3 | For each correspondence, triangulate to find Xᵢ | 3D point cloud

Triangulation: Given P, P' and a correspondence x ↔ x' satisfying x'ᵀFx = 0, the rays back-projected from x and x' are coplanar and intersect in a 3D point X. Computing X from the two rays is triangulation, the subject of Chapter 12.

Points on the baseline (the line joining the two camera centres) cannot be triangulated, as both back-projected rays are the baseline itself. These points project to the epipoles in both images.
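
As an illustration of step 3, here is a minimal sketch of linear triangulation in NumPy. It uses the homogeneous linear method, which is one simple option and not necessarily the method a robust pipeline would use. The camera matrices and the test point are hypothetical values chosen for the example, and the final normalization assumes the recovered point is finite (nonzero last coordinate).

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one correspondence; x1, x2 are inhomogeneous (u, v)."""
    # Each image point contributes two linear equations in the homogeneous point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / X[3]  # assumes a finite point

def project(P, X):
    """Project a homogeneous 3D point and return inhomogeneous image coordinates."""
    x = P @ X
    return x[:2] / x[2]

# Hypothetical camera pair: identity camera and a camera shifted along the x axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 4.0, 1.0])
X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With noise-free measurements the recovered point matches the original. For a point on the baseline the four equations lose rank and the null vector is no longer unique, which is the degeneracy described above.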

Which 3D points cannot be uniquely triangulated from two views?

Chapter 2: Reconstruction Ambiguity

Even with perfect data, reconstruction from images alone is inherently ambiguous. The level of ambiguity depends on what is known about the cameras.

Camera knowledge | Ambiguity | DOF
Nothing (uncalibrated) | Projective (4×4 matrix H) | 15
Calibrated (K known) | Similarity (rotation + translation + scale) | 7
Calibrated + known scale | Euclidean (rotation + translation) | 6

Why is scale ambiguous? Consider a corridor photo. Is the corridor 2 metres wide or 10 centimetres (a doll's house)? Both produce identical images if the camera is scaled accordingly. Overall scale cannot be determined from images alone; you need at least one known distance in the scene.

Mathematically: if (P, P', {Xᵢ}) is a valid reconstruction, then (PH⁻¹, P'H⁻¹, {HXᵢ}) gives the same images for any invertible 4×4 H, because (PH⁻¹)(HXᵢ) = PXᵢ. For calibrated cameras, H is restricted to a similarity transformation.
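
The ambiguity can be checked numerically. The sketch below uses hypothetical cameras, random points, and an arbitrary invertible H (all invented for the example) and confirms that the transformed reconstruction produces the same homogeneous image points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical camera pair and six random 3D points (homogeneous, one per column).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.vstack([
    rng.uniform(-1.0, 1.0, (2, 6)),
    rng.uniform(3.0, 6.0, (1, 6)),
    np.ones((1, 6)),
])

# An arbitrary invertible 4x4 projective transformation (illustrative values).
H = np.array([
    [1.00,  0.20, 0.0,  0.5],
    [0.00,  1.00, 0.3, -1.0],
    [0.10,  0.00, 1.0,  2.0],
    [0.05, -0.02, 0.1,  1.0],
])
H_inv = np.linalg.inv(H)

# Original reconstruction versus the transformed one.
x1_original, x2_original = P1 @ X, P2 @ X
x1_transformed = (P1 @ H_inv) @ (H @ X)
x2_transformed = (P2 @ H_inv) @ (H @ X)

print(np.allclose(x1_original, x1_transformed),
      np.allclose(x2_original, x2_transformed))
```

The transformed points HX are generally a heavily warped version of the original cloud, yet no image measurement can tell the two reconstructions apart.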

For calibrated cameras, reconstruction is ambiguous up to what type of transformation?

Chapter 3: The Projective Reconstruction Theorem

This is one of the central results of the book:

Theorem (Projective Reconstruction): If a set of point correspondences xᵢ ↔ x'ᵢ uniquely determines the fundamental matrix F, then the scene and cameras may be reconstructed from these correspondences alone. Any two such reconstructions are related by a 4×4 projective transformation H.

This means: from two uncalibrated photographs and enough point matches, you can recover 3D structure up to a projective warp. No camera calibration, no scene knowledge needed.

The proof relies on two facts: (1) F uniquely determines the pair of camera matrices up to a projective transformation (Chapter 9, Result 9.10), and (2) given the cameras, each point correspondence determines a unique 3D point (by triangulation), except for points on the baseline.
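
Fact (1) can be made concrete with the canonical camera pair P = [I | 0], P' = [[e']×F | e'], where e' is the epipole satisfying e'ᵀF = 0. The sketch below is one possible way to build that pair in NumPy. The rotation and translation used to generate a test F are hypothetical, and K = I is assumed only to keep the example short.

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([
        [0.0, -v[2], v[1]],
        [v[2], 0.0, -v[0]],
        [-v[1], v[0], 0.0],
    ])

def cameras_from_F(F):
    """Canonical pair P = [I | 0], P' = [[e']x F | e'] for a rank-2 F."""
    # The epipole e' is the left null vector of F, i.e. the null vector of F^T.
    _, _, Vt = np.linalg.svd(F.T)
    e2 = Vt[-1]
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])
    return P1, P2

# Hypothetical ground truth used only to generate F (K = I assumed for brevity).
angle = 0.1
R = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
t = np.array([1.0, 0.0, 0.2])
F = skew(t) @ R

P1, P2 = cameras_from_F(F)

# For P = [I | 0], P' = [M | m], the fundamental matrix is [m]x M.
F_rec = skew(P2[:, 3]) @ P2[:, :3]
F_n = F / np.linalg.norm(F)
F_rec_n = F_rec / np.linalg.norm(F_rec)
scale_free_error = min(np.linalg.norm(F_n - F_rec_n), np.linalg.norm(F_n + F_rec_n))
print(scale_free_error)
```

The recovered cameras differ from the ones that generated F, but they share the same fundamental matrix up to scale, so they differ from the true pair only by a projective transformation.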

What determines whether projective reconstruction is achievable?

Chapter 4: Upgrading to Affine

An affine reconstruction is one where the plane at infinity is correctly located. To upgrade from projective to affine, we need to identify the plane at infinity π∞ in the projective reconstruction.

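Given π∞ as a 4-vector in the projective frame, the upgrading homography is the 4×4 matrix whose top three rows are [I | 0] and whose bottom row is π∞ᵀ:

H = [ I | 0 ; π∞ᵀ ]

H maps π∞ to its canonical position (0, 0, 0, 1)ᵀ. Points transform as X → HX and cameras as P → PH⁻¹. H is invertible only when the last coordinate of π∞ is nonzero.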
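
A short numerical check of this upgrade, as a sketch: the plane vector below is a hypothetical value, and the code assumes its last coordinate is nonzero so that H is invertible. Planes transform by the inverse transpose of the point transformation, so the identified plane should land on (0, 0, 0, 1)ᵀ and points lying on it should acquire a zero last coordinate.

```python
import numpy as np

def affine_upgrade(pi_inf):
    """Build H with top rows [I | 0] and bottom row pi_inf^T."""
    H = np.eye(4)
    H[3] = pi_inf
    return H

# Hypothetical plane at infinity in the projective frame (last coordinate nonzero).
pi_inf = np.array([0.1, -0.2, 0.05, 1.0])
H = affine_upgrade(pi_inf)

# Three independent points lying on the plane: a basis of the null space of pi_inf^T.
_, _, Vt = np.linalg.svd(pi_inf.reshape(1, 4))
points_on_plane = Vt[1:].T          # shape (4, 3), one point per column

upgraded_points = H @ points_on_plane
upgraded_plane = np.linalg.inv(H).T @ pi_inf

print(upgraded_points[3])   # last coordinates, numerically zero
print(upgraded_plane)       # the canonical plane at infinity
```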

Ways to identify π∞:

Information | How it identifies π∞
Translational motion (no rotation) | Points at infinity have the same image in both views, so any "invented" correspondence with x = x' triangulates to a point on π∞
3 sets of parallel lines | Each set intersects at a point on π∞; three such points determine π∞
Known distance ratios on lines | Vanishing points can be computed from ratio constraints

What affine reconstruction gives you: Parallel lines are parallel in the reconstruction. Midpoints and length ratios along lines are preserved. But angles between non-parallel lines and absolute distances are still unknown.
What must be identified to upgrade from projective to affine reconstruction?

Chapter 5: Upgrading to Metric

A metric (or Euclidean/similarity) reconstruction preserves angles and distance ratios. To upgrade from affine to metric, we need to identify the absolute conic Ω∞ on the plane at infinity.

The absolute conic is identified via the image of the absolute conic (IAC) ω = (KKᵀ)⁻¹. If K is known (calibrated cameras), ω is known directly. If K is partially known (e.g., zero skew, square pixels), constraints on ω can be accumulated across views.

What metric reconstruction gives you: True 3D shape, up to a global scale, rotation, and translation. You can measure angles between lines, check orthogonality, and compute distance ratios between any pair of points. This is the gold standard of reconstruction.

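The upgrade from affine to metric is an affine transformation of 3-space, H = [ A 0 ; 0ᵀ 1 ], where A is a 3×3 matrix. A is determined, up to a rotation and an overall scale, by the constraint AᵀA = Ω∞, where Ω∞ is the 3×3 matrix of the absolute conic in the affine frame: the upgrade must map the absolute conic to its canonical form, the identity. One solution for A is obtained from a Cholesky factorization of Ω∞.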
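
The following sketch works through this constraint on synthetic data. It assumes that a camera in the affine frame has the form [M | m], so that a direction d on the plane at infinity images to Md and the absolute conic in the affine frame is Mᵀ ω M. The calibration K, rotation R, and distortion A_true are hypothetical values, and A_true is chosen upper triangular with positive diagonal so that the Cholesky factor reproduces it exactly; in general A is recovered only up to a rotation.

```python
import numpy as np

# Hypothetical calibration, rotation, and affine distortion (upper triangular).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
A_true = np.array([[1.2, 0.3, -0.1], [0.0, 0.8, 0.2], [0.0, 0.0, 1.5]])

# Left 3x3 block of the camera expressed in the affine frame.
M = K @ R @ A_true

# Image of the absolute conic, and the absolute conic in the affine frame.
omega = np.linalg.inv(K @ K.T)
Omega = M.T @ omega @ M
Omega = 0.5 * (Omega + Omega.T)     # remove rounding asymmetry

# Cholesky gives lower-triangular L with L @ L.T = Omega, so A = L.T has A.T @ A = Omega.
A = np.linalg.cholesky(Omega).T

# 4x4 upgrade applied to homogeneous points of the affine reconstruction.
H = np.eye(4)
H[:3, :3] = A

# Two directions that are orthogonal in the metric frame, seen in the affine frame.
d1_affine = np.linalg.inv(A_true) @ np.array([1.0, 0.0, 0.0])
d2_affine = np.linalg.inv(A_true) @ np.array([0.0, 1.0, 0.0])
dot_before = d1_affine @ d2_affine
dot_after = (A @ d1_affine) @ (A @ d2_affine)
print(dot_before, dot_after)
```

The dot product is nonzero in the affine frame and vanishes after the upgrade, showing that orthogonality is restored.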

What must be identified to upgrade from affine to metric reconstruction?

Chapter 6: Parallel Lines and Scene Constraints

Parallel lines are the most common source of affine information in man-made scenes. Three sets of parallel lines whose directions are not all parallel to a common plane give three non-collinear points on π∞, which determine it uniquely.

Practical procedure: Identify parallel lines in both images. The imaged intersections (vanishing points) correspond to points at infinity. Reconstruct these 3D points via triangulation. Three non-collinear such points determine the plane at infinity.
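
The last step of this procedure, fitting a plane to three reconstructed points, can be sketched as a null-space computation. In the example below the three points are synthesized by warping three points at infinity with a hypothetical projective distortion G; in practice they would come from triangulating matched vanishing points.

```python
import numpy as np

def plane_through_points(X):
    """Plane pi with X @ pi = 0 for three homogeneous points given as rows of X."""
    _, _, Vt = np.linalg.svd(X)
    return Vt[-1]

# Three points at infinity in a metric frame (directions along x, y, z).
D = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Hypothetical projective distortion standing in for the unknown ambiguity.
G = np.array([
    [1.00,  0.20, 0.0,  0.5],
    [0.00,  1.00, 0.3, -1.0],
    [0.10,  0.00, 1.0,  2.0],
    [0.05, -0.02, 0.1,  1.0],
])
X_projective = (G @ D.T).T

pi_est = plane_through_points(X_projective)

# Ground truth: planes transform by the inverse transpose of the point map.
pi_true = np.linalg.inv(G).T @ np.array([0.0, 0.0, 0.0, 1.0])
pi_est_n = pi_est / np.linalg.norm(pi_est)
pi_true_n = pi_true / np.linalg.norm(pi_true)
plane_error = min(np.linalg.norm(pi_est_n - pi_true_n),
                  np.linalg.norm(pi_est_n + pi_true_n))
print(plane_error)
```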

It is not necessary to find vanishing points in both images. If you find a vanishing point v in one image and, in the other image, a line l' that is the image of one of the parallel scene lines, then v' must lie on both l' and the epipolar line Fv. Hence v' = l' ∩ Fv, which in homogeneous coordinates is the cross product v' = l' × (Fv).
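
A small sketch of this transfer in NumPy, using synthetic data: the rotation, translation, scene direction, and second image point q are hypothetical, and K = I is assumed so that F = [t]×R.

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([
        [0.0, -v[2], v[1]],
        [v[2], 0.0, -v[0]],
        [-v[1], v[0], 0.0],
    ])

# Hypothetical relative pose between the views (K = I assumed for brevity).
angle = 0.1
R = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
t = np.array([1.0, 0.0, 0.2])
F = skew(t) @ R

# A scene direction and its vanishing points in the two views.
d = np.array([1.0, 0.5, 1.0])
v = d                      # first camera is [I | 0]
v2_true = R @ d            # second camera is [R | t]; translation does not affect it

# In the second image we only know a line through v': here built from v' and a point q.
q = np.array([-0.4, 0.2, 1.0])
l2 = np.cross(v2_true, q)

# Intersect that line with the epipolar line of v.
v2_est = np.cross(l2, F @ v)

v2_est_n = v2_est / np.linalg.norm(v2_est)
v2_true_n = v2_true / np.linalg.norm(v2_true)
transfer_error = min(np.linalg.norm(v2_est_n - v2_true_n),
                     np.linalg.norm(v2_est_n + v2_true_n))
print(transfer_error)
```

The construction fails when l' coincides with the epipolar line Fv, since the two lines then have no unique intersection.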

Once the plane at infinity is located, the infinite homography H∞, the 2D homography induced by the plane at infinity, is also determined. It transfers the images of points at infinity (vanishing points) from one view to the other, and it depends only on the relative rotation and the camera calibrations, not on the translation between the views.
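
For calibrated intuition, the infinite homography between cameras K[I | 0] and K'[R | t] has the standard closed form H∞ = K'RK⁻¹. The sketch below uses hypothetical K, K', R, and t to show that H∞ transfers a vanishing point exactly, and that the transfer error for a finite point shrinks as the point moves further away.

```python
import numpy as np

# Hypothetical calibrations and relative pose.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[700.0, 0.0, 300.0], [0.0, 700.0, 250.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
t = np.array([1.0, 0.0, 0.2])

H_inf = K2 @ R @ np.linalg.inv(K1)

def dehomogenize(x):
    return x[:2] / x[2]

# A vanishing point is transferred exactly, whatever the translation t is.
d = np.array([0.1, -0.05, 1.0])
v1 = K1 @ d
v2 = K2 @ (R @ d)
vanishing_error = np.linalg.norm(dehomogenize(H_inf @ v1) - dehomogenize(v2))

# Finite points along the same ray: the transfer error falls off with depth.
errors = []
for depth in (1.0, 100.0, 10000.0):
    X = depth * d
    x1 = K1 @ X
    x2 = K2 @ (R @ X + t)
    errors.append(np.linalg.norm(dehomogenize(H_inf @ x1) - dehomogenize(x2)))

print(vanishing_error, errors)
```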

How many sets of parallel lines (each with a different direction) are needed to determine the plane at infinity?

Chapter 7: Direct Reconstruction with Ground Truth

If some 3D coordinates are known in advance (ground truth), we can skip the stratified approach and directly compute the metric reconstruction.

Given a projective reconstruction (P, P', {Xᵢ}) and a set of known 3D points X̄ᵢ in Euclidean coordinates, the upgrading homography H satisfies X̄ᵢ = H Xᵢ (up to scale) for the known points. H has 15 degrees of freedom and each known point supplies 3 independent equations, so at least 5 known 3D points in general position (no four coplanar) determine H uniquely.
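
One way to solve for H is a direct linear transformation over the point pairs, sketched below. This is a plain least-squares null-space solve without data normalization or noise handling, and the ground-truth points and the "true" H used to synthesize the projective reconstruction are hypothetical.

```python
import numpy as np

def homography3d_dlt(X_proj, X_euc):
    """Estimate 4x4 H with X_euc ~ H @ X_proj from rows of homogeneous 3D points."""
    rows = []
    for X, Xb in zip(X_proj, X_euc):
        for j in range(3):
            # Xb[3] * (row j of H) . X  -  Xb[j] * (row 3 of H) . X  =  0
            r = np.zeros(16)
            r[4 * j:4 * j + 4] = Xb[3] * X
            r[12:16] = -Xb[j] * X
            rows.append(r)
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(4, 4)

# Five hypothetical ground-truth points, no four of them coplanar.
X_euc = np.array([
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
])

# Hypothetical true upgrade, used only to synthesize the projective reconstruction.
H_true = np.array([
    [1.00,  0.20, 0.0,  0.5],
    [0.00,  1.00, 0.3, -1.0],
    [0.10,  0.00, 1.0,  2.0],
    [0.05, -0.02, 0.1,  1.0],
])
X_proj = (np.linalg.inv(H_true) @ X_euc.T).T

H_est = homography3d_dlt(X_proj, X_euc)
H_est = H_est / H_est[3, 3]          # fix the arbitrary scale for comparison
print(np.round(H_est, 6))
```

With exactly five points the linear system has 15 equations in 16 unknowns, so the null vector is unique up to scale; with more points and noisy data the same code returns a least-squares estimate.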

When is ground truth available? Surveyed control points in aerial photography. Measured calibration targets. GPS coordinates of landmarks. Known lengths of objects. Any of these can provide the constraints needed for a direct metric upgrade.
How many known 3D points are needed to directly compute the metric upgrading homography H?

Chapter 8: The Reconstruction Hierarchy

The stratified approach is a powerful organizing principle: start with the weakest reconstruction and progressively strengthen it.

Level | Information needed | Invariants preserved
Projective | F (from correspondences alone) | Incidence, cross-ratios, collinearity
Affine | + plane at infinity | + parallelism, midpoints, volume ratios
Metric | + absolute conic | + angles, distance ratios
Euclidean | + absolute scale | + absolute distances

Each level is sufficient for different applications:
• Projective: line-plane intersections, image-to-image transfer
• Affine: checking parallelism, computing centroids
• Metric: 3D modeling, measurement, augmented reality
What geometric property is preserved by affine reconstruction but NOT by projective reconstruction?

Chapter 9: Connections

Chapter | Connection
Ch 11: Computing F | Practical algorithms for the first step of reconstruction
Ch 12: Triangulation | Robust methods for the third step
Ch 13: Homographies | Plane-induced homographies help identify π∞
Ch 18: N-View Methods | Bundle adjustment refines reconstructions over many views
Ch 19: Auto-Calibration | Recovering K from multiple views enables metric upgrade without a calibration target

"Any two reconstructions from the same correspondences are projectively equivalent."
— Hartley & Zisserman, Theorem 10.1
What is the key insight of the projective reconstruction theorem?