How to recover 3D scene geometry from two uncalibrated views. Projective reconstruction, the stratified upgrade to affine and metric, and the fundamental theorem.
Given two photographs of a scene, can we recover the 3D geometry — the shape of buildings, the positions of objects, the layout of a room? Yes, but with an important caveat: how much of the 3D structure we can recover depends on what we know about the cameras.
The possible reconstructions form a hierarchy of levels: projective, affine, and metric. This chapter shows how to achieve each level, and what information is needed to upgrade from one to the next.
The basic approach to two-view reconstruction has three steps:
| Step | Action | Result |
|---|---|---|
| 1 | Compute F from point correspondences xᵢ ↔ x'ᵢ | The fundamental matrix |
| 2 | Compute camera matrices P, P' from F | Camera pair (up to projective ambiguity) |
| 3 | For each correspondence, triangulate to find Xᵢ | 3D point cloud |
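The three steps can be sketched end to end on synthetic, noise-free data. This is a minimal illustration rather than a robust implementation: it assumes NumPy, uses the plain (unnormalized) linear eight-point estimate for F, the canonical camera pair P = [I | 0], P' = [[e']×F | e'], and linear (DLT) triangulation. The function names and the synthetic scene values are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix: skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_8pt(x1, x2):
    """Step 1: linear estimate of F from N >= 8 matches (N x 2 arrays)."""
    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(x1, x2)])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)      # null vector of A
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt     # enforce rank 2

def cameras_from_F(F):
    """Step 2: canonical pair P = [I | 0], P' = [[e']x F | e']."""
    e2 = np.linalg.svd(F.T)[2][-1]                 # epipole e': F^T e' = 0
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2[:, None]])
    return P1, P2

def triangulate(P1, P2, p1, p2):
    """Step 3: linear triangulation of one match; returns a homogeneous 4-vector."""
    A = np.array([p1[0]*P1[2] - P1[0], p1[1]*P1[2] - P1[1],
                  p2[0]*P2[2] - P2[0], p2[1]*P2[2] - P2[1]])
    return np.linalg.svd(A)[2][-1]

def project(P, X):
    """Project homogeneous points (rows of X) to inhomogeneous image points."""
    x = X @ P.T
    return x[:, :2] / x[:, 2:]

# Synthetic scene and true cameras (illustrative values only).
rng = np.random.default_rng(0)
X_true = np.column_stack([rng.uniform(-3, 3, (12, 2)),
                          rng.uniform(4, 8, 12), np.ones(12)])
c, s = np.cos(0.1), np.sin(0.1)
P_a = np.hstack([np.eye(3), np.zeros((3, 1))])
P_b = np.array([[c, 0, s, 1.0], [0, 1, 0, 0.0], [-s, 0, c, 0.2]])
x1, x2 = project(P_a, X_true), project(P_b, X_true)

F = fundamental_8pt(x1, x2)
P1, P2 = cameras_from_F(F)
X = np.array([triangulate(P1, P2, a, b) for a, b in zip(x1, x2)])

# The recovered cameras and points reproduce the input images,
# even though X is only a projective version of X_true.
err = max(np.abs(project(P1, X) - x1).max(),
          np.abs(project(P2, X) - x2).max())
```

With real, noisy matches one would normally normalize the image coordinates before step 1 and use a more careful triangulation method in step 3.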
Points on the baseline (the line joining the two camera centres) cannot be triangulated, as both back-projected rays are the baseline itself. These points project to the epipoles in both images.
Even with perfect data, reconstruction from images alone is inherently ambiguous. The level of ambiguity depends on what is known about the cameras.
| Camera knowledge | Ambiguity | DOF |
|---|---|---|
| Nothing (uncalibrated) | Projective (4×4 matrix H) | 15 |
| Calibrated (K known) | Similarity (rotation + translation + scale) | 7 |
| Calibrated + known scale | Euclidean (rotation + translation) | 6 |
Mathematically: if (P, P', {Xᵢ}) is a valid reconstruction, then (PH⁻¹, P'H⁻¹, {HXᵢ}) gives the same images for any invertible 4×4 matrix H, because (PH⁻¹)(HXᵢ) = PXᵢ. For calibrated cameras, H is restricted to a similarity transformation.
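The ambiguity is easy to check numerically. The sketch below assumes NumPy and uses arbitrary random cameras and points (hypothetical values): it transforms a reconstruction by a projective H and confirms that both images are unchanged although the 3D points have moved.

```python
import numpy as np

rng = np.random.default_rng(1)
P1, P2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))  # arbitrary cameras
X = rng.normal(size=(4, 10))            # columns are homogeneous 3D points

# Strictly diagonally dominant, hence invertible. Its last row is not
# (0, 0, 0, 1), so H is a genuinely projective transformation.
H = np.eye(4) + 0.1 * rng.uniform(-1, 1, size=(4, 4))
H_inv = np.linalg.inv(H)

# Transformed reconstruction (P H^-1, P' H^-1, {H X_i}).
P1_alt, P2_alt, X_alt = P1 @ H_inv, P2 @ H_inv, H @ X

same_view1 = np.allclose(P1 @ X, P1_alt @ X_alt)   # first image unchanged
same_view2 = np.allclose(P2 @ X, P2_alt @ X_alt)   # second image unchanged
points_moved = not np.allclose(X, X_alt)           # but the 3D points differ
```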
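This is one of the central results of the book, the projective reconstruction theorem: if the point correspondences determine F uniquely, then any two reconstructions obtained from them are related by a non-singular 4×4 projective transformation H, except for points on the baseline.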
This means: from two uncalibrated photographs and enough point matches, you can recover 3D structure up to a projective warp. No camera calibration, no scene knowledge needed.
The proof relies on two facts: (1) F uniquely determines the pair of camera matrices up to a projective transformation (Chapter 9, Result 9.10), and (2) given the cameras, each point correspondence determines a unique 3D point (by triangulation), except for points on the baseline.
An affine reconstruction is one where the plane at infinity is correctly located. To upgrade from projective to affine, we need to identify the plane at infinity π∞ in the projective reconstruction.
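Given π∞ as a 4-vector in the projective frame (with non-zero last coordinate), the upgrading homography is H = [ I | 0 ; π∞ᵀ ], the 4×4 matrix whose first three rows are [I | 0] and whose last row is π∞ᵀ. Planes transform as π ↦ H⁻ᵀπ, so H moves π∞ to its canonical position (0, 0, 0, 1)ᵀ.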
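As a small sketch of this upgrade, assuming NumPy and the standard form in which H has first three rows [I | 0] and last row π∞ᵀ: the helper name `affine_upgrade` and the sample plane are hypothetical, and the code assumes the last coordinate of π∞ is non-zero so that H is invertible.

```python
import numpy as np

def affine_upgrade(pi_inf):
    """Build H with rows [I | 0] on top and pi_inf^T at the bottom."""
    pi_inf = np.asarray(pi_inf, dtype=float)
    if abs(pi_inf[3]) < 1e-12:
        raise ValueError("last coordinate of pi_inf must be non-zero")
    H = np.eye(4)
    H[3] = pi_inf
    return H

pi_inf = np.array([0.1, -0.2, 0.05, 1.0])   # illustrative plane at infinity
H = affine_upgrade(pi_inf)

# Planes transform as pi -> H^{-T} pi: pi_inf goes to (0, 0, 0, 1).
pi_mapped = np.linalg.inv(H).T @ pi_inf

# A point lying on pi_inf is sent to a point with last coordinate 0,
# that is, to a point at infinity in the upgraded frame.
X_on_plane = np.array([1.0, 0.0, 0.0, -0.1])    # pi_inf . X = 0
X_mapped = H @ X_on_plane
```

The upgraded reconstruction is then (PH⁻¹, P'H⁻¹, {HXᵢ}).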
Ways to identify π∞:
| Information | How it identifies π∞ |
|---|---|
| Translational motion (no rotation) | Points at infinity have the same image in both views, so any point can serve as an "invented" correspondence xᵢ ↔ x'ᵢ = xᵢ; three such matches triangulate to three points on π∞ |
| 3 sets of parallel lines | Each set intersects at a point on π∞; three such points determine π∞ |
| Known distance ratios on lines | Vanishing points can be computed from ratio constraints |
A metric (similarity) reconstruction preserves angles and distance ratios; it differs from a Euclidean reconstruction only by an unknown overall scale. To upgrade from affine to metric, we need to identify the absolute conic Ω∞ on the plane at infinity.
The absolute conic is identified via the image of the absolute conic (IAC) ω = (KKᵀ)⁻¹. If K is known (calibrated cameras), ω is known directly. If K is partially known (e.g., zero skew, square pixels), constraints on ω can be accumulated across views.
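The upgrading homography from affine to metric is built from a 3×3 matrix A acting on the 3D affine space: H = [ A⁻¹ 0 ; 0ᵀ 1 ]. If ω is the IAC in a camera P = [M | m] of the affine reconstruction, the absolute conic in the affine frame is Ω∞ = MᵀωM, and A is obtained by Cholesky factorization from AAᵀ = (MᵀωM)⁻¹. This is the condition that the absolute conic maps to the identity under the metric transformation.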
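A sketch of the affine-to-metric step, assuming NumPy and the standard relation AAᵀ = (MᵀωM)⁻¹ with H = [ A⁻¹ 0 ; 0ᵀ 1 ], where M is the left 3×3 block of a camera in the affine reconstruction. The helper name `metric_upgrade`, the calibration K and the affine distortion B are hypothetical test values. The check is that the upgraded camera's left block equals K times an orthogonal matrix.

```python
import numpy as np

def metric_upgrade(M, omega):
    """Return (H, A) with A A^T = (M^T omega M)^{-1} and H = diag(A^{-1}, 1)."""
    S = np.linalg.inv(M.T @ omega @ M)
    A = np.linalg.cholesky((S + S.T) / 2)   # symmetrize against round-off
    H = np.eye(4)
    H[:3, :3] = np.linalg.inv(A)
    return H, A

# Illustrative ground truth: calibration K, rotation R, affine distortion B.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
B = np.array([[2.0, 0.3, 0.0], [0.1, 1.0, 0.5], [0.0, -0.2, 1.5]])

M = K @ R @ np.linalg.inv(B)        # left block of the affine-frame camera
omega = np.linalg.inv(K @ K.T)      # image of the absolute conic

H, A = metric_upgrade(M, omega)

# The upgraded camera P H^{-1} has left block M A. Removing K must leave
# an orthogonal matrix if the frame is now metric.
N = np.linalg.inv(K) @ M @ A
is_orthogonal = np.allclose(N.T @ N, np.eye(3), atol=1e-6)
```

The Cholesky factor fixes A only up to a rotation on the right, which matches the similarity ambiguity that remains in a metric reconstruction.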
Parallel lines are the most common source of affine information in man-made scenes. Three sets of parallel lines give three points on π∞, which determine it uniquely provided the three directions are not all parallel to a common plane (otherwise the three points are collinear).
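Once the three intersection points have been reconstructed as homogeneous 4-vectors in the projective frame, π∞ is the null vector of the 3×4 matrix that stacks them. The sketch below assumes NumPy; the helper name `plane_from_points`, the distortion H and the three directions are hypothetical values used to fabricate a test case with a known answer.

```python
import numpy as np

def plane_from_points(points):
    """Plane pi with pi . X = 0 for three homogeneous points (rows)."""
    return np.linalg.svd(np.asarray(points, dtype=float))[2][-1]

# Fabricate a projective frame by distorting a metric one with H.
rng = np.random.default_rng(2)
H = np.eye(4) + 0.1 * rng.uniform(-1, 1, size=(4, 4))   # invertible

# Three non-coplanar directions, as points at infinity (d, 0) in the metric frame.
directions = np.array([[1.0, 0, 0], [0, 1, 0], [1, 1, 2]])
V_metric = np.hstack([directions, np.zeros((3, 1))])
V_proj = V_metric @ H.T             # the same points in the projective frame

pi_inf = plane_from_points(V_proj)

# Ground truth: planes transform by H^{-T}, so pi_inf is H^{-T} (0, 0, 0, 1).
expected = np.linalg.inv(H).T @ np.array([0.0, 0.0, 0.0, 1.0])
cosine = abs(pi_inf @ expected) / (np.linalg.norm(pi_inf) * np.linalg.norm(expected))
```

A cosine of 1 (up to round-off) means the recovered plane agrees with the true one up to scale, which is all a homogeneous plane vector can be compared on.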
It is not necessary to find vanishing points in both images. If you find a vanishing point v in one image and a corresponding line l' in the other, you can compute v' = l' ∩ Fv (the intersection of l' with the epipolar line of v).
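In homogeneous coordinates the intersection of two lines is their cross product, so the transfer is a one-liner. The sketch assumes NumPy; the function name and the test configuration (cameras [I | 0] and [R | t] with identity calibration, so that F = [t]×R) are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix: skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def transfer_vanishing_point(F, v, l2):
    """v' = l' x (F v): intersect l' with the epipolar line of v."""
    return np.cross(l2, F @ v)

# Illustrative two-view geometry with K = K' = I.
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([1.0, 0.2, 0.1])
F = skew(t) @ R

d = np.array([1.0, 0.5, 2.0])       # a 3D direction
v = d                               # its vanishing point in image 1
v2_true = R @ d                     # its vanishing point in image 2

# Any line through v2_true in image 2 (other than the epipolar line itself).
l2 = np.cross(v2_true, np.array([0.3, -0.4, 1.0]))

v2 = transfer_vanishing_point(F, v, l2)
# v2 and v2_true agree up to scale when their cross product vanishes.
residual = np.linalg.norm(np.cross(v2, v2_true))
```

The construction fails only when l' coincides with the epipolar line Fv, in which case the cross product is the zero vector.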
Once the plane at infinity is located, the infinite homography H∞ — the 2D homography induced by the plane at infinity — is also determined. This homography maps image points independently of scene depth and depends only on rotation and calibration.
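For cameras P = [M | m] and P' = [M' | m'] expressed in an affine (or metric) frame, the infinite homography is H∞ = M'M⁻¹. For calibrated cameras K[I | 0] and K'[R | t] this reduces to K'RK⁻¹. The sketch below assumes NumPy and uses hypothetical calibration and motion values to confirm both statements, including the independence from translation.

```python
import numpy as np

def infinite_homography(P1, P2):
    """H_inf = M' M^{-1} for P = [M | m], P' = [M' | m'] in an affine frame."""
    return P2[:, :3] @ np.linalg.inv(P1[:, :3])

# Illustrative calibrations and motion.
K1 = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
K2 = np.array([[700.0, 0, 300], [0, 700, 250], [0, 0, 1]])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([[1.0], [0.2], [0.1]])

P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K2 @ np.hstack([R, t])
H_inf = infinite_homography(P1, P2)

# A point at infinity (d, 0) is transferred exactly by H_inf.
X_inf = np.array([1.0, 0.5, 2.0, 0.0])
v1, v2 = P1 @ X_inf, P2 @ X_inf
v2_pred = H_inf @ v1

# Changing the translation leaves H_inf unchanged.
P2_shifted = K2 @ np.hstack([R, 5.0 * t])
H_inf_shifted = infinite_homography(P1, P2_shifted)
```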
If some 3D coordinates are known in advance (ground truth), we can skip the stratified approach and directly compute the metric reconstruction.
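Given a projective reconstruction (P, P', {Xᵢ}) and a set of known 3D points X̄ᵢ in Euclidean coordinates, the upgrading homography H satisfies X̄ᵢ = HXᵢ for the known points, where the equality is between homogeneous vectors and so holds up to scale. Each known point gives 3 independent linear equations on the 15 degrees of freedom of H, so at least 5 known 3D points in general position (no four coplanar) determine H uniquely.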
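One way to solve for H is a direct linear transformation (DLT) in 3D, the analogue of the usual 2D homography estimate. This is a sketch under assumptions, not necessarily the method the chapter uses: it assumes NumPy, noise-free data and finite ground-truth points. The function name and test values are hypothetical, and six points are used instead of the minimal five for a little numerical margin.

```python
import numpy as np

def homography_3d(X_proj, X_known):
    """Estimate 4x4 H with X_known ~ H X_proj (rows are homogeneous points)."""
    rows = []
    for X, Y in zip(X_proj, X_known):
        for j in range(3):
            # Y[3] * (h_j . X) - Y[j] * (h_4 . X) = 0, with h_k the k-th row of H.
            r = np.zeros(16)
            r[4*j:4*j + 4] = Y[3] * X
            r[12:16] = -Y[j] * X
            rows.append(r)
    return np.linalg.svd(np.array(rows))[2][-1].reshape(4, 4)

rng = np.random.default_rng(3)
H_true = np.eye(4) + 0.1 * rng.uniform(-1, 1, size=(4, 4))   # invertible

# Known Euclidean points and their projectively distorted counterparts.
X_known = np.column_stack([rng.uniform(-1, 1, (6, 3)), np.ones(6)])
X_proj = X_known @ np.linalg.inv(H_true).T
X_proj *= rng.uniform(0.5, 2.0, size=(6, 1))    # arbitrary homogeneous scales

H_est = homography_3d(X_proj, X_known)

# H is only defined up to scale, so compare after normalizing one entry.
H_est_n = H_est / H_est[3, 3]
H_true_n = H_true / H_true[3, 3]
```

With noisy measurements the linear solution is usually only a starting point for a nonlinear refinement.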
The stratified approach is a powerful organizing principle: start with the weakest reconstruction and progressively strengthen it.
| Level | Information needed | Invariants preserved |
|---|---|---|
| Projective | F (from correspondences alone) | Incidence, cross-ratios, collinearity |
| Affine | + plane at infinity | + parallelism, midpoints, volume ratios |
| Metric | + absolute conic | + angles, distance ratios |
| Euclidean | + absolute scale | + absolute distances |
| Chapter | Connection |
|---|---|
| Ch 11: Computing F | Practical algorithms for the first step of reconstruction |
| Ch 12: Triangulation | Robust methods for the third step |
| Ch 13: Homographies | Plane-induced homographies help identify π∞ |
| Ch 18: N-View Methods | Bundle adjustment refines reconstructions over many views |
| Ch 19: Auto-Calibration | Recovering K from multiple views enables metric upgrade without a calibration target |