Hartley & Zisserman, Chapter 12

Structure ComputationTriangulation

Given two camera matrices and a point correspondence, how to compute the 3D point. Linear methods, geometric error, Sampson correction, and optimal triangulation.

Prerequisites: Chapter 9 (Epipolar Geometry) + Chapter 11 (Computing F).
10
Chapters
4+
Simulations

Chapter 0: Why Triangulation?

You have two cameras and a pair of corresponding image points x ↔ x'. Back-projecting from x gives a ray through the first camera centre. Back-projecting from x' gives a ray through the second. In the noise-free case, these rays intersect at the 3D point X. In reality, the rays don't intersect because of measurement noise.

The core problem: Given noisy measurements x and x' that don't exactly satisfy the epipolar constraint, find the 3D point X that best explains the observations. This requires finding "corrected" measurements x̂, x̂' that do satisfy x̂'TF x̂ = 0, and then computing X from the corrected pair.
Triangulation

Two rays from camera centres C and C' should intersect at X, but noise makes them skew.

Why don't the two back-projected rays intersect in practice?

Chapter 1: Problem Statement

Given camera matrices P, P' and a correspondence x ↔ x', find X such that x = PX and x' = P'X. These 4 equations (2 per camera, since the third is dependent) in 3 unknowns (the inhomogeneous coordinates of X) are overdetermined.

x × PX = 0   ⇔   A X = 0

where A is a 4 × 4 matrix formed from the rows of P, P' and the coordinates of x, x'. The solution is the null-vector of A.

Two cost functions to choose from:
(1) Algebraic error: minimize ||AX|| — fast but not geometrically meaningful.
(2) Geometric error: minimize d(x, PX)2 + d(x', P'X)2 — the sum of squared reprojection distances. This is the MLE under Gaussian noise.
How many equations does the system x × PX = 0 give?

Chapter 2: Linear Triangulation

Form the 4 × 4 matrix A from the two camera equations:

A = [x(p3T) − p1T ; y(p3T) − p2T ; x'(p'3T) − p'1T ; y'(p'3T) − p'2T]

where piT is the i-th row of P. The solution X is the unit singular vector of A corresponding to the smallest singular value.

This is the DLT for triangulation. It minimizes algebraic error, which is not geometric error. The result depends on the coordinate system used for the image points. As always, normalize first for best results.

An alternative linear method: the mid-point method computes the point closest to both rays. But this method is not projectively invariant and can give poor results when the baseline is short relative to scene depth.

What is the main drawback of linear triangulation?

Chapter 3: Geometric Error

The MLE minimizes the sum of squared reprojection distances in both images:

minX d(x, PX)2 + d(x', P'X)2

This is a nonlinear optimization over the 3 unknowns of X. It can be solved using Levenberg-Marquardt with the linear triangulation result as initialization.

A subtlety: This cost function assumes the camera matrices P and P' are exact. But if they were estimated from noisy data, we should also consider errors in P and P'. The full Gold Standard would optimize over X and the corrected image points simultaneously. In practice, fixing P and P' and optimizing only X works well when the cameras were estimated from many points.
What does the geometric error cost function for triangulation minimize?

Chapter 4: Sampson Approximation

The Sampson correction is a first-order geometric correction that avoids the expense of full nonlinear optimization. It computes the correction to the image points that projects them onto the epipolar constraint surface.

Given measured points (x, x'), the Sampson-corrected points satisfy x̂'TF x̂ = 0, and are the closest such points to the measurements (to first order). The 3D point is then triangulated from the corrected pair.

When to use Sampson vs. full optimization: The Sampson approximation is excellent when the measurement noise is small relative to the size of the image. For most practical scenarios, it gives results indistinguishable from the full nonlinear optimization, at a fraction of the cost.
The Sampson approximation is accurate to what order?

Chapter 5: Optimal Triangulation

The optimal triangulation method computes the MLE of X without iterative optimization. It works by finding the corrected points x̂ and x̂' that minimize geometric error and exactly satisfy the epipolar constraint.

The key insight: the problem reduces to finding a root of a degree-6 polynomial. Evaluate the cost at each real root and select the minimum. This gives the globally optimal solution in closed form.

This is one of the few problems in multi-view geometry with a closed-form globally optimal solution. No iterative methods, no local minima, no initialization worries. The algorithm is fast and exact.
What makes optimal triangulation special compared to generic nonlinear optimization?

Chapter 6: Uncertainty of the 3D Point

The covariance of the estimated 3D point X depends on: (1) the noise in the image measurements, (2) the geometry of the cameras (baseline length, viewing angle), and (3) the depth of the point.

The uncertainty ellipsoid: The covariance of X is elongated along the viewing direction. Points far from the cameras have larger uncertainty than nearby points. A short baseline gives poor depth resolution (the uncertainty ellipsoid is very elongated along depth). A wider baseline gives better depth but worse matching (more foreshortening).

The covariance ΣX can be propagated from the image point covariances using the Jacobian of the triangulation mapping, following the methods of Chapter 5.

In which direction is the uncertainty of a triangulated 3D point typically largest?

Chapter 7: Line Reconstruction

Lines in 3D can be reconstructed from their images in two views. A line in one image back-projects to a plane. The intersection of two such planes (one from each view) gives the 3D line.

Two corresponding image lines l and l' back-project to planes π = PTl and π' = P'Tl'. Their intersection is the 3D line, represented as a Plucker matrix L = π π'T − π' πT.

Lines vs. points: Line reconstruction from two views is always possible and does not have the "skew rays" problem of point triangulation. Two planes always intersect in a line. However, the accuracy depends on how well-separated the two viewing directions are.
How are 3D lines reconstructed from two views?

Chapter 8: Multi-View Triangulation

With more than two views, triangulation becomes overdetermined. Given n views and camera matrices P1, ..., Pn with corresponding image points x1, ..., xn, we form a 2n × 4 matrix A and solve AX = 0.

More views = better accuracy. Each additional view adds 2 equations, improving the conditioning of the system. The geometric error to minimize becomes Σi d(xi, PiX)2. This can be solved by nonlinear optimization initialized with the linear DLT solution.

Multi-view triangulation naturally arises in the N-view reconstruction pipeline (Chapter 18), where points are tracked across many frames and triangulated using all available observations.

How does multi-view triangulation improve on two-view triangulation?

Chapter 9: Connections

LinkConnection
Ch 10 → Ch 12Triangulation is step 3 of the reconstruction pipeline
Ch 12 → Ch 13Homography-based triangulation works when points lie on known planes
Ch 12 → Ch 18Bundle adjustment simultaneously refines triangulated points and cameras
Ch 12 → Ch 14Affine triangulation is linear and exact (no iterative optimization needed)
"The optimal triangulation method finds the globally optimal solution by reducing the problem to finding roots of a polynomial."
— Hartley & Zisserman, Chapter 12
What degree polynomial must be solved for optimal triangulation?
← Chapter 11 Chapter 13: Homographies →