Hartley & Zisserman, Chapter 12

Structure Computation — Triangulation

Given two camera matrices and a point correspondence, how to compute the 3D point. Linear methods, geometric error, Sampson correction, and optimal triangulation.

Prerequisites: Chapter 9 (Epipolar Geometry) + Chapter 11 (Computing F).

Chapters

Simulations

Chapter 0: Why Triangulation?

You have two cameras and a pair of corresponding image points x ↔ x'. Back-projecting from x gives a ray through the first camera centre. Back-projecting from x' gives a ray through the second. In the noise-free case, these rays intersect at the 3D point X. In reality, the rays don't intersect because of measurement noise.

The core problem: Given noisy measurements x and x' that don't exactly satisfy the epipolar constraint, find the 3D point X that best explains the observations. This requires finding "corrected" measurements x̂, x̂' that do satisfy x̂'^TF x̂ = 0, and then computing X from the corrected pair.

Triangulation

Two rays from camera centres C and C' should intersect at X, but noise makes them skew.

Why don't the two back-projected rays intersect in practice?

Because of measurement noise: the image points don't exactly satisfy the epipolar constraint Because the cameras are too far apart Because the fundamental matrix is wrong

Chapter 1: Problem Statement

Given camera matrices P, P' and a correspondence x ↔ x', find X such that x = PX and x' = P'X. These 4 equations (2 per camera, since the third is dependent) in 3 unknowns (the inhomogeneous coordinates of X) are overdetermined.

x × PX = 0 ⇔ A X = 0

where A is a 4 × 4 matrix formed from the rows of P, P' and the coordinates of x, x'. The solution is the null-vector of A.

Two cost functions to choose from:
(1) Algebraic error: minimize ||AX|| — fast but not geometrically meaningful.
(2) Geometric error: minimize d(x, PX)² + d(x', P'X)² — the sum of squared reprojection distances. This is the MLE under Gaussian noise.

How many equations does the system x × PX = 0 give?

2 independent equations per image (cross product gives 3, but only 2 are independent) 2 independent equations per image 3 independent equations per image

Chapter 2: Linear Triangulation

Form the 4 × 4 matrix A from the two camera equations:

A = [x(p^3T) − p^1T ; y(p^3T) − p^2T ; x'(p'^3T) − p'^1T ; y'(p'^3T) − p'^2T]

where p^iT is the i-th row of P. The solution X is the unit singular vector of A corresponding to the smallest singular value.

This is the DLT for triangulation. It minimizes algebraic error, which is not geometric error. The result depends on the coordinate system used for the image points. As always, normalize first for best results.

An alternative linear method: the mid-point method computes the point closest to both rays. But this method is not projectively invariant and can give poor results when the baseline is short relative to scene depth.

What is the main drawback of linear triangulation?

It minimizes algebraic error, not geometric error, and the result depends on the coordinate system It requires too many points It cannot handle projective cameras

Chapter 3: Geometric Error

The MLE minimizes the sum of squared reprojection distances in both images:

min_X d(x, PX)² + d(x', P'X)²

This is a nonlinear optimization over the 3 unknowns of X. It can be solved using Levenberg-Marquardt with the linear triangulation result as initialization.

A subtlety: This cost function assumes the camera matrices P and P' are exact. But if they were estimated from noisy data, we should also consider errors in P and P'. The full Gold Standard would optimize over X and the corrected image points simultaneously. In practice, fixing P and P' and optimizing only X works well when the cameras were estimated from many points.

What does the geometric error cost function for triangulation minimize?

The sum of squared reprojection distances in both images: d(x, PX)² + d(x', P'X)² The distance from X to the baseline The algebraic error ||AX||

Chapter 4: Sampson Approximation

The Sampson correction is a first-order geometric correction that avoids the expense of full nonlinear optimization. It computes the correction to the image points that projects them onto the epipolar constraint surface.

Given measured points (x, x'), the Sampson-corrected points satisfy x̂'^TF x̂ = 0, and are the closest such points to the measurements (to first order). The 3D point is then triangulated from the corrected pair.

When to use Sampson vs. full optimization: The Sampson approximation is excellent when the measurement noise is small relative to the size of the image. For most practical scenarios, it gives results indistinguishable from the full nonlinear optimization, at a fraction of the cost.

The Sampson approximation is accurate to what order?

First order (a linear correction to the measurements) Exact Second order

Chapter 5: Optimal Triangulation

The optimal triangulation method computes the MLE of X without iterative optimization. It works by finding the corrected points x̂ and x̂' that minimize geometric error and exactly satisfy the epipolar constraint.

The key insight: the problem reduces to finding a root of a degree-6 polynomial. Evaluate the cost at each real root and select the minimum. This gives the globally optimal solution in closed form.

This is one of the few problems in multi-view geometry with a closed-form globally optimal solution. No iterative methods, no local minima, no initialization worries. The algorithm is fast and exact.

What makes optimal triangulation special compared to generic nonlinear optimization?

It finds the globally optimal solution in closed form by solving a degree-6 polynomial It uses a different cost function It requires more image points

Chapter 6: Uncertainty of the 3D Point

The covariance of the estimated 3D point X depends on: (1) the noise in the image measurements, (2) the geometry of the cameras (baseline length, viewing angle), and (3) the depth of the point.

The uncertainty ellipsoid: The covariance of X is elongated along the viewing direction. Points far from the cameras have larger uncertainty than nearby points. A short baseline gives poor depth resolution (the uncertainty ellipsoid is very elongated along depth). A wider baseline gives better depth but worse matching (more foreshortening).

The covariance Σ_X can be propagated from the image point covariances using the Jacobian of the triangulation mapping, following the methods of Chapter 5.

In which direction is the uncertainty of a triangulated 3D point typically largest?

Along the viewing/depth direction (elongated towards/away from the cameras) Perpendicular to the baseline Equally in all directions (spherical uncertainty)

Chapter 7: Line Reconstruction

Lines in 3D can be reconstructed from their images in two views. A line in one image back-projects to a plane. The intersection of two such planes (one from each view) gives the 3D line.

Two corresponding image lines l and l' back-project to planes π = P^Tl and π' = P'^Tl'. Their intersection is the 3D line, represented as a Plucker matrix L = π π'^T − π' π^T.

Lines vs. points: Line reconstruction from two views is always possible and does not have the "skew rays" problem of point triangulation. Two planes always intersect in a line. However, the accuracy depends on how well-separated the two viewing directions are.

How are 3D lines reconstructed from two views?

By intersecting the back-projected planes from each image line By triangulating the endpoints Lines cannot be reconstructed from two views

Chapter 8: Multi-View Triangulation

With more than two views, triangulation becomes overdetermined. Given n views and camera matrices P₁, ..., P_n with corresponding image points x₁, ..., x_n, we form a 2n × 4 matrix A and solve AX = 0.

More views = better accuracy. Each additional view adds 2 equations, improving the conditioning of the system. The geometric error to minimize becomes Σ_i d(x_i, P_iX)². This can be solved by nonlinear optimization initialized with the linear DLT solution.

Multi-view triangulation naturally arises in the N-view reconstruction pipeline (Chapter 18), where points are tracked across many frames and triangulated using all available observations.

How does multi-view triangulation improve on two-view triangulation?

More views add more equations, improving accuracy and robustness It uses a different algorithm entirely It eliminates the need for camera matrices

Chapter 9: Connections

Link	Connection
Ch 10 → Ch 12	Triangulation is step 3 of the reconstruction pipeline
Ch 12 → Ch 13	Homography-based triangulation works when points lie on known planes
Ch 12 → Ch 18	Bundle adjustment simultaneously refines triangulated points and cameras
Ch 12 → Ch 14	Affine triangulation is linear and exact (no iterative optimization needed)

"The optimal triangulation method finds the globally optimal solution by reducing the problem to finding roots of a polynomial."

— Hartley & Zisserman, Chapter 12

What degree polynomial must be solved for optimal triangulation?

Degree 6 Degree 3 Degree 10

← Chapter 11 Chapter 13: Homographies →