Given two camera matrices and a point correspondence, how to compute the 3D point. Linear methods, geometric error, Sampson correction, and optimal triangulation.
You have two cameras and a pair of corresponding image points x ↔ x'. Back-projecting from x gives a ray through the first camera centre. Back-projecting from x' gives a ray through the second. In the noise-free case, these rays intersect at the 3D point X. In reality, the rays don't intersect because of measurement noise.
Two rays from camera centres C and C' should intersect at X, but noise makes them skew.
Given camera matrices P, P' and a correspondence x ↔ x', find X such that x = PX and x' = P'X. These 4 equations (2 per camera, since the third is dependent) in 3 unknowns (the inhomogeneous coordinates of X) are overdetermined.
where A is a 4 × 4 matrix formed from the rows of P, P' and the coordinates of x, x'. The solution is the null-vector of A.
Form the 4 × 4 matrix A from the two camera equations:
where piT is the i-th row of P. The solution X is the unit singular vector of A corresponding to the smallest singular value.
An alternative linear method: the mid-point method computes the point closest to both rays. But this method is not projectively invariant and can give poor results when the baseline is short relative to scene depth.
The MLE minimizes the sum of squared reprojection distances in both images:
This is a nonlinear optimization over the 3 unknowns of X. It can be solved using Levenberg-Marquardt with the linear triangulation result as initialization.
The Sampson correction is a first-order geometric correction that avoids the expense of full nonlinear optimization. It computes the correction to the image points that projects them onto the epipolar constraint surface.
Given measured points (x, x'), the Sampson-corrected points satisfy x̂'TF x̂ = 0, and are the closest such points to the measurements (to first order). The 3D point is then triangulated from the corrected pair.
The optimal triangulation method computes the MLE of X without iterative optimization. It works by finding the corrected points x̂ and x̂' that minimize geometric error and exactly satisfy the epipolar constraint.
The key insight: the problem reduces to finding a root of a degree-6 polynomial. Evaluate the cost at each real root and select the minimum. This gives the globally optimal solution in closed form.
The covariance of the estimated 3D point X depends on: (1) the noise in the image measurements, (2) the geometry of the cameras (baseline length, viewing angle), and (3) the depth of the point.
The covariance ΣX can be propagated from the image point covariances using the Jacobian of the triangulation mapping, following the methods of Chapter 5.
Lines in 3D can be reconstructed from their images in two views. A line in one image back-projects to a plane. The intersection of two such planes (one from each view) gives the 3D line.
Two corresponding image lines l and l' back-project to planes π = PTl and π' = P'Tl'. Their intersection is the 3D line, represented as a Plucker matrix L = π π'T − π' πT.
With more than two views, triangulation becomes overdetermined. Given n views and camera matrices P1, ..., Pn with corresponding image points x1, ..., xn, we form a 2n × 4 matrix A and solve AX = 0.
Multi-view triangulation naturally arises in the N-view reconstruction pipeline (Chapter 18), where points are tracked across many frames and triangulated using all available observations.
| Link | Connection |
|---|---|
| Ch 10 → Ch 12 | Triangulation is step 3 of the reconstruction pipeline |
| Ch 12 → Ch 13 | Homography-based triangulation works when points lie on known planes |
| Ch 12 → Ch 18 | Bundle adjustment simultaneously refines triangulated points and cameras |
| Ch 12 → Ch 14 | Affine triangulation is linear and exact (no iterative optimization needed) |