How to compute a homography from noisy point correspondences. DLT, normalization, cost functions, Maximum Likelihood, and the RANSAC algorithm.
Chapters 2 and 3 gave us the beautiful geometry: homogeneous coordinates, transformations, invariants. But in practice, we never have exact data. Points are localized with pixel-level noise. Correspondences may be completely wrong (outliers). How do we compute a homography from messy real-world measurements?
This chapter introduces the core estimation toolkit that recurs throughout the book: the DLT algorithm (fast, linear, approximate), Maximum Likelihood estimation (statistically optimal), and RANSAC (robust to outliers).
True correspondences are corrupted by noise. The goal is to estimate the transformation despite the errors. Increase noise to see the challenge.
The Direct Linear Transformation is the workhorse starting point. Given n ≥ 4 point correspondences xi ↔ x'i, we want a 3×3 matrix H such that x'i = Hxi (up to scale).
The trick: rewrite x' × Hx = 0 (the cross product is zero when vectors are parallel). Each correspondence gives 2 linearly independent equations on the 9 entries of H. Stacking all equations gives Ah = 0, where h is H reshaped as a 9-vector.
The DLT is fast and requires no initialization, but it minimizes algebraic error (||Ah||), which has no direct geometric meaning. We'll fix this in the next chapters.
Different cost functions measure different kinds of error. Choosing the right one is critical for getting good results from estimation.
| Cost Function | Measures | Pros | Cons |
|---|---|---|---|
| Algebraic | ||Ah|| | Linear, fast (SVD) | No geometric meaning |
| Geometric (one image) | Σ d(x'i, Hxi)² | Meaningful distance in image | Non-linear optimization |
| Symmetric transfer | Σ d(xi, H−1x'i)² + d(x'i, Hxi)² | Symmetric, better conditioned | Non-linear |
| Reprojection | Σ d(xi, x̂i)² + d(x'i, x̂'i)² | Optimal (Gold Standard) | Slower |
The Sampson error provides a first-order approximation to geometric error that is almost as fast as algebraic error. It is the ratio of algebraic error to the Jacobian norm and gives excellent results in practice.
If the noise on measured point locations is Gaussian with known covariance, then the Maximum Likelihood Estimate (MLE) of H is the one that maximizes the probability of observing the data. Under Gaussian noise, this is equivalent to minimizing the Mahalanobis distance:
When the noise is isotropic and identical for all points (Σ = σ2I), the MLE reduces to minimizing the sum of squared reprojection errors. This is the Gold Standard algorithm.
A critical practical issue: the DLT algorithm is not invariant to the choice of coordinate system. If image coordinates range from 0 to 1000 pixels, the matrix A in Ah = 0 is poorly conditioned, and the SVD gives bad results.
The fix is normalization: before running DLT, apply a transformation T to the source points and T' to the destination points so that the centroid is at the origin and the average distance from the origin is √2. Then compute H on the normalized points and recover the original H as H = T'−1 Ĥ T.
Geometric error (unlike algebraic error) is invariant to similarity transforms of the coordinates. This is another reason to prefer geometric error when possible — no normalization needed.
Minimizing geometric error is a nonlinear optimization problem. The workhorse algorithm is Levenberg-Marquardt (LM), which interpolates between gradient descent (slow but safe) and Gauss-Newton (fast but may diverge).
LM needs a good initialization — that's what the DLT provides. It also needs a parameterization of H that avoids redundancy. Since H has 8 DOF but 9 entries, we can either fix one entry (e.g., h33 = 1) or use a special parameterization.
All the methods so far assume every correspondence is correct (an inlier). In practice, especially with automatic feature matching, many correspondences are completely wrong (outliers). A single outlier can catastrophically corrupt a least-squares estimate.
RANSAC (RANdom SAmple Consensus) handles this elegantly: (1) randomly pick a minimal set of correspondences (4 for a homography), (2) compute H from just those, (3) count how many other correspondences agree with this H (inliers), (4) repeat many times, (5) keep the H with the most inliers.
Inliers follow the true model. Outliers are random. RANSAC finds the model despite the outliers. Click Run to watch iterations.
RANSAC gives a hard inlier/outlier classification. A more nuanced approach uses robust cost functions that down-weight outliers smoothly rather than rejecting them entirely.
The Huber cost function is quadratic for small errors (like least-squares) but linear for large errors (reducing outlier influence). The Tukey biweight goes further, giving zero weight to errors beyond a threshold.
The number of RANSAC iterations needed depends on the outlier ratio ε and the desired confidence p: N = log(1 − p) / log(1 − (1 − ε)s), where s is the sample size. With 50% outliers and 99% confidence, you need about 72 iterations for a homography (s = 4).
Putting it all together: a fully automatic system for computing a homography between two images of a planar scene.
This chapter established the estimation toolkit that is used throughout the rest of the book. The same patterns — DLT initialization, geometric error minimization, RANSAC for robustness — apply to every estimation problem:
| Problem | Matrix | DOF | Min Correspondences |
|---|---|---|---|
| 2D Homography | H (3×3) | 8 | 4 points |
| Camera matrix | P (3×4) | 11 | 6 points (3D↔2D) |
| Fundamental matrix | F (3×3, rank 2) | 7 | 7 points |
| Trifocal tensor | T (3×3×3) | 18 | 7 points in 3 views |