Hartley & Zisserman, Chapter 4

Estimation —
2D Projective Transformations

How to compute a homography from noisy point correspondences. DLT, normalization, cost functions, Maximum Likelihood, and the RANSAC algorithm.

Prerequisites: Chapter 2 (Projective 2D) + Basic statistics (mean, variance).
10
Chapters
5+
Simulations

Chapter 0: Why Estimation Matters

Chapters 2 and 3 gave us the beautiful geometry: homogeneous coordinates, transformations, invariants. But in practice, we never have exact data. Points are localized with pixel-level noise. Correspondences may be completely wrong (outliers). How do we compute a homography from messy real-world measurements?

This chapter introduces the core estimation toolkit that recurs throughout the book: the DLT algorithm (fast, linear, approximate), Maximum Likelihood estimation (statistically optimal), and RANSAC (robust to outliers).

The estimation pipeline: Every estimation problem in this book follows the same pattern: (1) set up linear equations, (2) solve with DLT, (3) refine with nonlinear optimization to minimize geometric error, (4) handle outliers with RANSAC.
Noisy Point Correspondences

True correspondences are corrupted by noise. The goal is to estimate the transformation despite the errors. Increase noise to see the challenge.

Noise σ5
Why can't we just solve Hx = x' exactly from point correspondences?

Chapter 1: The DLT Algorithm

The Direct Linear Transformation is the workhorse starting point. Given n ≥ 4 point correspondences xi ↔ x'i, we want a 3×3 matrix H such that x'i = Hxi (up to scale).

The trick: rewrite x' × Hx = 0 (the cross product is zero when vectors are parallel). Each correspondence gives 2 linearly independent equations on the 9 entries of H. Stacking all equations gives Ah = 0, where h is H reshaped as a 9-vector.

A2n×9 h = 0   ⇒   h = smallest singular vector of A
The key insight: We solve Ah = 0 by finding the null space of A via the SVD. The solution h is the right singular vector corresponding to the smallest singular value. With 4 correspondences we get exactly 8 equations for 8 unknowns (H has 8 DOF). With more correspondences, we minimize ||Ah|| — a least-squares solution.

The DLT is fast and requires no initialization, but it minimizes algebraic error (||Ah||), which has no direct geometric meaning. We'll fix this in the next chapters.

How many point correspondences are needed at minimum to compute a 2D homography?

Chapter 2: Cost Functions

Different cost functions measure different kinds of error. Choosing the right one is critical for getting good results from estimation.

Cost FunctionMeasuresProsCons
Algebraic||Ah||Linear, fast (SVD)No geometric meaning
Geometric (one image)Σ d(x'i, HxiMeaningful distance in imageNon-linear optimization
Symmetric transferΣ d(xi, H−1x'i)² + d(x'i, HxiSymmetric, better conditionedNon-linear
ReprojectionΣ d(xi, x̂i)² + d(x'i, x̂'iOptimal (Gold Standard)Slower
Geometric vs algebraic error: Algebraic error ||Ah|| is fast to compute but may give poor results when the data has anisotropic noise or poor conditioning. Geometric error d(x', Hx) measures actual distance in pixels — what we really care about.

The Sampson error provides a first-order approximation to geometric error that is almost as fast as algebraic error. It is the ratio of algebraic error to the Jacobian norm and gives excellent results in practice.

Which cost function has direct geometric meaning (distance in pixels)?

Chapter 3: Maximum Likelihood Estimation

If the noise on measured point locations is Gaussian with known covariance, then the Maximum Likelihood Estimate (MLE) of H is the one that maximizes the probability of observing the data. Under Gaussian noise, this is equivalent to minimizing the Mahalanobis distance:

Σi d(xi, x̂iΣ−1 + d(x'i, x̂'iΣ'−1

When the noise is isotropic and identical for all points (Σ = σ2I), the MLE reduces to minimizing the sum of squared reprojection errors. This is the Gold Standard algorithm.

Gold Standard = MLE under isotropic Gaussian noise. It jointly estimates H and the "true" point locations x̂i, x̂'i that minimize total squared distance from the measurements. It requires non-linear optimization (Levenberg-Marquardt), initialized from the DLT.
1. Initialize
Run DLT to get an initial estimate of H
2. Minimize
Levenberg-Marquardt on geometric error
3. Output
Optimal H and estimated true point positions
Under isotropic Gaussian noise, the MLE of H minimizes what?

Chapter 4: Normalization

A critical practical issue: the DLT algorithm is not invariant to the choice of coordinate system. If image coordinates range from 0 to 1000 pixels, the matrix A in Ah = 0 is poorly conditioned, and the SVD gives bad results.

The fix is normalization: before running DLT, apply a transformation T to the source points and T' to the destination points so that the centroid is at the origin and the average distance from the origin is √2. Then compute H on the normalized points and recover the original H as H = T'−1 Ĥ T.

The single most important practical recommendation in this book: Always normalize your point coordinates before running any linear estimation algorithm (DLT, 8-point, etc). The improvement is dramatic — often an order of magnitude.
i = Txi   |   centroid = 0, avg distance = √2   |   H = T'−1 Ĥ T

Geometric error (unlike algebraic error) is invariant to similarity transforms of the coordinates. This is another reason to prefer geometric error when possible — no normalization needed.

Why is normalization critical for the DLT algorithm?

Chapter 5: Iterative Minimization

Minimizing geometric error is a nonlinear optimization problem. The workhorse algorithm is Levenberg-Marquardt (LM), which interpolates between gradient descent (slow but safe) and Gauss-Newton (fast but may diverge).

LM needs a good initialization — that's what the DLT provides. It also needs a parameterization of H that avoids redundancy. Since H has 8 DOF but 9 entries, we can either fix one entry (e.g., h33 = 1) or use a special parameterization.

Convergence: Levenberg-Marquardt converges to a local minimum. For homography estimation, the error landscape is usually well-behaved and the DLT initialization is good enough that LM finds the global minimum. For more complex problems (bundle adjustment), convergence is less guaranteed.
What optimization algorithm is used for minimizing geometric error?

Chapter 6: RANSAC

All the methods so far assume every correspondence is correct (an inlier). In practice, especially with automatic feature matching, many correspondences are completely wrong (outliers). A single outlier can catastrophically corrupt a least-squares estimate.

RANSAC (RANdom SAmple Consensus) handles this elegantly: (1) randomly pick a minimal set of correspondences (4 for a homography), (2) compute H from just those, (3) count how many other correspondences agree with this H (inliers), (4) repeat many times, (5) keep the H with the most inliers.

The RANSAC loop:
1. Sample 4 random correspondences
2. Compute H from these 4 points (exact, using DLT)
3. For every other correspondence, check if d(x', Hx) < threshold
4. Count inliers. If more than best so far, save this H
5. Repeat for N iterations
6. Re-estimate H using all inliers of the best model
RANSAC in Action

Inliers follow the true model. Outliers are random. RANSAC finds the model despite the outliers. Click Run to watch iterations.

Click Run
What is the minimum number of correspondences RANSAC samples per iteration for a homography?

Chapter 7: Robust Estimation

RANSAC gives a hard inlier/outlier classification. A more nuanced approach uses robust cost functions that down-weight outliers smoothly rather than rejecting them entirely.

The Huber cost function is quadratic for small errors (like least-squares) but linear for large errors (reducing outlier influence). The Tukey biweight goes further, giving zero weight to errors beyond a threshold.

RANSAC + robust refinement: The best practical approach combines RANSAC (to get a rough inlier set and initial model) with robust M-estimation (to refine the model while smoothly handling borderline correspondences).

The number of RANSAC iterations needed depends on the outlier ratio ε and the desired confidence p: N = log(1 − p) / log(1 − (1 − ε)s), where s is the sample size. With 50% outliers and 99% confidence, you need about 72 iterations for a homography (s = 4).

Why might you prefer a robust cost function over hard RANSAC inlier/outlier classification?

Chapter 8: Automatic Homography Computation

Putting it all together: a fully automatic system for computing a homography between two images of a planar scene.

1. Feature Detection
Find interest points (Harris corners, SIFT, etc.)
2. Feature Matching
Match features by descriptor similarity. Many matches will be wrong.
3. RANSAC
Robustly estimate H, classify inliers/outliers
4. Refinement
Re-estimate H from all inliers using MLE (Gold Standard)
5. Guided Matching
Use estimated H to find more correspondences, re-estimate
Guided matching: After an initial H estimate, search for correspondences only along the predicted locations (within a small radius). This finds matches that were missed initially and further improves the estimate.
What is the purpose of guided matching after initial RANSAC estimation?

Chapter 9: Connections

This chapter established the estimation toolkit that is used throughout the rest of the book. The same patterns — DLT initialization, geometric error minimization, RANSAC for robustness — apply to every estimation problem:

ProblemMatrixDOFMin Correspondences
2D HomographyH (3×3)84 points
Camera matrixP (3×4)116 points (3D↔2D)
Fundamental matrixF (3×3, rank 2)77 points
Trifocal tensorT (3×3×3)187 points in 3 views
"In almost every case, the normalized version of any algorithm gives superior results to the un-normalized version."
— Hartley & Zisserman, Chapter 4
What is the standard estimation pipeline used throughout the book?
← Chapter 3 Chapter 5: Evaluation →