Hartley & Zisserman, Chapter 4

Estimation —
2D Projective Transformations

How to compute a homography from noisy point correspondences. DLT, normalization, cost functions, Maximum Likelihood, and the RANSAC algorithm.

Prerequisites: Chapter 2 (Projective 2D) + Basic statistics (mean, variance).

Chapters

Simulations

Chapter 0: Why Estimation Matters

Chapters 2 and 3 gave us the beautiful geometry: homogeneous coordinates, transformations, invariants. But in practice, we never have exact data. Points are localized with pixel-level noise. Correspondences may be completely wrong (outliers). How do we compute a homography from messy real-world measurements?

This chapter introduces the core estimation toolkit that recurs throughout the book: the DLT algorithm (fast, linear, approximate), Maximum Likelihood estimation (statistically optimal), and RANSAC (robust to outliers).

The estimation pipeline: Every estimation problem in this book follows the same pattern: (1) set up linear equations, (2) solve with DLT, (3) refine with nonlinear optimization to minimize geometric error, (4) handle outliers with RANSAC.

Noisy Point Correspondences

True correspondences are corrupted by noise. The goal is to estimate the transformation despite the errors. Increase noise to see the challenge.

Noise σ5

Why can't we just solve Hx = x' exactly from point correspondences?

Because measurements are always noisy — there is no H that fits all points exactly Because H is too small a matrix Because homogeneous coordinates are ambiguous

Chapter 1: The DLT Algorithm

The Direct Linear Transformation is the workhorse starting point. Given n ≥ 4 point correspondences x_i ↔ x'_i, we want a 3×3 matrix H such that x'_i = Hx_i (up to scale).

The trick: rewrite x' × Hx = 0 (the cross product is zero when vectors are parallel). Each correspondence gives 2 linearly independent equations on the 9 entries of H. Stacking all equations gives Ah = 0, where h is H reshaped as a 9-vector.

A_2n×9 h = 0 ⇒ h = smallest singular vector of A

The key insight: We solve Ah = 0 by finding the null space of A via the SVD. The solution h is the right singular vector corresponding to the smallest singular value. With 4 correspondences we get exactly 8 equations for 8 unknowns (H has 8 DOF). With more correspondences, we minimize ||Ah|| — a least-squares solution.

The DLT is fast and requires no initialization, but it minimizes algebraic error (||Ah||), which has no direct geometric meaning. We'll fix this in the next chapters.

How many point correspondences are needed at minimum to compute a 2D homography?

4 (each gives 2 equations, 8 equations for 8 DOF) 3 8

Chapter 2: Cost Functions

Different cost functions measure different kinds of error. Choosing the right one is critical for getting good results from estimation.

Cost Function	Measures	Pros	Cons
Algebraic	\|\|Ah\|\|	Linear, fast (SVD)	No geometric meaning
Geometric (one image)	Σ d(x'_i, Hx_i)²	Meaningful distance in image	Non-linear optimization
Symmetric transfer	Σ d(x_i, H⁻¹x'_i)² + d(x'_i, Hx_i)²	Symmetric, better conditioned	Non-linear
Reprojection	Σ d(x_i, x̂_i)² + d(x'_i, x̂'_i)²	Optimal (Gold Standard)	Slower

Geometric vs algebraic error: Algebraic error ||Ah|| is fast to compute but may give poor results when the data has anisotropic noise or poor conditioning. Geometric error d(x', Hx) measures actual distance in pixels — what we really care about.

The Sampson error provides a first-order approximation to geometric error that is almost as fast as algebraic error. It is the ratio of algebraic error to the Jacobian norm and gives excellent results in practice.

Which cost function has direct geometric meaning (distance in pixels)?

Algebraic error ||Ah|| Geometric transfer error d(x', Hx) The determinant of H

Chapter 3: Maximum Likelihood Estimation

If the noise on measured point locations is Gaussian with known covariance, then the Maximum Likelihood Estimate (MLE) of H is the one that maximizes the probability of observing the data. Under Gaussian noise, this is equivalent to minimizing the Mahalanobis distance:

Σ_i d(x_i, x̂_i)²_Σ⁻¹ + d(x'_i, x̂'_i)²_Σ'⁻¹

When the noise is isotropic and identical for all points (Σ = σ²I), the MLE reduces to minimizing the sum of squared reprojection errors. This is the Gold Standard algorithm.

Gold Standard = MLE under isotropic Gaussian noise. It jointly estimates H and the "true" point locations x̂_i, x̂'_i that minimize total squared distance from the measurements. It requires non-linear optimization (Levenberg-Marquardt), initialized from the DLT.

1. Initialize

Run DLT to get an initial estimate of H

↓

2. Minimize

Levenberg-Marquardt on geometric error

↓

3. Output

Optimal H and estimated true point positions

Under isotropic Gaussian noise, the MLE of H minimizes what?

The sum of squared reprojection errors (geometric error) The algebraic error ||Ah|| The determinant of H

Chapter 4: Normalization

A critical practical issue: the DLT algorithm is not invariant to the choice of coordinate system. If image coordinates range from 0 to 1000 pixels, the matrix A in Ah = 0 is poorly conditioned, and the SVD gives bad results.

The fix is normalization: before running DLT, apply a transformation T to the source points and T' to the destination points so that the centroid is at the origin and the average distance from the origin is √2. Then compute H on the normalized points and recover the original H as H = T'⁻¹ Ĥ T.

The single most important practical recommendation in this book: Always normalize your point coordinates before running any linear estimation algorithm (DLT, 8-point, etc). The improvement is dramatic — often an order of magnitude.

x̂_i = Tx_i | centroid = 0, avg distance = √2 | H = T'⁻¹ Ĥ T

Geometric error (unlike algebraic error) is invariant to similarity transforms of the coordinates. This is another reason to prefer geometric error when possible — no normalization needed.

Why is normalization critical for the DLT algorithm?

Because DLT is not invariant to coordinate scaling — un-normalized data leads to poor conditioning Because it makes the homography matrix smaller Because normalized points are always integer-valued

Chapter 5: Iterative Minimization

Minimizing geometric error is a nonlinear optimization problem. The workhorse algorithm is Levenberg-Marquardt (LM), which interpolates between gradient descent (slow but safe) and Gauss-Newton (fast but may diverge).

LM needs a good initialization — that's what the DLT provides. It also needs a parameterization of H that avoids redundancy. Since H has 8 DOF but 9 entries, we can either fix one entry (e.g., h₃₃ = 1) or use a special parameterization.

Convergence: Levenberg-Marquardt converges to a local minimum. For homography estimation, the error landscape is usually well-behaved and the DLT initialization is good enough that LM finds the global minimum. For more complex problems (bundle adjustment), convergence is less guaranteed.

What optimization algorithm is used for minimizing geometric error?

SVD Levenberg-Marquardt (initialized from DLT) Random search

Chapter 6: RANSAC

All the methods so far assume every correspondence is correct (an inlier). In practice, especially with automatic feature matching, many correspondences are completely wrong (outliers). A single outlier can catastrophically corrupt a least-squares estimate.

RANSAC (RANdom SAmple Consensus) handles this elegantly: (1) randomly pick a minimal set of correspondences (4 for a homography), (2) compute H from just those, (3) count how many other correspondences agree with this H (inliers), (4) repeat many times, (5) keep the H with the most inliers.

The RANSAC loop:
1. Sample 4 random correspondences
2. Compute H from these 4 points (exact, using DLT)
3. For every other correspondence, check if d(x', Hx) < threshold
4. Count inliers. If more than best so far, save this H
5. Repeat for N iterations
6. Re-estimate H using all inliers of the best model

RANSAC in Action

Inliers follow the true model. Outliers are random. RANSAC finds the model despite the outliers. Click Run to watch iterations.

Click Run

What is the minimum number of correspondences RANSAC samples per iteration for a homography?

4 7 8

Chapter 7: Robust Estimation

RANSAC gives a hard inlier/outlier classification. A more nuanced approach uses robust cost functions that down-weight outliers smoothly rather than rejecting them entirely.

The Huber cost function is quadratic for small errors (like least-squares) but linear for large errors (reducing outlier influence). The Tukey biweight goes further, giving zero weight to errors beyond a threshold.

RANSAC + robust refinement: The best practical approach combines RANSAC (to get a rough inlier set and initial model) with robust M-estimation (to refine the model while smoothly handling borderline correspondences).

The number of RANSAC iterations needed depends on the outlier ratio ε and the desired confidence p: N = log(1 − p) / log(1 − (1 − ε)^s), where s is the sample size. With 50% outliers and 99% confidence, you need about 72 iterations for a homography (s = 4).

Why might you prefer a robust cost function over hard RANSAC inlier/outlier classification?

It smoothly down-weights outliers instead of discarding borderline points entirely It is faster than RANSAC It doesn't require any threshold

Chapter 8: Automatic Homography Computation

Putting it all together: a fully automatic system for computing a homography between two images of a planar scene.

1. Feature Detection

Find interest points (Harris corners, SIFT, etc.)

↓

2. Feature Matching

Match features by descriptor similarity. Many matches will be wrong.

↓

3. RANSAC

Robustly estimate H, classify inliers/outliers

↓

4. Refinement

Re-estimate H from all inliers using MLE (Gold Standard)

↓

5. Guided Matching

Use estimated H to find more correspondences, re-estimate

Guided matching: After an initial H estimate, search for correspondences only along the predicted locations (within a small radius). This finds matches that were missed initially and further improves the estimate.

What is the purpose of guided matching after initial RANSAC estimation?

To find additional correct correspondences using the estimated homography as a guide To remove all outliers To compute a new SVD

Chapter 9: Connections

This chapter established the estimation toolkit that is used throughout the rest of the book. The same patterns — DLT initialization, geometric error minimization, RANSAC for robustness — apply to every estimation problem:

Problem	Matrix	DOF	Min Correspondences
2D Homography	H (3×3)	8	4 points
Camera matrix	P (3×4)	11	6 points (3D↔2D)
Fundamental matrix	F (3×3, rank 2)	7	7 points
Trifocal tensor	T (3×3×3)	18	7 points in 3 views

"In almost every case, the normalized version of any algorithm gives superior results to the un-normalized version."

— Hartley & Zisserman, Chapter 4

What is the standard estimation pipeline used throughout the book?

DLT (linear init) → Minimize geometric error (LM) → RANSAC for robustness Gradient descent from random initialization Exhaustive search over all possible matrices

← Chapter 3 Chapter 5: Evaluation →

Estimation —2D Projective Transformations