Hartley & Zisserman, Chapter 11

Computation of the Fundamental Matrix F

The 8-point algorithm, normalization, algebraic and geometric minimization, RANSAC for robust estimation, degeneracies, and image rectification.

Prerequisites: Chapter 4 (Estimation) + Chapter 9 (Epipolar Geometry).
10
Chapters
5+
Simulations

Chapter 0: Why Compute F?

The fundamental matrix F encodes the epipolar geometry of two views. Computing it reliably from image correspondences is the first step of any uncalibrated reconstruction pipeline. Get F right, and everything downstream (triangulation, reconstruction, calibration) benefits. Get it wrong, and nothing works.

The challenge: Real correspondences contain outliers (wrong matches). Measurements are noisy. Some scene configurations are degenerate. A robust F estimation algorithm must handle all three issues simultaneously.

The relationship is x'TFx = 0 for corresponding points. Each correspondence gives one linear equation in the 9 entries of F. With enough correspondences, F can be determined.

How many equations does each point correspondence contribute to the linear system for F?

Chapter 1: The 8-Point Algorithm

Expanding x'TFx = 0 with x = (x, y, 1)T and x' = (x', y', 1)T:

x'x f11 + x'y f12 + x' f13 + y'x f21 + y'y f22 + y' f23 + x f31 + y f32 + f33 = 0

This is one linear equation in the 9 unknowns fij. Stacking n correspondences gives an n × 9 matrix A and we solve Af = 0.

CaseNull space dim# solutions for F
n ≥ 8 (general position)1Unique (up to scale)
n = 721 or 3 solutions (imposing det F = 0)
Why "8-point"? F has 9 entries but is defined up to scale, so 8 DOF as a homogeneous matrix. However, det F = 0 imposes an additional constraint, leaving 7 true DOF. Eight correspondences give 8 equations, determining a unique solution in the 8-DOF space. The det = 0 constraint is enforced afterwards.
The 8-point algorithm solves for F using how many linear equations?

Chapter 2: Normalization

The un-normalized 8-point algorithm performs terribly in practice. The reason: image coordinates like (500, 300, 1) create matrix entries differing by orders of magnitude. The SVD produces numerically unstable results.

Hartley's normalized 8-point algorithm is dramatically better:

StepAction
1Translate and scale points in each image so centroid is at origin and RMS distance is √2
2Run the 8-point algorithm on normalized points
3Denormalize: F = T'T F̃ T
This normalization step transforms the 8-point algorithm from a curiosity to a practical workhorse. Without it, results can be off by orders of magnitude. With it, the 8-point algorithm often approaches the accuracy of iterative methods.
What does normalization do to the image coordinates before computing F?

Chapter 3: The Singularity Constraint

The matrix F must satisfy det F = 0 (it has rank 2). The SVD solution of Af = 0 does not guarantee this. We must enforce the rank-2 constraint as a post-processing step.

The method: compute the SVD of the estimated F = UDVT, where D = diag(σ1, σ2, σ3). Set σ3 = 0 to get D' = diag(σ1, σ2, 0). The corrected F' = UD'VT is the closest rank-2 matrix to F in Frobenius norm.

Why rank 2 matters: If F had rank 3, the epipolar lines would not pass through a common epipole. The geometry would be inconsistent. Enforcing rank 2 ensures that all epipolar lines in each image pass through a single point (the epipole).

For the 7-point algorithm: the null space of A is 2-dimensional, giving F = αF1 + (1−α)F2. The constraint det(αF1 + (1−α)F2) = 0 is a cubic in α, giving 1 or 3 real solutions.

How is the rank-2 constraint on F enforced?

Chapter 4: Geometric Error and Gold Standard

As with camera estimation, algebraic error does not correspond to anything meaningful geometrically. The Gold Standard approach minimizes the reprojection error:

minF, x̂i, x̂'i Σi [d(xi, x̂i)2 + d(x'i, x̂'i)2]

subject to x̂'iT F x̂i = 0. This minimizes the sum of squared distances between measured and "corrected" points, where the corrected points exactly satisfy the epipolar constraint.

Sampson error: A first-order approximation to geometric error that avoids the full iterative minimization. It is much faster than the Gold Standard and often nearly as accurate. It is computed as:
dSampson2 = (x'TFx)2 / [(Fx)12 + (Fx)22 + (FTx')12 + (FTx')22]
What does the Sampson error approximate?

Chapter 5: RANSAC for Robust F

In practice, point correspondences contain outliers (wrong matches). RANSAC (Random Sample Consensus) handles this by repeatedly sampling minimal subsets.

StepAction
1Randomly select 7 or 8 correspondences
2Compute F from this minimal set
3Count inliers: correspondences with Sampson error below threshold
4Repeat; keep the F with the most inliers
5Re-estimate F from all inliers (Gold Standard)
6Guided matching: use the estimated F to find additional correspondences
Guided matching: Once F is estimated, it constrains where matches can appear. For each unmatched feature in image 1, search only along its epipolar line in image 2. This typically doubles the number of correspondences, further improving the final F estimate.
What is the purpose of guided matching after RANSAC?

Chapter 6: Degeneracies

Some point configurations cannot uniquely determine F, even with unlimited noise-free data.

ConfigurationNull space dimSolutions
General position, n ≥ 81Unique F
Points + cameras on ruled quadric23 solutions (including 7-point case)
All points on a plane (or camera rotation only)32-parameter family: F = [t]×H
The planar degeneracy: If all scene points lie on a plane, the correspondences are related by a homography x' = Hx, and F has a 2-parameter family of solutions F = SH for any skew-symmetric S. This is the most common degeneracy in practice (e.g., a wall, a road surface). RANSAC with a homography model can detect this case.

Pure rotation (no translation) is also degenerate: the camera centres coincide, so F is undefined (the zero matrix). Points are related by a homography H = K'RK−1.

When all scene points lie on a plane, what happens to the F estimation?

Chapter 7: Special Cases

Special camera motions simplify the computation of F:

MotionF formDOFMin points
Pure translationF = [e]× (skew-symmetric)22
Planar motiondet Fs = 066
Calibrated (essential E)σ1 = σ2, σ3 = 055
The calibrated case: With calibrated cameras, compute the essential matrix E instead of F. The 8-point algorithm works, but the constraint that E's two non-zero singular values are equal must be enforced. The closest essential matrix to an arbitrary E has singular values ((a+b)/2, (a+b)/2, 0) where a ≥ b ≥ c are the original singular values.

Other entities besides points can constrain F. Vanishing points (images of parallel lines) provide one constraint each. Epipolar tangency of curves and surface outlines also constrains F. But general line correspondences provide no constraint on F (two planes always intersect).

Do line correspondences between two images constrain the fundamental matrix F?

Chapter 8: Image Rectification

Image rectification transforms two images so that corresponding epipolar lines become horizontal scan lines (y = y'). This simplifies stereo matching to a 1D search along each row.

Rectification works by applying a projective transformation H to the first image and H' to the second, such that:

PropertyAfter rectification
EpipolesBoth at infinity: e = e' = (1, 0, 0)T
Epipolar linesHorizontal: y = y' (corresponding rasters)
FF = [e']× = standard skew-symmetric form
Why rectify? Dense stereo matching (computing a disparity map for every pixel) requires searching for correspondences. Rectification reduces this from a 2D search to a 1D search along each row, making stereo matching orders of magnitude faster. All practical stereo systems rectify their images first.
After image rectification, where are the epipoles?

Chapter 9: Connections

LinkConnection
Ch 4 → Ch 11DLT, normalization, and RANSAC all extend from homography estimation to F estimation
Ch 11 → Ch 12Once F is computed, triangulation recovers 3D structure
Ch 11 → Ch 13F and plane-induced homographies are related: F can be computed from H + one extra correspondence
Ch 11 → Ch 18Bundle adjustment refines F (and all cameras) over many views
"The normalized 8-point algorithm: a rehabilitation of an old idea that works."
— Hartley & Zisserman, Chapter 11
What is the single most important practical improvement to the 8-point algorithm?
← Chapter 10 Chapter 12: Triangulation →