The 8-point algorithm, normalization, algebraic and geometric minimization, RANSAC for robust estimation, degeneracies, and image rectification.
The fundamental matrix F encodes the epipolar geometry of two views. Computing it reliably from image correspondences is the first step of any uncalibrated reconstruction pipeline. Get F right, and everything downstream (triangulation, reconstruction, calibration) benefits. Get it wrong, and nothing works.
The relationship is x'TFx = 0 for corresponding points. Each correspondence gives one linear equation in the 9 entries of F. With enough correspondences, F can be determined.
Expanding x'TFx = 0 with x = (x, y, 1)T and x' = (x', y', 1)T:
This is one linear equation in the 9 unknowns fij. Stacking n correspondences gives an n × 9 matrix A and we solve Af = 0.
| Case | Null space dim | # solutions for F |
|---|---|---|
| n ≥ 8 (general position) | 1 | Unique (up to scale) |
| n = 7 | 2 | 1 or 3 solutions (imposing det F = 0) |
The un-normalized 8-point algorithm performs terribly in practice. The reason: image coordinates like (500, 300, 1) create matrix entries differing by orders of magnitude. The SVD produces numerically unstable results.
Hartley's normalized 8-point algorithm is dramatically better:
| Step | Action |
|---|---|
| 1 | Translate and scale points in each image so centroid is at origin and RMS distance is √2 |
| 2 | Run the 8-point algorithm on normalized points |
| 3 | Denormalize: F = T'T F̃ T |
The matrix F must satisfy det F = 0 (it has rank 2). The SVD solution of Af = 0 does not guarantee this. We must enforce the rank-2 constraint as a post-processing step.
The method: compute the SVD of the estimated F = UDVT, where D = diag(σ1, σ2, σ3). Set σ3 = 0 to get D' = diag(σ1, σ2, 0). The corrected F' = UD'VT is the closest rank-2 matrix to F in Frobenius norm.
For the 7-point algorithm: the null space of A is 2-dimensional, giving F = αF1 + (1−α)F2. The constraint det(αF1 + (1−α)F2) = 0 is a cubic in α, giving 1 or 3 real solutions.
As with camera estimation, algebraic error does not correspond to anything meaningful geometrically. The Gold Standard approach minimizes the reprojection error:
subject to x̂'iT F x̂i = 0. This minimizes the sum of squared distances between measured and "corrected" points, where the corrected points exactly satisfy the epipolar constraint.
In practice, point correspondences contain outliers (wrong matches). RANSAC (Random Sample Consensus) handles this by repeatedly sampling minimal subsets.
| Step | Action |
|---|---|
| 1 | Randomly select 7 or 8 correspondences |
| 2 | Compute F from this minimal set |
| 3 | Count inliers: correspondences with Sampson error below threshold |
| 4 | Repeat; keep the F with the most inliers |
| 5 | Re-estimate F from all inliers (Gold Standard) |
| 6 | Guided matching: use the estimated F to find additional correspondences |
Some point configurations cannot uniquely determine F, even with unlimited noise-free data.
| Configuration | Null space dim | Solutions |
|---|---|---|
| General position, n ≥ 8 | 1 | Unique F |
| Points + cameras on ruled quadric | 2 | 3 solutions (including 7-point case) |
| All points on a plane (or camera rotation only) | 3 | 2-parameter family: F = [t]×H |
Pure rotation (no translation) is also degenerate: the camera centres coincide, so F is undefined (the zero matrix). Points are related by a homography H = K'RK−1.
Special camera motions simplify the computation of F:
| Motion | F form | DOF | Min points |
|---|---|---|---|
| Pure translation | F = [e]× (skew-symmetric) | 2 | 2 |
| Planar motion | det Fs = 0 | 6 | 6 |
| Calibrated (essential E) | σ1 = σ2, σ3 = 0 | 5 | 5 |
Other entities besides points can constrain F. Vanishing points (images of parallel lines) provide one constraint each. Epipolar tangency of curves and surface outlines also constrains F. But general line correspondences provide no constraint on F (two planes always intersect).
Image rectification transforms two images so that corresponding epipolar lines become horizontal scan lines (y = y'). This simplifies stereo matching to a 1D search along each row.
Rectification works by applying a projective transformation H to the first image and H' to the second, such that:
| Property | After rectification |
|---|---|
| Epipoles | Both at infinity: e = e' = (1, 0, 0)T |
| Epipolar lines | Horizontal: y = y' (corresponding rasters) |
| F | F = [e']× = standard skew-symmetric form |
| Link | Connection |
|---|---|
| Ch 4 → Ch 11 | DLT, normalization, and RANSAC all extend from homography estimation to F estimation |
| Ch 11 → Ch 12 | Once F is computed, triangulation recovers 3D structure |
| Ch 11 → Ch 13 | F and plane-induced homographies are related: F can be computed from H + one extra correspondence |
| Ch 11 → Ch 18 | Bundle adjustment refines F (and all cameras) over many views |