The geometry of two views. Epipoles, epipolar lines, the fundamental matrix F, the essential matrix E, and how they constrain point correspondences between two images.
Suppose you see a point x in one image. Where can its correspondence x' appear in a second image? Without any knowledge of the 3D point, x' could be anywhere in the second image — or could it?
No. The point x back-projects to a ray through the first camera centre. The second camera sees this ray as a line in its image. The correspondence x' must lie on this line. This line is the epipolar line of x.
A point in the left image constrains its match to an epipolar line in the right image. All epipolar lines pass through the epipole.
The geometry is defined by the two camera centres C and C', and a 3D point X. These three points define a plane called the epipolar plane.
| Entity | Definition |
|---|---|
| Epipolar plane | The plane through C, C', and X |
| Baseline | The line joining C and C' |
| Epipole e | Image of C' in the first camera (where the baseline pierces the first image plane) |
| Epipole e' | Image of C in the second camera |
| Epipolar line l | Intersection of epipolar plane with first image plane |
| Epipolar line l' | Intersection of epipolar plane with second image plane |
The correspondence between epipolar lines l ↔ l' is a 1D projective transformation (a homography of the pencil) with 3 degrees of freedom.
The fundamental matrix F is a 3×3 matrix of rank 2 that algebraically encodes the epipolar geometry. For any pair of corresponding points x ↔ x':

x'ᵀ F x = 0

This is the correspondence condition. It says that x' lies on the epipolar line l' = Fx, and equivalently that x lies on l = Fᵀx'.
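F maps points to lines: x → l' = Fx. A point-to-line map of this kind is called a correlation. F is not a proper (invertible) correlation, because every point x on the same epipolar line l maps to the same line l', and the epipole maps to nothing at all (Fe = 0). This is why F has rank 2, not 3.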
| Property | Formula |
|---|---|
| Correspondence condition | x'ᵀ F x = 0 |
| Epipolar line from x | l' = Fx |
| Epipolar line from x' | l = Fᵀx' |
| Epipole e' (left null-vector) | e'ᵀF = 0 |
| Epipole e (right null-vector) | Fe = 0 |
| Transpose symmetry | F for (P, P') ⇒ Fᵀ for (P', P) |
| Degrees of freedom | 7 (9 entries − 1 scale − 1 for det F = 0) |
Because F has rank 2 it is not invertible: an epipolar line l' = Fx cannot be mapped back to a unique point x. F therefore constrains a match to a line without determining it, and it is not a point-to-point projective transformation.
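The properties in the table can be checked numerically. The following is a minimal sketch using NumPy; the matrix `F` here is a hypothetical rank-2 matrix built from random numbers (not one estimated from real images), and the point `x` and line `m` are arbitrary illustrative values.

```python
import numpy as np

# Hypothetical fundamental matrix: take a random 3x3 matrix and zero its
# smallest singular value so that the result has rank 2.
rng = np.random.default_rng(0)
U, S, Vt = np.linalg.svd(rng.normal(size=(3, 3)))
F = U @ np.diag([S[0], S[1], 0.0]) @ Vt

# The epipoles are the null vectors of F, read off from its SVD.
U, S, Vt = np.linalg.svd(F)
e = Vt[-1]           # right null-vector: F e = 0
e_prime = U[:, -1]   # left null-vector:  e'^T F = 0

# An arbitrary point x in the first image maps to the line l' = F x.
x = np.array([120.0, 45.0, 1.0])
l_prime = F @ x

# Any point on l' is a candidate match. Here l' is intersected with an
# arbitrary second line m (two homogeneous lines meet at their cross product).
m = np.array([1.0, 0.0, -200.0])   # the vertical line u = 200
x_prime = np.cross(l_prime, m)

print("rank of F :", np.linalg.matrix_rank(F, tol=1e-10))
print("|F e|     :", np.linalg.norm(F @ e))
print("|e'^T F|  :", np.linalg.norm(e_prime @ F))
print("e'^T l'   :", e_prime @ l_prime)     # l' passes through the epipole e'
print("x'^T F x  :", x_prime @ F @ x)       # correspondence condition
```

All printed residuals should be zero up to floating-point rounding, and the rank should be 2.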
Given two camera matrices P and P', the fundamental matrix is:

F = [e']× P' P⁺

where P⁺ is the pseudo-inverse of P (PP⁺ = I) and e' = P'C is the epipole (the image of the first camera centre C in the second camera).
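For a calibrated stereo rig with P = K[I | 0] and P' = K'[R | t]:

F = [K't]× K'RK⁻¹ = K'⁻ᵀ [t]× R K⁻¹

The two forms are equal up to scale, which is all that matters for a homogeneous matrix.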
Note: F is defined only when the camera centres are distinct. If C = C', then e' = P'C = 0 and F = 0 (the zero matrix).
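As a sanity check, the general formula F = [e']× P' P⁺ and the stereo-rig form K'⁻ᵀ [t]× R K⁻¹ can be compared numerically. This is a sketch under assumed values: the calibration matrices, rotation, translation and 3D point are all hypothetical, and the helper names (`skew`, `fundamental_from_cameras`, `equal_up_to_scale`) are introduced here for illustration only.

```python
import numpy as np

def skew(v):
    """Return [v]x, the matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F = [e']x P' P^+, with e' = P' C and C the null vector of P."""
    C = np.linalg.svd(P1)[2][-1]          # camera centre: P C = 0
    e2 = P2 @ C                           # epipole in the second image
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

def equal_up_to_scale(A, B, tol=1e-9):
    """Compare two homogeneous matrices, ignoring overall scale and sign."""
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    return min(np.linalg.norm(A - B), np.linalg.norm(A + B)) < tol

# Hypothetical stereo rig: P = K1[I | 0], P' = K2[R | t].
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.1])
P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K2 @ np.hstack([R, t.reshape(3, 1)])

F = fundamental_from_cameras(P1, P2)
F_closed = np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

# Project one 3D point into both views and evaluate the epipolar constraint.
X = np.array([0.5, -0.3, 4.0, 1.0])
x1, x2 = P1 @ X, P2 @ X
rel_residual = abs(x2 @ F @ x1) / (
    np.linalg.norm(x2) * np.linalg.norm(F) * np.linalg.norm(x1))

print("forms agree up to scale:", equal_up_to_scale(F, F_closed))
print("relative x'^T F x      :", rel_residual)
```

The residual is divided by the norms of its factors because F and the homogeneous points carry arbitrary scale, so an absolute threshold would be meaningless.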
Given only F (computed from image correspondences), we can recover a pair of camera matrices up to a projective ambiguity. Set P = [I | 0], then:

P' = [[e']× F + e'vᵀ | λe']

for an arbitrary 3-vector v and non-zero scalar λ. In the simplest form (v = 0, λ = 1): P' = [[e']× F | e'].
Any 3×3 matrix F of rank 2 is the fundamental matrix of some pair of cameras. This means every such F defines a valid two-view geometry, and camera matrices can always be extracted from it.
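The simplest canonical pair can be built and checked in a few lines. This sketch assumes a hypothetical rank-2 matrix `F` generated from random numbers. The check used is the standard condition that (P, P') is compatible with F exactly when P'ᵀ F P is skew-symmetric; the function name `cameras_from_F` is introduced here for illustration.

```python
import numpy as np

def skew(v):
    """Return [v]x, the matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    """Canonical pair P = [I | 0], P' = [[e']x F | e'] (the case v = 0, lambda = 1)."""
    e2 = np.linalg.svd(F)[0][:, -1]       # left null-vector: e'^T F = 0
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])
    return P1, P2

# Hypothetical rank-2 matrix standing in for an estimated F.
rng = np.random.default_rng(1)
U, S, Vt = np.linalg.svd(rng.normal(size=(3, 3)))
F = U @ np.diag([S[0], S[1], 0.0]) @ Vt

P1, P2 = cameras_from_F(F)

# Compatibility test: P'^T F P must be skew-symmetric (a 4x4 matrix here).
S_mat = P2.T @ F @ P1
print("|S + S^T|:", np.linalg.norm(S_mat + S_mat.T))

# Consequence: every 3D point projects to a pair satisfying x'^T F x = 0.
X = np.append(rng.normal(size=3), 1.0)
x1, x2 = P1 @ X, P2 @ X
residual = x2 @ F @ x1
print("x'^T F x :", residual)
```

The recovered cameras differ from the true ones by an unknown 3D projective transformation, so a reconstruction made with them is projective rather than metric.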
The essential matrix E is the specialization of F to calibrated cameras. If normalized image coordinates x̂ = K⁻¹x and x̂' = K'⁻¹x' are used:

x̂'ᵀ E x̂ = 0, with E = K'ᵀ F K
E has the form E = [t]× R, where R is the rotation and t is the translation between cameras. It has only 5 DOF (3 for R + 2 for the direction of t, since overall scale is unrecoverable).
From E, the rotation R and the translation direction t can be extracted up to a 4-fold ambiguity (two candidate rotations, each with two signs of t). The chirality constraint selects the correct solution: a reconstructed point must have positive depth in both cameras, and testing a single point is enough to decide.
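One common way to carry out this extraction is via the SVD of E, followed by a depth test on a triangulated point. The sketch below follows that approach under stated assumptions: the ground-truth pose and 3D point are hypothetical, the data are noise-free, triangulation is the simple linear (DLT) method, and the function names are illustrative rather than taken from any library.

```python
import numpy as np

def skew(v):
    """Return [v]x, the matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates encoded by an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so U and Vt can be flipped to det = +1.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]                           # unit translation direction, sign unknown
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence; returns a 3-vector."""
    A = np.vstack([x1[0] * P1[2] - x1[2] * P1[0],
                   x1[1] * P1[2] - x1[2] * P1[1],
                   x2[0] * P2[2] - x2[2] * P2[0],
                   x2[1] * P2[2] - x2[2] * P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def select_pose(E, x1, x2):
    """Pick the candidate that puts the triangulated point in front of both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    for R, t in decompose_essential(E):
        P2 = np.hstack([R, t.reshape(3, 1)])
        X = triangulate(P1, P2, x1, x2)
        if X[2] > 0 and (R @ X + t)[2] > 0:
            return R, t
    return None

# Hypothetical ground truth, with t normalized because scale is unrecoverable.
theta = np.deg2rad(10.0)
R_true = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
t_true = np.array([1.0, 0.2, 0.1])
t_true = t_true / np.linalg.norm(t_true)
E = skew(t_true) @ R_true

# One correspondence in normalized image coordinates.
X_true = np.array([0.5, -0.3, 4.0])
x1 = X_true
x2 = R_true @ X_true + t_true

R_est, t_est = select_pose(E, x1, x2)
print("rotation recovered   :", np.allclose(R_est, R_true))
print("translation recovered:", np.allclose(t_est, t_true))
```

With noisy data a single point can give the wrong answer, so practical implementations typically vote over many correspondences; that refinement is outside this sketch.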
When the camera undergoes a pure translation (no rotation, no change in K), the fundamental matrix takes a special form: F = [e']× (equal to [e]×, since the two epipoles coincide in this case). It is skew-symmetric and has only 2 DOF, the position of the epipole.
In this case e = e' = v, where v is the vanishing point of the translation direction (the Focus of Expansion). All epipolar lines radiate from v. Points appear to "flow" along lines through v (outward when the camera moves forward), and closer points move faster than distant ones.
If the translation is along the x-axis, e = (1, 0, 0)ᵀ and the epipolar lines are horizontal scan lines: y' = y. This is the ideal configuration for stereo matching.
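The x-axis case is easy to verify with a small synthetic example. The sketch below assumes a hypothetical calibration matrix `K`, a translation of 0.5 units along x, and two hypothetical 3D points that differ only in depth; it checks that F = [e']× is skew-symmetric, that matches share the same row, and that the nearer point shifts further.

```python
import numpy as np

def skew(v):
    """Return [v]x, the matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def project(P, X):
    """Project a homogeneous 3D point and return inhomogeneous pixel coordinates."""
    x = P @ X
    return x[:2] / x[2]

# Hypothetical camera: same K in both views, no rotation, translation along x.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
t = np.array([0.5, 0.0, 0.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), t.reshape(3, 1)])

e2 = K @ t           # epipole, proportional to (1, 0, 0)^T: a point at infinity
F = skew(e2)         # pure-translation fundamental matrix

# Two points on the same viewing ray direction but at different depths.
near = np.array([0.3, 0.2, 2.0, 1.0])
far = np.array([0.3, 0.2, 8.0, 1.0])

for name, X in [("near", near), ("far", far)]:
    x1, x2 = project(P1, X), project(P2, X)
    residual = np.append(x2, 1.0) @ F @ np.append(x1, 1.0)
    print(name, "x =", x1, "x' =", x2, "shift =", x2 - x1, "x'^T F x =", residual)
```

In both cases the vertical coordinate is unchanged, and the horizontal shift is inversely proportional to depth, which is the relationship stereo matching exploits.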
Planar motion occurs when the rotation axis is perpendicular to the translation direction (e.g., a car moving forward while turning). This imposes an extra constraint: the symmetric part of F, Fs = (F + Fᵀ)/2, has rank 2 (det Fs = 0).
The fundamental matrix for planar motion can be parameterized as F = [e']× [ls]× [e]×, where ls is the image of the screw (rotation) axis. This form automatically satisfies both the rank-2 constraint on F and the rank-2 constraint on Fs.
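The constraint det Fs = 0 can be observed on synthetic data. The following sketch assumes the same hypothetical calibration `K` in both views, a rotation about the y-axis, and two hypothetical translations: one in the x-z plane (planar motion) and one with a component along the rotation axis (general motion). F is scaled to unit Frobenius norm so that the two determinants are comparable.

```python
import numpy as np

def skew(v):
    """Return [v]x, the matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rot_y(theta):
    """Rotation by theta radians about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def fundamental(K, R, t):
    """F = K^-T [t]x R K^-1 for identical K in both views, scaled to unit norm."""
    K_inv = np.linalg.inv(K)
    F = K_inv.T @ skew(t) @ R @ K_inv
    return F / np.linalg.norm(F)

def det_symmetric_part(F):
    """Determinant of Fs = (F + F^T) / 2."""
    return np.linalg.det(0.5 * (F + F.T))

# Hypothetical calibration, expressed in normalized units.
K = np.array([[1.2, 0.0, 0.1], [0.0, 1.2, -0.05], [0.0, 0.0, 1.0]])
R = rot_y(np.deg2rad(20.0))

t_planar = np.array([1.0, 0.0, 0.2])    # perpendicular to the rotation axis
t_general = np.array([1.0, 0.5, 0.2])   # has a component along the axis

det_planar = det_symmetric_part(fundamental(K, R, t_planar))
det_general = det_symmetric_part(fundamental(K, R, t_general))

print("det Fs, planar motion :", det_planar)
print("det Fs, general motion:", det_general)
```

The planar value should vanish up to rounding, while the general one should be clearly non-zero.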
General camera motion can be decomposed into two steps: first warp the first image by the infinite homography H∞ = K'RK⁻¹, which undoes the rotation and the calibration difference; the residual motion is then a pure translation with epipole e'. Hence F = [e']× H∞.
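This decomposition has a direct geometric reading: after warping x by H∞, the points x', H∞x and e' are collinear, exactly as in the pure-translation case. The sketch below checks this on hypothetical cameras P = K1[I | 0] and P' = K2[R | t]; all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def skew(v):
    """Return [v]x, the matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical cameras P = K1[I | 0] and P' = K2[R | t].
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])
theta = np.deg2rad(15.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.4, -0.1, 0.3])

H_inf = K2 @ R @ np.linalg.inv(K1)   # infinite homography
e2 = K2 @ t                          # epipole in the second image
F = skew(e2) @ H_inf

# Image a 3D point (in first-camera coordinates) in both views.
X = np.array([0.5, -0.3, 4.0])
x1 = K1 @ X
x2 = K2 @ (R @ X + t)

# With these unnormalized homogeneous vectors, x' = H_inf x + e' exactly.
decomposes = np.allclose(x2, H_inf @ x1 + e2)

# Collinearity of x', H_inf x and e': the 3x3 determinant of unit vectors is 0.
cols = [v / np.linalg.norm(v) for v in (x2, H_inf @ x1, e2)]
collinearity = np.linalg.det(np.column_stack(cols))

rel_residual = abs(x2 @ F @ x1) / (
    np.linalg.norm(x2) * np.linalg.norm(F) * np.linalg.norm(x1))

print("x' = H_inf x + e'      :", decomposes)
print("det[x', H_inf x, e']   :", collinearity)
print("relative x'^T F x      :", rel_residual)
```

The vectors are normalized before taking the determinant because homogeneous coordinates in pixel units would otherwise inflate the rounding error.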
The fundamental matrix is the most important multi-view geometric object in this book. Everything in Part II flows from it.
| Chapter | How F connects |
|---|---|
| Ch 10: Reconstruction | F determines cameras (up to projective ambiguity) ⇒ enables 3D reconstruction |
| Ch 11: Computing F | Practical algorithms: 8-point, normalized, RANSAC |
| Ch 12: Triangulation | F provides the epipolar constraint; triangulation finds 3D points |
| Ch 13: Homographies | F = [e']× H relates homographies and the fundamental matrix |
| Ch 15: Trifocal Tensor | Three-view generalization: the trifocal tensor plays for three views the role that F plays for two |