Hartley & Zisserman, Chapter 15

The Trifocal Tensor

The three-view analogue of the fundamental matrix. Tensor notation, point-line incidence, point transfer, and the relationship to fundamental matrices and camera matrices.

Prerequisites: Chapter 9 (Epipolar Geometry) + Chapter 11 (Computing F).
10 chapters · 4+ simulations

Chapter 0: Why Three Views?

The fundamental matrix captures the geometry of two views. But many reconstruction problems involve three or more views. Is there a geometric object that encodes the relationship between three views, just as F encodes two views?

Yes: the trifocal tensor T. It is a 3 × 3 × 3 array (27 entries) that encodes all the geometric relationships between three views. A point in one image and lines in the other two satisfy a trilinear relation through T.

What T gives you over F: A point correspondence across three views provides 4 independent constraints on T (vs. 1 constraint on F per two-view pair). This makes T estimation more constrained and more robust. T also enables point transfer: given a point in two images, predict its location in the third.
How many independent constraints does a three-view point correspondence provide on the trifocal tensor?

Chapter 1: The Geometric Basis

Consider three cameras with centres C, C', C'' and a 3D point X. The point X, together with the three camera centres, defines three epipolar planes (one for each pair of cameras). The trifocal tensor arises from the constraint that these three planes share the point X.

More concretely: a line l' in the second image back-projects to a plane π'. A line l'' in the third image back-projects to a plane π''. These two planes intersect in a 3D line L. The image of L in the first camera is a line l. The trifocal tensor computes l from l' and l'':

$l_p = l'_q \, l''_r \, T_p^{qr}$

The trifocal tensor maps pairs of lines to a line. Two lines in two different images determine a line in the third. This is the fundamental geometric operation, from which all other relations (point-point-point, point-point-line, etc.) are derived.
What does the trifocal tensor compute from two lines in two images?

Chapter 2: Defining the Tensor T

For three camera matrices A, B, C (each 3×4), the trifocal tensor entry $T_i^{qr}$ is:

$T_i^{qr} = (-1)^{i+1} \det \begin{bmatrix} a^l \\ a^m \\ b^q \\ c^r \end{bmatrix}$

where $a^l$, $a^m$ are the two rows of A that remain after deleting row i (with l < m), and $b^q$, $c^r$ are individual rows of B and C.
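
As a minimal sketch of the determinant definition, the NumPy snippet below builds T entry by entry and checks it against the line-transfer relation from Chapter 1. The helper names (`trifocal_from_cameras`, `unit`) and the random cameras are assumptions made for illustration, not part of the text.

```python
import numpy as np

def trifocal_from_cameras(A, B, C):
    """Hypothetical helper: T[i, q, r] = (-1)^(i+1) det[A without row i; B[q]; C[r]]."""
    T = np.zeros((3, 3, 3))
    for i in range(3):
        A_rest = np.delete(A, i, axis=0)   # the two remaining rows of A, in order
        sign = (-1.0) ** i                 # 0-based i, equals (-1)^(i+1) for 1-based i
        for q in range(3):
            for r in range(3):
                T[i, q, r] = sign * np.linalg.det(np.vstack([A_rest, B[q], C[r]]))
    return T

def unit(v):
    return v / np.linalg.norm(v)

# Assumed test data: random projective cameras and a random 3D line.
rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((3, 3, 4))
X1, X2 = rng.standard_normal((2, 4))       # two homogeneous points spanning the line

T = trifocal_from_cameras(A, B, C)

# In each view, the image line joins the projections of X1 and X2.
l_true, lp, lpp = (np.cross(P @ X1, P @ X2) for P in (A, B, C))

# Line transfer: l_i = l'_q l''_r T_i^{qr}
l_transfer = np.einsum("q,r,iqr->i", lp, lpp, T)

# Homogeneous vectors agree when their cross product vanishes.
line_error = np.linalg.norm(np.cross(unit(l_transfer), unit(l_true)))
print(line_error)
```

The array is indexed as `T[i, q, r]`, so each `T[i]` is one 3×3 slice with rows indexed by view 2 and columns by view 3.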

DOF: The trifocal tensor has 27 entries, defined up to scale, so 26 DOF as a homogeneous object. But it arises from 3 camera matrices (33 DOF) minus a 15-DOF projective ambiguity = 18 DOF. So T must satisfy 26 − 18 = 8 internal constraints.

| Correspondence type | # independent equations |
|---|---|
| Three points | 4 |
| Two points, one line | 2 |
| One point, two lines | 1 |
| Three lines | 2 |

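
The counts in this table can be checked numerically. Each correspondence type gives a set of equations that are linear in the 27 entries of T, and the number of independent equations is the rank of the coefficient matrix. The sketch below assumes the flattening order (i, q, r) for T and uses arbitrary random points and lines; the rank does not depend on T itself.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def row(v):
    """A 3-vector as a 1x3 factor for the Kronecker product."""
    return v[None, :]

rng = np.random.default_rng(0)
x, xp, xpp = rng.standard_normal((3, 3))   # points in views 1, 2, 3
l, lp, lpp = rng.standard_normal((3, 3))   # lines in views 1, 2, 3

# A point in view 2 or 3 enters through [x']x (all lines through it),
# a line in view 1 enters through [l]x (the transferred line must be parallel to l).
systems = {
    "three points":         np.kron(np.kron(row(x), skew(xp)), skew(xpp)),
    "two points, one line": np.kron(np.kron(row(x), skew(xp)), row(lpp)),
    "one point, two lines": np.kron(np.kron(row(x), row(lp)), row(lpp)),
    "three lines":          np.kron(np.kron(skew(l), row(lp)), row(lpp)),
}
ranks = {name: int(np.linalg.matrix_rank(M)) for name, M in systems.items()}
print(ranks)
```

The ranks follow from the Kronecker structure: a vector contributes rank 1 and a cross-product matrix contributes rank 2, giving 1·2·2 = 4, 1·2·1 = 2, 1·1·1 = 1 and 2·1·1 = 2.
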
How many degrees of freedom does the trifocal tensor have?

Chapter 3: Point Transfer

Given a point x in image 1 and its correspondence x' in image 2, the trifocal tensor predicts the point x'' in image 3:

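$x''^r = x^i \, l'_q \, T_i^{qr}$

Here l' is any line through x' other than the epipolar line of x (for that line the right-hand side vanishes). The formula follows from the three-point relation $x^i \, x'^j \, x''^k \, \epsilon_{jqu} \, \epsilon_{krv} \, T_i^{qr} = 0_{uv}$, which every true correspondence satisfies.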

This is point transfer: no triangulation needed. The tensor directly maps (x, x') to x''.

Why is this useful? In view synthesis, you want to predict what a third camera would see. In correspondence search, if you have matches in two views, T predicts the location in the third — narrowing the search from a line (epipolar) to a point.

For noise-free data the transfer is exact: if x and x' are images of the same 3D point, the predicted x'' is its exact projection in view 3. This is more powerful than the epipolar constraint (which only constrains x'' to a line).
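
A minimal sketch of point transfer on synthetic data follows. It uses the standard form x''^r = x^i l'_q T_i^{qr}, where l' is a line through x'; the helper names and the auxiliary point `aux` used to pick that line are assumptions for illustration, and the line must not be the epipolar line of x.

```python
import numpy as np

def trifocal_from_cameras(A, B, C):
    """Hypothetical helper: T[i, q, r] = (-1)^(i+1) det[A without row i; B[q]; C[r]]."""
    T = np.zeros((3, 3, 3))
    for i in range(3):
        A_rest = np.delete(A, i, axis=0)
        for q in range(3):
            for r in range(3):
                T[i, q, r] = (-1.0) ** i * np.linalg.det(np.vstack([A_rest, B[q], C[r]]))
    return T

def transfer_point(T, x, xp, aux):
    """Predict x'' from (x, x') using the line l' that joins x' to `aux`."""
    lp = np.cross(xp, aux)                      # a line through x'
    return np.einsum("i,q,iqr->r", x, lp, T)    # x''^r = x^i l'_q T_i^{qr}

def unit(v):
    return v / np.linalg.norm(v)

# Assumed test data: random projective cameras and one random 3D point.
rng = np.random.default_rng(1)
A, B, C = rng.standard_normal((3, 3, 4))
X = rng.standard_normal(4)
x, xp, xpp = A @ X, B @ X, C @ X

T = trifocal_from_cameras(A, B, C)
xpp_pred = transfer_point(T, x, xp, aux=np.array([1.0, 2.0, 3.0]))

# Compare as homogeneous vectors: parallel vectors have zero cross product.
transfer_error = np.linalg.norm(np.cross(unit(xpp_pred), unit(xpp)))
print(transfer_error)
```

With noisy measurements the choice of l' matters; a line through x' perpendicular to the epipolar line is a common choice, which this sketch does not implement.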

Point transfer using the trifocal tensor predicts what from correspondences in two views?

Chapter 4: Line Incidence

The most fundamental relation involving T is the point-line-line incidence. If a point x in image 1 is the image of a 3D point whose projections in images 2 and 3 lie on the lines l' and l'', then:

$x^i \, l'_q \, l''_r \, T_i^{qr} = 0$

All other relations (three points, two points + line, three lines) are derivable from this basic one: a point in view 2 or 3 is replaced by lines passing through it, and a line in view 1 by points lying on it.
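
The substitution can be sketched numerically. The rows of the cross-product matrix [x']× are three lines through x' (and likewise for x''), so applying the point-line-line relation to every pair of rows yields the 3×3 array of three-point relations. The helper names and random test data below are assumptions for illustration.

```python
import numpy as np

def trifocal_from_cameras(A, B, C):
    """Hypothetical helper: T[i, q, r] = (-1)^(i+1) det[A without row i; B[q]; C[r]]."""
    T = np.zeros((3, 3, 3))
    for i in range(3):
        A_rest = np.delete(A, i, axis=0)
        for q in range(3):
            for r in range(3):
                T[i, q, r] = (-1.0) ** i * np.linalg.det(np.vstack([A_rest, B[q], C[r]]))
    return T

def skew(v):
    """Cross-product matrix [v]x; every row is a line through the point v."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def trilinear_residual(T, x, xp, xpp):
    """Largest |x^i l'_q l''_r T_i^{qr}| over lines l', l'' taken from [x']x and [x'']x."""
    x, xp, xpp = (v / np.linalg.norm(v) for v in (x, xp, xpp))
    R = np.einsum("i,sq,tr,iqr->st", x, skew(xp), skew(xpp), T)
    return np.abs(R).max()

# Assumed test data: random projective cameras and two random 3D points.
rng = np.random.default_rng(2)
A, B, C = rng.standard_normal((3, 3, 4))
T = trifocal_from_cameras(A, B, C)
T /= np.linalg.norm(T)

X, Y = rng.standard_normal((2, 4))
match_residual = trilinear_residual(T, A @ X, B @ X, C @ X)     # true correspondence
mismatch_residual = trilinear_residual(T, A @ X, B @ X, C @ Y)  # wrong point in view 3
print(match_residual, mismatch_residual)
```

The residual is numerically zero for the true correspondence and clearly non-zero when the third point comes from a different 3D point.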

Line correspondences across three views DO constrain T (unlike the two-view case, where line correspondences give no constraint on F). Two back-projected planes always intersect in a line, so two views impose nothing. Three planes in 3-space do not generically intersect in a common line, so requiring them to do so is a meaningful constraint.
Unlike in two views, do line correspondences across three views constrain the multi-view geometry?

Chapter 5: Tensor Notation

The trifocal tensor uses index notation from tensor algebra. Key conventions:

| Symbol | Type | Meaning |
|---|---|---|
| $x^i$ | Contravariant | Point (column vector) |
| $l_i$ | Covariant | Line (row vector) |
| $T_i^{qr}$ | Mixed | Tensor: one covariant, two contravariant indices |
| $\epsilon_{ijk}$ | Levi-Civita | Alternating tensor (cross product) |

Why tensor notation? It makes the symmetries and transformations explicit. Camera 1 contributes two rows (index i, by omission) and cameras 2 and 3 each contribute one row (indices q and r); the asymmetry in $T_i^{qr}$ reflects this. There are actually three different trifocal tensors, depending on which camera is "distinguished."
How many distinct trifocal tensors can be formed from three camera matrices?

Chapter 6: Fundamental Matrices from T

The trifocal tensor contains the fundamental matrices for all three pairs of views. They can be extracted as:

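| Pair | Extraction |
|---|---|
| $F_{21}$ (views 1,2) | $[e']_\times \, [T_1, T_2, T_3] \, e''$ |
| $F_{31}$ (views 1,3) | $[e'']_\times \, [T_1^T, T_2^T, T_3^T] \, e'$ |
| $F_{32}$ (views 2,3) | Computed from the camera matrices P', P'' recovered from T |

where $T_i$ are the 3×3 "slices" of the tensor, e' and e'' are the epipoles in views 2 and 3 of the first camera centre, and $[T_1, T_2, T_3] \, e''$ denotes the 3×3 matrix whose columns are $T_i e''$.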
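
The extraction can be sketched as follows. The epipoles are not given in the table, so this sketch assumes the usual null-vector construction: each slice T_i has a left null vector u_i and a right null vector v_i, e' is orthogonal to all u_i and e'' is orthogonal to all v_i. Function names and the random test cameras are assumptions for illustration.

```python
import numpy as np

def trifocal_from_cameras(A, B, C):
    """Hypothetical helper: T[i, q, r] = (-1)^(i+1) det[A without row i; B[q]; C[r]]."""
    T = np.zeros((3, 3, 3))
    for i in range(3):
        A_rest = np.delete(A, i, axis=0)
        for q in range(3):
            for r in range(3):
                T[i, q, r] = (-1.0) ** i * np.linalg.det(np.vstack([A_rest, B[q], C[r]]))
    return T

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def epipoles_from_tensor(T):
    """e' and e'' from the null vectors of the three slices T[i]."""
    left, right = [], []
    for Ti in T:
        U, _, Vt = np.linalg.svd(Ti)
        left.append(U[:, -1])     # u_i with u_i^T T_i = 0
        right.append(Vt[-1])      # v_i with T_i v_i = 0
    e_p = np.linalg.svd(np.array(left))[2][-1]    # orthogonal to every u_i
    e_pp = np.linalg.svd(np.array(right))[2][-1]  # orthogonal to every v_i
    return e_p, e_pp

def fundamental_from_tensor(T):
    e_p, e_pp = epipoles_from_tensor(T)
    F21 = skew(e_p) @ np.einsum("iqr,r->qi", T, e_pp)   # columns are T_i e''
    F31 = skew(e_pp) @ np.einsum("iqr,q->ri", T, e_p)   # columns are T_i^T e'
    return F21, F31

def unit_rows(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

# Assumed test data: random projective cameras and a few random 3D points.
rng = np.random.default_rng(3)
A, B, C = rng.standard_normal((3, 3, 4))
F21, F31 = fundamental_from_tensor(trifocal_from_cameras(A, B, C))

X = rng.standard_normal((5, 4))
x, xp, xpp = unit_rows(X @ A.T), unit_rows(X @ B.T), unit_rows(X @ C.T)

# Epipolar residuals x'^T F21 x and x''^T F31 x for every point.
res21 = np.abs(np.einsum("nq,qi,ni->n", xp, F21 / np.linalg.norm(F21), x)).max()
res31 = np.abs(np.einsum("nr,ri,ni->n", xpp, F31 / np.linalg.norm(F31), x)).max()
print(res21, res31)
```

Both residuals should be at the level of floating-point round-off for noise-free projections.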

T versus three F's. The trifocal tensor has 18 DOF. Three pairwise fundamental matrices have 3 × 7 = 21 parameters, but they must satisfy 3 compatibility constraints, leaving 18 independent DOF, which exactly matches T. So for cameras in general position T and the three F's carry the same information; the difference is that T enforces consistency automatically.
Can the fundamental matrices for all camera pairs be extracted from the trifocal tensor?

Chapter 7: Computing T

The trifocal tensor can be computed from point and/or line correspondences across three views using methods analogous to the 8-point algorithm for F.

| Method | Min correspondences | Notes |
|---|---|---|
| Linear (normalized) | 7 points (or 13 lines) | Analogous to 8-point algorithm; normalize first |
| Algebraic minimization | 7+ | Enforce internal constraints |
| Geometric (Gold Standard) | 7+ | Minimize reprojection error via LM |
| RANSAC | 7 samples | For outlier-contaminated data |

Normalization and constraint enforcement matter just as much for T as for F. The linear algorithm estimates 26 DOF, but T has only 18 — so 8 internal constraints must be enforced. Ignoring them degrades accuracy significantly.
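
A minimal sketch of the linear step on noise-free synthetic data is shown below. Each three-point correspondence contributes a 9×27 block (of rank 4) built from the relation [x']× (Σ x^i T_i) [x'']× = 0, and T is the null vector of the stacked system. Normalization and enforcement of the 8 internal constraints are deliberately omitted here, and all function names and test data are assumptions for illustration.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def point_equations(x, xp, xpp):
    """9 x 27 coefficient block, linear in T.reshape(27) with index order (i, q, r)."""
    return np.kron(np.kron(x[None, :], skew(xp)), skew(xpp))

def estimate_trifocal_linear(xs, xps, xpps):
    """Unit-norm T minimizing the algebraic error of the stacked equations."""
    M = np.vstack([point_equations(*c) for c in zip(xs, xps, xpps)])
    return np.linalg.svd(M)[2][-1].reshape(3, 3, 3)

# Assumed test data: random projective cameras, 10 points to fit, 2 held out.
rng = np.random.default_rng(4)
A, B, C = rng.standard_normal((3, 3, 4))
X = rng.standard_normal((12, 4))
xs, xps, xpps = X @ A.T, X @ B.T, X @ C.T

T_est = estimate_trifocal_linear(xs[:10], xps[:10], xpps[:10])

# The estimate should also satisfy the relations for points it has not seen.
held_out = np.vstack([point_equations(xs[k], xps[k], xpps[k]) for k in (10, 11)])
held_out_residual = np.linalg.norm(held_out @ T_est.ravel()) / np.linalg.norm(held_out)
print(held_out_residual)
```

On real, noisy data the result of this step would only serve as an initial estimate for the constrained methods in the table.
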
What is the minimum number of point correspondences across three views needed to compute the trifocal tensor?

Chapter 8: Properties and Constraints

| Property | Detail |
|---|---|
| Size | 3 × 3 × 3 = 27 entries |
| DOF | 18 (= 3×11 − 15) |
| Internal constraints | 8 (algebraic constraints on the entries) |
| Camera recovery | Camera matrices can be recovered from T up to projective ambiguity |
| Uniqueness | T is unique for a given set of three cameras (up to the projective ambiguity) |

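Affine trifocal tensor: If all three cameras are affine (last row (0,0,0,1)), then 11 of the 27 entries of T are zero. The affine tensor has only 16 non-zero entries, i.e. 15 parameters as a homogeneous object. Counting from the cameras as before, three affine cameras have 3 × 8 = 24 parameters and the affine ambiguity removes 12, leaving 12 DOF.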
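
The zero pattern can be checked directly from the determinant definition in Chapter 2: an entry vanishes whenever the 4×4 matrix contains the row (0,0,0,1) twice. The sketch below uses random affine cameras, and its helper names are assumptions for illustration.

```python
import numpy as np

def trifocal_from_cameras(A, B, C):
    """Hypothetical helper: T[i, q, r] = (-1)^(i+1) det[A without row i; B[q]; C[r]]."""
    T = np.zeros((3, 3, 3))
    for i in range(3):
        A_rest = np.delete(A, i, axis=0)
        for q in range(3):
            for r in range(3):
                T[i, q, r] = (-1.0) ** i * np.linalg.det(np.vstack([A_rest, B[q], C[r]]))
    return T

def random_affine_camera(rng):
    """Random 3x4 camera whose last row is (0, 0, 0, 1)."""
    P = rng.standard_normal((3, 4))
    P[2] = [0.0, 0.0, 0.0, 1.0]
    return P

rng = np.random.default_rng(5)
A, B, C = (random_affine_camera(rng) for _ in range(3))
T = trifocal_from_cameras(A, B, C)

is_zero = np.isclose(T, 0.0, atol=1e-12)
n_zero = int(is_zero.sum())
print(n_zero, T.size - n_zero)

# Positions (i, q, r), 1-based, of the vanishing entries.
zero_positions = [tuple(int(k) + 1 for k in idx) for idx in np.argwhere(is_zero)]
print(zero_positions)
```

For i = 1 or 2 the two remaining rows of A include (0,0,0,1), so every entry with q = 3 or r = 3 vanishes (10 entries); for i = 3 only the entry with q = r = 3 vanishes, giving 11 in total.
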
How many internal algebraic constraints must the trifocal tensor satisfy?

Chapter 9: Connections

| Direction | Connection |
|---|---|
| F → T | T generalizes F to three views; it encodes all three pairwise F's |
| T → Q | The quadrifocal tensor (4 views, 3×3×3×3) further generalizes T; it has 29 DOF |
| T → Ch 18 | T provides initialization for N-view bundle adjustment |
| T → Ch 19 | T can be used for auto-calibration from three views |

"The trifocal tensor may be thought of as a book of homographies, one for each epipolar plane."
— Hartley & Zisserman, Chapter 15
What is the four-view generalization of the trifocal tensor?