From matching pairs of images to building seamless panoramas: alignment, RANSAC, blending, and compositing.
Your phone's panorama mode works like magic: slowly sweep the camera, and out comes a wide, seamless image. But behind that simplicity lies a chain of algorithms: feature detection, geometric alignment, global optimization, and pixel blending.
Image stitching solves the problem of combining multiple overlapping images into a single, larger image. The pipeline:
The five stages from raw images to a seamless panorama.
Given feature matches between two images, how do you find the transformation that aligns them? Start with the simplest case: a translation (shift in x and y).
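With matched point pairs $(x_i, x_i')$, set up a least-squares problem over the translation $t$:
$$E(t) = \sum_i \lVert x_i + t - x_i' \rVert^2$$
Setting the gradient of $E$ to zero shows that the solution is just the average displacement: $t = \frac{1}{n}\sum_i (x_i' - x_i)$. For more complex transformations, the math changes but the principle remains: find the parameters that minimize the sum of squared reprojection errors.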
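A minimal NumPy sketch of this estimate, assuming the matched points are stored as N×2 arrays. The function name `estimate_translation` and the sample points are made up for illustration.

```python
import numpy as np

def estimate_translation(src, dst):
    """Least-squares translation t that minimizes sum ||src_i + t - dst_i||^2."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # The minimizer is the mean of the per-match displacements.
    return (dst - src).mean(axis=0)

# Illustrative data: three points shifted by (3, -1).
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
dst = src + np.array([3.0, -1.0])
t = estimate_translation(src, dst)
print(t)
```

With noisy matches the same call returns the average shift, which is why a single bad match can bias the result.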
| Model | DOF | Min. Points | What It Allows |
|---|---|---|---|
| Translation | 2 | 1 | Shift only |
| Similarity | 4 | 2 | Shift + rotate + uniform scale |
| Affine | 6 | 3 | Shift + rotate + scale + shear |
| Homography | 8 | 4 | Full projective (handles perspective) |
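The "Min. Points" column follows from counting equations: each match contributes two (one for x, one for y), so a model with d degrees of freedom needs at least d/2 matches. As a hedged illustration, here is a least-squares fit of the 6-parameter affine model from three or more matches. The name `estimate_affine` and the 2×3 matrix layout are assumptions made for this sketch.

```python
import numpy as np

def estimate_affine(src, dst):
    """Fit a 2x3 affine matrix M so that dst ~= M[:, :2] @ src + M[:, 2]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Each row [x, y, 1] yields two equations, one per output coordinate.
    A = np.hstack([src, np.ones((len(src), 1))])      # shape (N, 3)
    M_T, *_ = np.linalg.lstsq(A, dst, rcond=None)     # shape (3, 2)
    return M_T.T                                      # shape (2, 3)

# Three non-collinear matches are the minimum for 6 unknowns.
M_true = np.array([[1.1, 0.2, 3.0],
                   [-0.1, 0.9, -2.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = src @ M_true[:, :2].T + M_true[:, 2]
M_est = estimate_affine(src, dst)
```

With more than three matches the same call returns the least-squares solution instead of an exact fit.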
Feature matching always produces some wrong matches (outliers). Because least squares penalizes the squared error, even a few gross outliers can pull the fit far from the correct answer. RANSAC (Random Sample Consensus) is the standard solution.
The algorithm:
Watch RANSAC find the line despite outliers. Least squares (gray) is pulled by outliers. RANSAC (green) ignores them.
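The usual RANSAC loop repeatedly samples a minimal set of points, fits a model to it, counts the points that agree with that model, and keeps the model with the most inliers. Below is a minimal sketch for line fitting. The function names, the iteration count, the inlier threshold, and the synthetic data are all illustrative assumptions, and the residual is the vertical distance to the line, chosen for simplicity.

```python
import numpy as np

def fit_line(points):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    x, y = points[:, 0], points[:, 1]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Fit y = a*x + b robustly; returns (a, b, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # 1. Sample the minimal set: two points define a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # vertical pair cannot be written as y = a*x + b
        # 2. Fit the model to the sample.
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 3. Count points within the threshold of the model.
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = residuals < threshold
        # 4. Keep the model with the largest consensus set.
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # 5. Refit with least squares on the inliers only.
    a, b = fit_line(points[best_inliers])
    return a, b, best_inliers

# Synthetic data: 30 points on y = 2x + 1 plus 10 outliers at y = 8.
x_in = np.linspace(0.0, 1.0, 30)
x_out = np.linspace(0.0, 1.0, 10)
points = np.concatenate([
    np.stack([x_in, 2.0 * x_in + 1.0], axis=1),
    np.stack([x_out, np.full(10, 8.0)], axis=1),
])
a, b, inliers = ransac_line(points)
a_ls, b_ls = fit_line(points)  # plain least squares on all 40 points
```

On this data the plain least-squares intercept is dragged upward by the outliers, while the RANSAC fit recovers the line through the 30 consistent points.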
The choice of motion model depends on the scene and camera motion:
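A homography maps point (x, y) to (x', y') via a 3×3 matrix H with entries $h_{ij}$, defined only up to scale (hence 8 degrees of freedom):
$$x' = \frac{h_{00}x + h_{01}y + h_{02}}{h_{20}x + h_{21}y + h_{22}}, \qquad y' = \frac{h_{10}x + h_{11}y + h_{12}}{h_{20}x + h_{21}y + h_{22}}$$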
The division by the last row is what gives projective transformations their power: parallel lines can converge, rectangles can become trapezoids.
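A small sketch of applying a homography to points, assuming H is a 3×3 NumPy array and points are N×2. The helper name `apply_homography` and the example matrix are illustrative. The only non-identity entry sits in the last row, so the effect comes entirely from the division.

```python
import numpy as np

def apply_homography(H, pts):
    """Map N x 2 points through the 3 x 3 homography H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # lift to (x, y, 1)
    mapped = homog @ H.T
    # Divide by the third coordinate to return to 2D.
    return mapped[:, :2] / mapped[:, 2:3]

H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
warped = apply_homography(H, square)
print(warped)
```

The unit square becomes a trapezoid: the left edge keeps its length, the right edge shrinks, and the top and bottom edges are no longer parallel.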
See how different motion models transform a square grid.
When the camera rotates without translating, the relationship between any two images is a homography. But what projection surface should the panorama use?
| Projection | Properties | Best For |
|---|---|---|
| Planar | Straight lines stay straight. Severe distortion at edges. | Narrow fields of view (<90°) |
| Cylindrical | Vertical lines stay straight; horizontal lines away from the horizon curve. Wraps around. | Horizontal panoramas (120-360°) |
| Spherical | Full omnidirectional coverage. Area distortion at poles. | Full 360° × 180° spheres |
Cylindrical projection introduces a subtlety: you need to know the camera's focal length to project correctly onto the cylinder. With the correct focal length, a camera pan becomes a pure horizontal shift between the warped images. With the wrong one, no single shift lines them up, and features in the overlap misalign. Phone panorama modes often use the gyroscope to estimate rotation rather than relying on feature-based alignment alone.
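To make the role of the focal length concrete, here is a sketch of the standard cylindrical mapping, with pixel coordinates measured from the image center and the focal length in pixels. The helper names, the 10° pan, and the focal lengths are assumptions chosen for illustration. The script measures how much the horizontal shift between two views varies across points: zero spread means one translation aligns the pair.

```python
import numpy as np

def to_cylindrical(x, y, f):
    """Map centered pixel coords (x, y) to cylinder coords for focal length f."""
    theta = np.arctan2(x, f)      # azimuth of the viewing ray
    h = y / np.hypot(x, f)        # height on the unit cylinder
    return f * theta, f * h

def shift_spread(f_guess, f_true=500.0, pan_deg=10.0):
    """Spread of per-point horizontal shifts after warping with f_guess."""
    pan = np.deg2rad(pan_deg)
    azimuths = np.deg2rad([-5.0, 0.0, 5.0, 12.0])   # scene points on the horizon
    x1 = f_true * np.tan(azimuths)                  # positions in image 1
    x2 = f_true * np.tan(azimuths - pan)            # positions after the pan
    xc1, _ = to_cylindrical(x1, 0.0, f_guess)
    xc2, _ = to_cylindrical(x2, 0.0, f_guess)
    return np.ptp(xc1 - xc2)

print(shift_spread(500.0))  # correct focal length
print(shift_spread(800.0))  # wrong focal length
```

With the correct focal length every point shifts by the same amount, so the spread is essentially zero. With the wrong one the shifts differ from point to point, which shows up as misalignment in the overlap.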
Pairwise alignment estimates a transform between each pair. But errors accumulate: if image 1 aligns to image 2, and image 2 aligns to image 3, the alignment from 1 to 3 inherits both errors. Over many images, this creates visible drift.
Global alignment estimates all transformations simultaneously, minimizing the total reprojection error across all image pairs:
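As a deliberately simplified toy, not the full method, the sketch below reduces each image's transformation to a single 1D offset and treats each pairwise alignment as a noisy measurement of the difference between two offsets. The function name `global_align_1d`, the measurement format, and the numbers are assumptions made for this example. All offsets are solved in one least-squares problem, with image 0 pinned at zero.

```python
import numpy as np

def global_align_1d(n_images, measurements):
    """Solve for offsets p with p[0] = 0 from (i, j, d) where d ~= p[j] - p[i]."""
    A = np.zeros((len(measurements) + 1, n_images))
    b = np.zeros(len(measurements) + 1)
    for row, (i, j, d) in enumerate(measurements):
        A[row, j] = 1.0
        A[row, i] = -1.0
        b[row] = d
    A[-1, 0] = 1.0  # anchor the first image to remove the global shift
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# True offsets are 0, 10, 20, 30. Each neighbor measurement is off by +0.5,
# while a direct measurement between images 0 and 3 happens to be exact.
measurements = [(0, 1, 10.5), (1, 2, 10.5), (2, 3, 10.5), (0, 3, 30.0)]
chained = np.cumsum([0.0, 10.5, 10.5, 10.5])   # pairwise chaining: ends at 31.5
p = global_align_1d(4, measurements)           # joint solution
print(chained, p)
```

Chaining puts the last image 1.5 units off, while the joint solution spreads the inconsistency over all four measurements and ends 0.375 off.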
Recognizing panoramas (Brown and Lowe, 2007) automated the entire pipeline: given an unordered set of photos, the system automatically identifies which images overlap, groups them into panoramas, estimates all transformations globally, and produces stitched results. The same approach underlies many modern panorama tools.
Global alignment for panoramas is a special case of bundle adjustment: the simultaneous refinement of camera parameters and 3D point positions to minimize total reprojection error.
For panoramas, "camera parameters" are just the rotation (and optionally focal length). For full 3D reconstruction (Chapter 11), bundle adjustment also optimizes camera positions and 3D point locations.
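Each observation involves only one camera and one point, so the Jacobian and the normal equations are sparse. Modern bundle adjustment implementations (Ceres Solver, g2o) exploit this sparsity. They use the Schur complement to eliminate the point parameters from the normal equations, reducing the linear system from (cameras + points) unknowns to just the camera unknowns. The reduced system is much smaller because there are far fewer cameras than points.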
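A toy sketch of the Schur complement step on a small block system. The block names `B` (camera block), `C_diag` (point block), and `E` (camera-point coupling) are notation assumed for this example. To keep it short, the point block is treated as diagonal, whereas real bundle adjustment has one small block per 3D point. This is not how any particular library implements it.

```python
import numpy as np

def solve_schur(B, E, C_diag, v, w):
    """Solve [[B, E], [E.T, C]] @ [dc, dp] = [v, w] with diagonal C."""
    C_inv = 1.0 / C_diag                 # inverting a diagonal block is cheap
    S = B - (E * C_inv) @ E.T            # reduced (camera-only) system
    rhs = v - (E * C_inv) @ w
    dc = np.linalg.solve(S, rhs)         # small dense solve over cameras
    dp = C_inv * (w - E.T @ dc)          # back-substitute for the points
    return dc, dp

rng = np.random.default_rng(0)
B = 10.0 * np.eye(3)                     # 3 camera unknowns
C_diag = np.array([5.0, 6.0, 7.0, 8.0])  # 4 point unknowns
E = 0.1 * rng.standard_normal((3, 4))
v = rng.standard_normal(3)
w = rng.standard_normal(4)
dc, dp = solve_schur(B, E, C_diag, v, w)

# Reference: solve the full 7 x 7 system directly.
full = np.block([[B, E], [E.T, np.diag(C_diag)]])
reference = np.linalg.solve(full, np.concatenate([v, w]))
```

Both routes give the same update, but the Schur route only factors a matrix the size of the camera block.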
Alignment is only half the battle. When images overlap, you need to combine their pixels without visible seams. Differences in exposure, white balance, and vignetting create mismatches at boundaries.
Blending strategies:
| Method | How It Works | Quality |
|---|---|---|
| Feathering | Weights fall off linearly from each image's center; overlap pixels are the weighted average. | Simple, but shows ghosting with misalignment or motion. |
| Laplacian pyramid | Blend each frequency band separately. Low frequencies: wide, gradual transition. High frequencies: narrow, sharp transition. | Excellent. The standard approach. |
| Optimal seam | Find the cut through the overlap where images match best (graph cut / dynamic programming). | Best for parallax and moving objects. |
| Gradient-domain | Blend gradients, then reconstruct via Poisson equation. Seamless by construction. | Eliminates intensity differences. |
Compare naive stitching (hard cut) with smooth blending. Notice the seam difference.
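A minimal sketch of feathering for two grayscale images that are already aligned and overlap by a known number of columns. The function name `feather_blend`, the side-by-side layout, and the constant test images are assumptions for illustration. A left-to-right linear ramp stands in for distance-based weights.

```python
import numpy as np

def feather_blend(left, right, overlap):
    """Blend two 2D images whose last/first `overlap` columns coincide."""
    h, wl = left.shape
    wr = right.shape[1]
    out = np.zeros((h, wl + wr - overlap))
    out[:, :wl - overlap] = left[:, :wl - overlap]   # left-only region
    out[:, wl:] = right[:, overlap:]                 # right-only region
    alpha = np.linspace(1.0, 0.0, overlap)           # weight of the left image
    out[:, wl - overlap:wl] = (alpha * left[:, wl - overlap:]
                               + (1.0 - alpha) * right[:, :overlap])
    return out

# Two flat images with different exposure: 100 versus 120.
left = np.full((2, 6), 100.0)
right = np.full((2, 6), 120.0)
blended = feather_blend(left, right, overlap=4)
print(blended[0])
```

A hard cut would jump by 20 gray levels at the seam. Here the same difference is spread evenly across the four overlap columns.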
Let's visualize the stitching pipeline end to end. Three overlapping images are aligned via homographies and blended into a seamless panorama.
Watch three images align and blend into a single panorama.
Image alignment and stitching connects to many other topics:
| Concept | Used In |
|---|---|
| RANSAC | Ch 7 (matching), Ch 11 (pose estimation), Ch 9 (motion), virtually everywhere |
| Homography estimation | Ch 2 (projective geometry), Ch 11 (planar SfM), AR overlays |
| Bundle adjustment | Ch 11 (SfM), Ch 12 (multi-view stereo), visual SLAM |
| Laplacian blending | Ch 10 (HDR, compositing), Ch 3 (pyramids), image editing |
| Panorama recognition | Ch 6 (image retrieval), photo organization |
| Global alignment | Ch 11 (global SfM), map building |