The pixel-level toolkit: point operators, convolution, Fourier transforms, image pyramids, and geometric warps.
You have an image — a grid of pixel values produced by the formation pipeline from Chapter 2. Now what? Before any high-level understanding can happen (recognition, 3D reconstruction, tracking), the raw pixel data usually needs to be cleaned up, enhanced, or transformed.
Image processing is the set of operations that take an image as input and produce a modified image as output. Brighten a dark photo? That is a point operator. Blur out noise? That is a linear filter. Sharpen edges? That is another filter. Resize or rotate the image? That is a geometric warp.
These operations are the building blocks for everything that follows in computer vision. Edge detection, feature matching, image stitching, super-resolution — all of them rely on the tools in this chapter.
The simplest image processing operations work on individual pixels independently. A point operator transforms each pixel value without looking at its neighbors:
$$g(\mathbf{x}) = h(f(\mathbf{x}))$$
where f is the input image, g is the output, and h is some function applied to each pixel.
Common point operators include brightness adjustment, contrast adjustment, and gamma correction.
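To make the idea concrete, here is a minimal pure-Python sketch of three point operators applied to a small gradient. The helper names (`apply_point_op`, `brightness`, `contrast`, `gamma`), the [0, 1] float intensity range, and the contrast pivot of 0.5 are assumptions made for illustration, not definitions from this chapter.

```python
def clamp01(v):
    """Keep a value inside the assumed intensity range [0, 1]."""
    return max(0.0, min(1.0, v))

def apply_point_op(image, h):
    """Apply h to every pixel independently: g(x) = h(f(x))."""
    return [[clamp01(h(p)) for p in row] for row in image]

def brightness(offset):
    # Add a constant to every pixel.
    return lambda p: p + offset

def contrast(gain, pivot=0.5):
    # Scale the distance of each pixel from a pivot intensity.
    return lambda p: (p - pivot) * gain + pivot

def gamma(g):
    # Power-law remapping; g < 1 lifts the midtones, g > 1 darkens them.
    return lambda p: p ** g

# A horizontal gradient from 0 to 1 (two identical rows).
gradient = [[x / 4 for x in range(5)] for _ in range(2)]

brighter = apply_point_op(gradient, brightness(0.25))
punchier = apply_point_op(gradient, contrast(2.0))
lifted = apply_point_op(gradient, gamma(0.5))
```

Because each output pixel depends only on the matching input pixel, any of these can also be precomputed once as a lookup table over the possible intensity values.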
A histogram counts how many pixels have each intensity value. It tells you at a glance whether an image is dark (values clustered at the low end), bright (clustered high), or low-contrast (narrow spread).
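Histogram equalization remaps pixel values so that they spread across the full intensity range as evenly as possible. The algorithm is simple: compute the cumulative distribution function (CDF) of the histogram, then use it as a lookup table to remap each pixel:
$$g(\mathbf{x}) = \operatorname{round}\big((L - 1)\,\mathrm{CDF}(f(\mathbf{x}))\big)$$
where L is the number of intensity levels (256 for 8-bit images) and the CDF is normalized to [0, 1]. Intensities that are bunched together, whether at the dark end or the bright end, get pulled apart, which automatically improves global contrast.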
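The CDF-as-lookup-table idea fits in a few lines. This sketch assumes integer pixel values in a flat list and uses the common mapping `round((L - 1) * CDF)`; the function names and the sample data are illustrative only.

```python
def histogram(pixels, levels=256):
    """Count how many pixels have each intensity value."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts

def equalize(pixels, levels=256):
    """Remap pixels through the normalized CDF of their histogram."""
    counts = histogram(pixels, levels)
    total = len(pixels)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)  # fraction of pixels <= this level
    # The scaled CDF is the lookup table.
    lut = [round((levels - 1) * c) for c in cdf]
    return [lut[p] for p in pixels]

# A dark image: every value sits in the bottom sixth of the range.
dark = [10, 10, 20, 20, 20, 30, 30, 40]
print(equalize(dark))  # [64, 64, 159, 159, 159, 223, 223, 255]
```

The values that were packed into 10 to 40 now reach up to 255, and their ordering is preserved because the CDF never decreases.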
Unlike point operators, a linear filter computes each output pixel from a weighted combination of its neighbors. The weights are stored in a small grid called a kernel (or filter mask), and the operation of sliding this kernel across the image is called convolution.
Key kernels and what they do:
| Kernel | Effect | Size |
|---|---|---|
| Box filter | Averages all neighbors equally → blur | Any |
| Gaussian | Weighted average, more weight to center → smooth blur | Depends on σ |
| Sobel | Approximates derivative → edge detection | 3×3 |
| Laplacian | Second derivative → edge + blob detection | 3×3 |
| Sharpen | Enhances high frequencies → crisper edges | 3×3 |
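As a rough sketch of how a kernel is applied, the code below implements 2D convolution in pure Python and runs the box and Sobel kernels from the table on a step edge. Two choices here are assumptions: border pixels are replicated, and the kernel is flipped as in the formal definition of convolution (for symmetric kernels such as the box filter this changes nothing; for Sobel it only flips the sign of the response).

```python
def convolve2d(image, kernel):
    """Convolve image with kernel, replicating border pixels."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    # Convolution reads the image at (y - dy, x - dx),
                    # which is equivalent to flipping the kernel.
                    sy = min(max(y - (j - ry), 0), h - 1)
                    sx = min(max(x - (i - rx), 0), w - 1)
                    acc += kernel[j][i] * image[sy][sx]
            out[y][x] = acc
    return out

BOX = [[1 / 9] * 3 for _ in range(3)]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

blurred = convolve2d(step, BOX)    # the edge is smeared over neighboring columns
edges = convolve2d(step, SOBEL_X)  # nonzero only in the columns next to the edge
```

The four nested loops make the cost proportional to image size times kernel size, which is why large kernels are often applied as separable 1D passes or in the frequency domain.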
Linear filters are powerful, but linear smoothing has a fundamental limitation: it cannot tell noise from structure, so it blurs edges along with the noise. A nonlinear filter can preserve edges while still smoothing noise.
The most important nonlinear filters:
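The bilateral filter, for example, replaces each pixel with a weighted average of its neighbors, where the weights depend on both spatial distance and intensity difference:
$$g(\mathbf{x}) = \frac{1}{W(\mathbf{x})} \sum_{\mathbf{y} \in N(\mathbf{x})} G_{\sigma_s}(\lVert \mathbf{x} - \mathbf{y} \rVert)\, G_{\sigma_r}(\lvert f(\mathbf{x}) - f(\mathbf{y}) \rvert)\, f(\mathbf{y})$$
where $G_{\sigma_s}$ is the spatial Gaussian (nearby pixels matter more), $G_{\sigma_r}$ is the range Gaussian (similar-valued pixels matter more), N(x) is the neighborhood of x, and W(x) is a normalizing factor equal to the sum of the weights.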
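A 1D version is enough to see why the range term protects edges. The sketch below is a pure-Python illustration on a noisy step signal; the parameter values and function names are assumptions chosen for the example. Setting the range sigma to infinity switches the range term off, which turns the same routine into an ordinary Gaussian blur for comparison.

```python
import math

def gaussian(d, sigma):
    """Unnormalized Gaussian weight for a difference d."""
    return math.exp(-(d * d) / (2 * sigma * sigma))

def bilateral_1d(signal, sigma_s, sigma_r, radius):
    out = []
    for x, center in enumerate(signal):
        total, norm = 0.0, 0.0
        lo, hi = max(0, x - radius), min(len(signal), x + radius + 1)
        for y in range(lo, hi):
            # Spatial weight times range weight.
            w = gaussian(x - y, sigma_s) * gaussian(signal[y] - center, sigma_r)
            total += w * signal[y]
            norm += w  # the running sum plays the role of W(x)
        out.append(total / norm)
    return out

# A noisy step: low values on the left, high values on the right.
noisy_step = [0.0, 0.1, 0.0, 0.1, 1.0, 0.9, 1.0, 0.9]

preserved = bilateral_1d(noisy_step, sigma_s=2.0, sigma_r=0.2, radius=2)
# With sigma_r = infinity every range weight is 1: a plain Gaussian blur.
blurred = bilateral_1d(noisy_step, sigma_s=2.0, sigma_r=math.inf, radius=2)
```

At index 3, just left of the step, the Gaussian blur pulls the value up toward the bright side, while the bilateral result stays close to the dark side because the bright neighbors receive almost no weight.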
Every image can be decomposed into a sum of sinusoidal patterns at different frequencies and orientations. The Fourier transform converts an image from the spatial domain (pixel values at locations) to the frequency domain (amplitudes and phases of sinusoids).
In the frequency domain, low frequencies correspond to smooth, slowly varying regions. High frequencies correspond to edges, textures, and noise. This decomposition is powerful because convolution in the spatial domain equals multiplication in the frequency domain.
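The convolution theorem can be checked numerically on a short 1D signal. This sketch uses a naive O(n²) discrete Fourier transform written with the standard library, purely for illustration (a real application would use an FFT), and it assumes circular convolution, which is the form the discrete theorem holds for.

```python
import cmath

def dft(signal, inverse=False):
    """Naive discrete Fourier transform (or its inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_convolve(f, h):
    """Direct circular convolution in the spatial domain."""
    n = len(f)
    return [sum(f[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]

f = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
h = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # two-tap averaging filter

spatial = circular_convolve(f, h)

# Same result via the frequency domain: transform, multiply, transform back.
F, H = dft(f), dft(h)
product = [a * b for a, b in zip(F, H)]
via_frequency = [v.real for v in dft(product, inverse=True)]
```

Both routes give the same samples up to floating-point rounding, which is the property that makes frequency-domain filtering attractive for large kernels.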
Practical applications:
Many vision tasks need to operate at multiple scales. A face might be 20 pixels wide in one image and 200 pixels in another. Image pyramids provide a multi-resolution representation that handles this elegantly.
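The Gaussian pyramid is built by repeatedly blurring the image and downsampling it by a factor of 2. Each level has half the width and half the height (a quarter of the pixels) of the one below. The blur before each downsampling step suppresses the high frequencies that would otherwise alias.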
The Laplacian pyramid stores, at each level, the difference between a Gaussian level and the upsampled version of the next coarser level, so each level captures details at a specific scale. This is the basis of multi-resolution blending: you can seamlessly composite images by blending at each pyramid level independently and then collapsing the pyramid.
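Both pyramids can be sketched on a 1D signal. The version below makes several simplifying assumptions: a [1, 2, 1] / 4 binomial kernel stands in for the Gaussian, borders are replicated, and upsampling is nearest-neighbor. All function names are illustrative.

```python
def blur(signal):
    """Smooth with the binomial kernel [1, 2, 1] / 4, replicating borders."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
            for i in range(n)]

def downsample(signal):
    """Blur first, then keep every second sample."""
    return blur(signal)[::2]

def upsample(signal, size):
    """Nearest-neighbor upsampling back to `size` samples."""
    return [signal[min(i // 2, len(signal) - 1)] for i in range(size)]

def gaussian_pyramid(signal, levels):
    pyramid = [signal]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def laplacian_pyramid(signal, levels):
    gauss = gaussian_pyramid(signal, levels)
    lap = []
    for fine, coarse in zip(gauss, gauss[1:]):
        up = upsample(coarse, len(fine))
        lap.append([a - b for a, b in zip(fine, up)])  # detail at this scale
    lap.append(gauss[-1])  # the coarsest level is stored as-is
    return lap

def reconstruct(lap):
    """Collapse the pyramid: upsample and add details, coarse to fine."""
    current = lap[-1]
    for detail in reversed(lap[:-1]):
        up = upsample(current, len(detail))
        current = [a + b for a, b in zip(detail, up)]
    return current

signal = [0.0, 0.0, 1.0, 1.0, 4.0, 4.0, 2.0, 2.0]
gauss = gaussian_pyramid(signal, 3)
lap = laplacian_pyramid(signal, 3)
restored = reconstruct(lap)
```

Collapsing the Laplacian pyramid returns the original signal, because each detail level stores exactly what the upsampled coarser level is missing.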
Sometimes you need to change the geometry of an image rather than its pixel values. Geometric transformations (from Chapter 2) move pixels to new locations: translation, rotation, scaling, affine, and projective warps.
The key question is: when a pixel lands between grid positions, how do you compute its value? This is the interpolation problem.
| Method | Quality | Speed |
|---|---|---|
| Nearest neighbor | Blocky, aliased | Fastest |
| Bilinear | Smooth, slight blurring | Fast |
| Bicubic | Sharp, minimal artifacts | Moderate |
In practice, you use inverse warping: for each output pixel, compute where it came from in the input, then interpolate. This avoids holes in the output.
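Here is a small sketch of inverse warping with bilinear interpolation. The `inverse_map` callback, which returns the source location for an output pixel, is a hypothetical interface chosen for this example, and clamping out-of-range coordinates to the image border is an assumption.

```python
import math

def bilinear(image, x, y):
    """Sample image at a fractional (x, y), clamped to the image bounds."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Blend horizontally on the two surrounding rows, then vertically.
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom

def inverse_warp(image, inverse_map, out_w, out_h):
    """For each output pixel, look up its source location and interpolate."""
    return [[bilinear(image, *inverse_map(x, y)) for x in range(out_w)]
            for y in range(out_h)]

src = [[0.0, 10.0],
       [20.0, 30.0]]

# Scale up by 2x: output pixel (x, y) came from input location (x / 2, y / 2).
zoomed = inverse_warp(src, lambda x, y: (x / 2, y / 2), 4, 4)
```

Every output pixel is assigned exactly once, so no holes can appear. Looping over input pixels and pushing them forward would leave some output pixels unwritten whenever the warp enlarges the image.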
Let's chain operations together into a full processing pipeline. Start with a noisy, low-contrast image and apply a sequence of operations to clean it up.
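One way to think about such a pipeline is as a list of image-to-image functions applied in order. The sketch below is illustrative only: the choice of steps (a 3×3 median filter for denoising, then a linear contrast stretch) and every function name are assumptions, not the exact sequence used in this chapter.

```python
from statistics import median

def median3x3(image):
    """Replace each pixel with the median of its 3x3 neighborhood."""
    h, w = len(image), len(image[0])
    def at(y, x):
        # Replicate border pixels.
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
    return [[median(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]

def stretch_contrast(image, levels=256):
    """Linearly map the darkest pixel to 0 and the brightest to levels - 1."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]
    scale = (levels - 1) / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in image]

def run_pipeline(image, steps):
    """Apply each image-to-image step in order."""
    for step in steps:
        image = step(image)
    return image

# Low-contrast image (values 100 and 140) with two impulse-noise pixels.
noisy = [
    [100, 100, 100, 140, 140],
    [100, 255, 100, 140, 140],
    [100, 100, 100, 140,   0],
    [100, 100, 100, 140, 140],
]
cleaned = run_pipeline(noisy, [median3x3, stretch_contrast])
```

Order matters here. Stretching first would use the two noise pixels as the minimum and maximum and barely change the contrast, while denoising first lets the stretch see the true intensity range.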
Image processing tools underpin nearly every technique in the rest of this book. Here is the map:
| Concept | Used In |
|---|---|
| Convolution / filtering | Ch 5 (CNNs), Ch 7 (Feature detection), Ch 9 (Optical flow) |
| Gaussian blur | Ch 7 (Scale space for SIFT), Ch 8 (Image stitching blending) |
| Image pyramids | Ch 8 (Multi-scale stitching), Ch 9 (Coarse-to-fine flow), Ch 10 (Super-resolution) |
| Fourier transforms | Ch 10 (Deconvolution), Ch 12 (Frequency-based stereo) |
| Geometric warps | Ch 8 (Image alignment), Ch 9 (Motion compensation), Ch 14 (View synthesis) |
| Histogram methods | Ch 10 (HDR tone mapping), Ch 6 (Feature histograms like HOG) |
| Bilateral filtering | Ch 10 (HDR), Ch 12 (Stereo cost filtering) |