How to fit models to noisy data: interpolation, regularization, variational methods, and Markov random fields.
You have data — a set of measured points, pixel values, or correspondences — and you want to find a model that explains them. Maybe you want to fit a line to noisy edge points, a smooth surface to sparse depth measurements, or a segmentation map to an image. This is the model fitting problem.
The challenge is that real data is noisy and incomplete. Measured points deviate from the true model. Some measurements are outright wrong (outliers). The model itself might not perfectly describe reality. You need principled ways to handle all of this.
This chapter provides the mathematical machinery: least squares, robust estimation, regularization, and energy minimization. These tools reappear in every chapter that follows.
Noisy points scattered around a line. Click to see different fitting approaches.
The most fundamental fitting technique minimizes the sum of squared differences between the model's predictions and the observed data:
where p are the model parameters, xi are inputs, yi are observed values, and f is the model function. For a linear model (y = ax + b), the solution is closed-form:
This is the normal equation. It gives the parameters that minimize squared error in one shot. For nonlinear models, we use iterative methods like Gauss-Newton or Levenberg-Marquardt.
Click "Add Outlier" to see how a single bad point can dramatically change the fit.
Often you have sparse measurements at irregular positions and need to fill in the gaps. This is interpolation: given values at known locations, estimate the value at arbitrary points.
Radial basis functions (RBFs) are a powerful approach. The interpolant is a weighted sum of basis functions centered at the data points:
Common basis functions φ:
Fit a polynomial of degree 1 (a line) to noisy data, and you might miss important structure (underfitting). Fit a polynomial of degree 20, and it passes through every noisy point but oscillates wildly between them (overfitting).
Increase the polynomial degree. Too low = underfitting. Too high = overfitting.
The solution is model selection: choose the right complexity. Techniques include:
Least squares fails spectacularly when outliers are present. Robust estimation uses loss functions that grow more slowly than the squared error, limiting the influence of outliers.
| Loss Function | Formula | Behavior |
|---|---|---|
| Squared (L2) | ρ(r) = r2 | Outlier-sensitive |
| Absolute (L1) | ρ(r) = |r| | Less outlier-sensitive |
| Huber | Quadratic near 0, linear far | Best of both |
| Tukey biweight | Zero beyond threshold | Completely ignores outliers |
RANSAC (Random Sample Consensus) is an alternative strategy. Instead of modifying the loss, it samples random minimal subsets of the data, fits a model to each, and keeps the one with the most inliers. We will see RANSAC again in Chapter 8 (image stitching).
Compare how different loss functions penalize residual errors. Notice how robust losses cap the influence of large residuals.
To prevent overfitting, we add a regularization term to the objective function that penalizes model complexity:
The parameter λ controls the balance between data fidelity and smoothness. Common regularizers:
Increase λ to see how stronger regularization smooths the fit. Too little = overfitting; too much = underfitting.
Variational methods formulate vision problems as energy minimization. The idea: define an energy function that is low when the solution is good and high when it is bad, then find the minimum.
where D is the data term (how well does f explain the observed data g?) and S is the smoothness term (how regular is f?).
For continuous problems, the minimum satisfies the Euler-Lagrange equation, a partial differential equation. In practice, we discretize onto a pixel grid and solve a large sparse linear system.
The bilateral solver extends this by adding an edge-aware smoothness term based on a reference image. It smooths the solution everywhere except at edges in the reference, producing results that respect image boundaries.
A Markov random field (MRF) defines a probability distribution over a grid of labels (like pixel labels in segmentation). Each node in the grid has a label, and the probability depends on two things:
Finding the optimal labeling requires discrete optimization. Key algorithms:
| Algorithm | Guarantee | Speed |
|---|---|---|
| Graph cuts | Global optimum for binary labels | Fast |
| Belief propagation | Exact on trees, approximate on graphs | Moderate |
| Mean field | Approximate | Fast |
A noisy labeling (left) and the MRF-smoothed result (right). Pairwise potentials encourage neighboring pixels to agree.
A beautiful application of the variational framework: given a grayscale image with sparse color scribbles from a user, propagate the colors to the entire image. This is Levin et al.'s interactive colorization (2004).
The optimization minimizes color variation between neighboring pixels with similar intensities. Where the user provides a color hint, the data term pins the solution. Everywhere else, the smoothness term propagates colors along intensity contours.
Sparse color scribbles (left) propagate to fill the entire image (right) using a variational solver.
The optimization tools from this chapter are the backbone of computer vision. Here is where they reappear:
| Concept | Used In |
|---|---|
| Least squares | Ch 8 (Alignment), Ch 9 (Motion), Ch 11 (SfM — bundle adjustment) |
| RANSAC | Ch 8 (Homography estimation), Ch 11 (Pose estimation) |
| Regularization | Ch 5 (Weight decay in DNNs), Ch 9 (Optical flow), Ch 12 (Stereo) |
| Variational methods | Ch 9 (Horn-Schunck flow), Ch 12 (Dense stereo), Ch 13 (Surface reconstruction) |
| MRFs / CRFs | Ch 6 (Semantic segmentation), Ch 12 (Stereo labeling) |
| Graph cuts | Ch 12 (Stereo), Ch 10 (Image stitching seams) |
| Robust loss functions | Ch 9 (Robust optical flow), Ch 11 (Robust SfM) |