From depth images and point clouds to watertight surfaces: shape-from-X cues, scanning, surface fitting, volumetric methods, and model-based reconstruction.
You have a handful of depth maps from a stereo camera, or a point cloud from a LiDAR scan, or surface normals from photometric stereo. Each is a partial, noisy view of an object. How do you stitch them into a single, complete, watertight 3D model you can render, print, or simulate?
That is the 3D reconstruction problem. It spans everything from inferring shape from a single photograph (using shading, texture, or focus cues) to merging millions of laser scans into a museum-quality digital replica of Michelangelo's David.
Multiple partial scans (colored) overlap. Merging them produces a single coherent surface.
Look at a smooth white sphere under a single light source. Even from one image, you can see its shape from the way brightness varies across the surface. Bright where the surface faces the light, dark where it curves away. This is shape from shading.
For a diffuse (Lambertian) surface, brightness depends on the angle between the unit surface normal n and the unit light direction v:

$$I = \rho \max(0,\ \mathbf{n} \cdot \mathbf{v})$$
where ρ is the albedo (surface reflectance). Since each pixel gives one equation but the surface normal has two unknowns (the surface slopes p = ∂z/∂x, q = ∂z/∂y), the problem is underdetermined. A smoothness constraint is needed to regularize it.
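Here is a minimal sketch of this ambiguity. It uses standard-library Python only and assumes the normal convention n ∝ (−p, −q, 1); the helper names are illustrative, not from any library. With the light along the viewing axis, every pair of slopes with the same p² + q² produces the same brightness. One pixel value therefore pins down only how far the normal tilts from the light, not in which direction.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def normal_from_slopes(p, q):
    """Unit normal of a surface z(x, y) with slopes p = dz/dx and q = dz/dy."""
    return normalize((-p, -q, 1.0))


def lambertian(normal, light, albedo=1.0):
    """Lambertian brightness I = albedo * max(0, n . v)."""
    n, v = normalize(normal), normalize(light)
    return albedo * max(0.0, sum(a * b for a, b in zip(n, v)))


# Light along the viewing axis: every normal with the same tilt away from
# the light (here p^2 + q^2 = 0.25) gives the same brightness, so a single
# pixel value cannot tell these orientations apart.
light = (0.0, 0.0, 1.0)
a = lambertian(normal_from_slopes(0.5, 0.0), light)
b = lambertian(normal_from_slopes(0.0, -0.5), light)
c = lambertian(normal_from_slopes(-0.3, 0.4), light)
```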
A sphere's brightness varies with the angle between the surface normal and the light direction.
Shape from texture: A regularly textured surface (like a brick wall viewed at an angle) reveals its orientation through the foreshortening of the texture pattern. Where the surface tilts away from you, the texture appears compressed. Measuring this compression gives the local surface normal. It is a weaker cue than shading but works on textured surfaces where shading is ambiguous.
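Under simplifying assumptions, the foreshortening cue reduces to a short formula. The sketch below assumes orthographic projection and circular texture elements (texels). Under those assumptions, a texel on a patch with slant σ images as an ellipse whose minor-to-major axis ratio is cos σ, and the compressed axis points along the tilt direction τ. `normal_from_texel` is a hypothetical helper. Real shape-from-texture methods must also estimate the texel shape and handle perspective.

```python
import math


def normal_from_texel(major, minor, tilt):
    """Hypothetical helper: surface normal from one imaged circular texel.

    Assumes orthographic projection and circular texture elements, so a patch
    with slant sigma images as an ellipse with minor/major = cos(sigma), and
    the compressed (minor) axis points along the tilt direction `tilt`
    (radians, in the image plane). The sign of the tilt is ambiguous from one
    ellipse; this returns the solution whose normal points along +tilt.
    """
    ratio = max(0.0, min(1.0, minor / major))  # clamp against measurement noise
    slant = math.acos(ratio)
    return (math.sin(slant) * math.cos(tilt),
            math.sin(slant) * math.sin(tilt),
            math.cos(slant))


# A texel imaged at half its width along x: slant = 60 degrees.
n = normal_from_texel(major=2.0, minor=1.0, tilt=0.0)
```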
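Shape from focus: Take many photos at different focus settings. Each pixel is sharpest when the focal plane passes through the surface point it sees, so plotting sharpness against focus distance and finding the peak gives that pixel's depth. Telecentric optics keep the magnification constant across focus settings, so each pixel images the same scene point throughout the stack. This makes sub-millimeter depth from focus practical, as well as depth from defocus, which infers depth from the amount of blur in just a few images.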
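The per-pixel peak search can be sketched as follows. The sketch assumes an already registered focus stack, with the same magnification in every image as telecentric optics provide. It uses the modified Laplacian as the sharpness measure, which is one common choice among several, and the function names are illustrative. Practical systems sum the measure over a small window and interpolate the peak between focus settings rather than taking a raw argmax.

```python
def focus_measure(img, x, y):
    """Modified-Laplacian sharpness at interior pixel (x, y) of a 2D list."""
    c = 2 * img[y][x]
    return (abs(c - img[y][x - 1] - img[y][x + 1])
            + abs(c - img[y - 1][x] - img[y + 1][x]))


def depth_from_focus(stack, focus_distances):
    """For each interior pixel, return the focus distance of the sharpest image.

    Border pixels are left as None because the 3x3 stencil does not fit there.
    """
    h, w = len(stack[0]), len(stack[0][0])
    depth = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            scores = [focus_measure(img, x, y) for img in stack]
            best = max(range(len(stack)), key=scores.__getitem__)
            depth[y][x] = focus_distances[best]
    return depth


# Three 3x3 images of the same patch taken at focus distances 10, 20, 30.
blurred = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
sharp = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
softer = [[4, 4, 4], [4, 6, 4], [4, 4, 4]]
depth = depth_from_focus([blurred, sharp, softer], [10, 20, 30])
```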
Shape from shading with a single light is underdetermined. The fix is elegant: take multiple images of the same scene, each with a different light direction. This is photometric stereo (Woodham, 1981).
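For a diffuse surface with unknown albedo ρ and normal n, each light direction vₖ gives one linear equation in the unknown vector ρn (assuming the pixel is not shadowed under that light):

$$I_k = \rho\, \mathbf{n} \cdot \mathbf{v}_k = \mathbf{v}_k^{\top} (\rho \mathbf{n})$$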
With three or more non-coplanar light directions, you can solve for the vector g = ρn (albedo times normal) using linear least squares. The albedo is then its length, ρ = ‖g‖, and the normal is its direction, n = g / ‖g‖.
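Here is a per-pixel sketch of this least-squares solve in standard-library Python. It assumes unit-intensity lights given as unit vectors, a purely Lambertian surface, and a pixel that is lit (not shadowed) under every light. The sketch forms the 3×3 normal equations and solves them by Cramer's rule. That is adequate for a 3×3 system, but it is not how one would batch millions of pixels; a library routine such as `numpy.linalg.lstsq` would be the usual choice there. The function names are illustrative.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a x = b by Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are (nearly) coplanar")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]  # replace one column with the right-hand side
        x.append(det3(m) / d)
    return x


def photometric_stereo(lights, intensities):
    """Recover (albedo, unit normal) at one pixel from K >= 3 light directions.

    Least squares for g = albedo * n via the normal equations
    (V^T V) g = V^T I, where the rows of V are the light directions.
    """
    vtv = [[sum(v[i] * v[j] for v in lights) for j in range(3)] for i in range(3)]
    vti = [sum(v[i] * s for v, s in zip(lights, intensities)) for i in range(3)]
    g = solve3(vtv, vti)
    albedo = math.sqrt(sum(c * c for c in g))
    return albedo, [c / albedo for c in g]


# Synthetic check: shade a known pixel under four lights, then recover it.
true_n = [0.2, -0.3, 1.0]
length = math.sqrt(sum(c * c for c in true_n))
true_n = [c / length for c in true_n]
lights = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.866], [0.0, 0.5, 0.866], [-0.5, 0.0, 0.866]]
intensities = [0.7 * sum(a * b for a, b in zip(true_n, v)) for v in lights]
rho, n = photometric_stereo(lights, intensities)
```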
A sphere lit from different directions. Each image constrains the surface normals differently; combined, they determine the normal at every point lit by at least three non-coplanar lights.
| Variant | Idea |
|---|---|
| Classical (3 lights) | Minimum configuration for Lambertian surfaces. Closed-form solution. |
| Many lights | Over-determined system. Robust to shadows and specularities when combined with RANSAC or robust fitting. |
| Outdoor | Use the sun as a moving light source across a day. Webcam-based photometric stereo. |
| Multi-view + photometric | Combine stereo for coarse shape with photometric stereo for fine normals (e.g., Logothetis et al., 2019). |
Active illumination takes shape recovery out of the "clever algorithm" regime and into engineering precision. A 3D scanner projects controlled light and triangulates the reflected signal.
| Technology | Principle | Accuracy |
|---|---|---|
| Laser stripe | Sweep a laser plane across the object. The deformed stripe in the camera encodes depth via triangulation. | Sub-millimeter |
| Structured light | Project a coded pattern (Gray codes, phase shifting). Decode each pixel's pattern to find correspondences. | 10s of microns |
| Time of flight | Emit modulated light and measure the phase shift of the return (or time the round trip of a short pulse). | ~1 cm |
| Shadow scanning | Wave a stick casting a shadow. Track the shadow's sweep across known background planes (Bouguet & Perona, 1999). | ~1 mm (DIY) |
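Both triangulation and phase-based time of flight come down to a line of arithmetic. For the stripe scanner, the sketch assumes a calibrated pinhole camera at the origin and a known laser plane. For time of flight, it assumes a continuous-wave (phase-measuring) sensor. `triangulate_stripe` and `tof_distance` are hypothetical helpers.

```python
import math


def triangulate_stripe(u, v, f, plane_n, plane_d):
    """Hypothetical helper: 3D point where a pixel's ray meets the laser plane.

    Assumes a pinhole camera at the origin looking along +z, focal length f
    in pixels, pixel coordinates (u, v) measured from the principal point, and
    a calibrated laser plane {X : plane_n . X = plane_d} in camera coordinates.
    """
    ray = (u / f, v / f, 1.0)  # viewing ray direction through the pixel
    denom = sum(a * b for a, b in zip(plane_n, ray))
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the laser plane")
    t = plane_d / denom  # ray parameter at the intersection
    return tuple(t * c for c in ray)


def tof_distance(phase_shift, mod_freq_hz, c=299_792_458.0):
    """Continuous-wave time of flight: d = c * phase / (4 * pi * f_mod).

    The round trip 2d delays the modulation by phase 2*pi*f_mod*(2d / c).
    The result is only unambiguous up to c / (2 * f_mod).
    """
    return c * phase_shift / (4.0 * math.pi * mod_freq_hz)


# Stripe pixel 100 px right of centre, f = 500 px, laser plane x = 0.2 m.
point = triangulate_stripe(100, 0, 500, (1.0, 0.0, 0.0), 0.2)
# A half-cycle (pi) phase shift at 20 MHz modulation.
distance = tof_distance(math.pi, 20e6)
```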
A single scan sees only the side facing the scanner. To reconstruct a complete object, you scan from many directions, align (register) the scans, and merge them into one surface.
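Registration brings all scans into a common coordinate system. The gold standard is Iterative Closest Point (ICP), which repeats three steps until the alignment stops improving:

1. For each point in one scan, find the closest point in the other scan.
2. Compute the rigid transform (rotation and translation) that minimizes the squared distances between these matched pairs.
3. Apply the transform and repeat.

ICP needs a reasonable initial alignment. Otherwise the closest-point matches are wrong, and it can converge to a poor local minimum.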
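A compact point-to-point ICP in 2D illustrates the loop. This is a sketch under simplifying assumptions: brute-force nearest neighbours, no outlier rejection, and the closed-form 2D least-squares rotation. Production systems, including 3D point-to-plane variants, add k-d trees, correspondence rejection, and convergence tests. The function names are illustrative.

```python
import math


def apply_rigid(theta, tx, ty, pts):
    """Rotate 2D points by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def best_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    sxx = sxy = syx = syy = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy  # centre both sets
        sxx += px * qx
        sxy += px * qy
        syx += py * qx
        syy += py * qy
    theta = math.atan2(sxy - syx, sxx + syy)  # optimal 2D rotation angle
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(source, target, iterations=20):
    """Point-to-point ICP: returns the source points aligned to the target."""
    current = list(source)
    for _ in range(iterations):
        # 1. Closest-point matching (brute force; real systems use a k-d tree).
        matches = [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in current]
        # 2. Best rigid motion for the current matches.
        theta, tx, ty = best_rigid_2d(current, matches)
        # 3. Move the source and repeat.
        current = apply_rigid(theta, tx, ty, current)
    return current


# Target: samples of a parabola. Source: the same points moved by a small
# rigid motion (0.02 rad, translation (0.05, -0.03)).
target = [(x / 2, 0.3 * (x / 2) ** 2) for x in range(-4, 5)]
source = apply_rigid(0.02, 0.05, -0.03, target)
aligned = icp_2d(source, target)
```

The perturbation here is small relative to the point spacing. As a result, the first round of closest-point matches is already correct, and ICP converges in one step. With larger misalignments, the matches start out wrong, and convergence depends on a good initial guess.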
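Once aligned, the scans are merged into one surface, typically by averaging their signed distances in a volumetric grid (truncated signed distance function, or TSDF, fusion; see the volumetric methods below). This averaging makes merging remarkably robust to noise, because errors in individual scans are diluted. Real-time systems like KinectFusion use both steps. ICP tracks the camera against the growing model, and TSDF fusion integrates 30 depth frames per second into the volume.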
Two overlapping point clouds (blue and orange) are aligned iteratively by ICP.
Once you have raw 3D data (points, normals, depth maps), you need a surface representation to store, render, and manipulate the model. The choice of representation profoundly affects what you can do with the model.
| Representation | What It Stores | Strengths | Weaknesses |
|---|---|---|---|
| Triangle mesh | Vertices + triangles | GPU-friendly, standard for rendering | Topology must be consistent, hard to merge |
| Point cloud | 3D positions (+ normals, colors) | Simple, no topology needed | No surface continuity, holes |
| Implicit (SDF) | Signed distance at grid points | Easy to merge, topologically flexible | Memory-heavy, resolution-limited |
| Subdivision surface | Coarse control mesh + rules | Smooth, multi-resolution, compact | Requires clean topology |
| NURBS / spline | Control points + knots | Smooth, compact, exact for CAD | Patching is hard for complex shapes |
Surface interpolation: Raw data is noisy and has gaps. Surface interpolation fills holes and smooths noise by fitting a smooth function (thin-plate spline, radial basis function, or variational surface) through the data points. The key is balancing data fidelity (stay close to the observations) against smoothness (do not overfit noise). This is a regularization trade-off — the same principle from Chapter 4.
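The fidelity-versus-smoothness trade-off is easiest to see in one dimension. The sketch below swaps the thin-plate (second-derivative) penalty for a simpler first-difference (membrane) penalty, so that the normal equations are tridiagonal. Observed samples get weight 1, holes get weight 0, and `lam` sets the balance. `regularized_fit` is an illustrative name, not a library function.

```python
def regularized_fit(data, lam):
    """Fit f to 1D samples (len >= 2); None marks a hole with no observation.

    Minimizes  sum_i w_i (f_i - d_i)^2 + lam * sum_i (f_{i+1} - f_i)^2,
    with w_i = 1 for observed samples and 0 for holes. Setting the gradient
    to zero gives a tridiagonal system, solved with the Thomas algorithm.
    """
    n = len(data)
    w = [0.0 if d is None else 1.0 for d in data]
    rhs = [0.0 if d is None else float(d) for d in data]
    # Diagonal: w_i + lam * (number of neighbours); off-diagonals: -lam.
    diag = [w[i] + lam * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = -lam
    # Forward elimination.
    c = [0.0] * n
    r = [0.0] * n
    c[0] = off / diag[0]
    r[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - off * c[i - 1]
        c[i] = off / m
        r[i] = (rhs[i] - off * r[i - 1]) / m
    # Back substitution.
    f = [0.0] * n
    f[-1] = r[-1]
    for i in range(n - 2, -1, -1):
        f[i] = r[i] - c[i] * f[i + 1]
    return f


# Samples of a straight line with a two-sample hole (None) in the middle.
filled = regularized_fit([0, 1, 2, None, None, 5, 6], lam=1e-3)
# Noisy samples smoothed with a strong smoothness weight.
noisy = [0.0, 1.3, 1.8, 3.2, 3.9, 5.1]
smoothed = regularized_fit(noisy, lam=10.0)
```

With a small `lam`, the fit stays on the observations and fills the hole by linear interpolation. Raising `lam` pulls the curve toward a smoother, flatter shape at the cost of data fidelity.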
Sometimes you do not need a full mesh. A point cloud — a set of 3D points with optional normals and colors — is the rawest 3D representation and often the most practical.
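One key operation on point clouds is normal estimation, which surface reconstruction methods such as Poisson reconstruction rely on.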
Each point's normal is estimated from its local neighborhood of k points. The neighborhood size k trades smoothness against detail.
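Here is a 2D sketch of neighbourhood-based normal estimation by principal component analysis (PCA). The normal is the direction of least spread among the k nearest neighbours. In 3D, the same idea uses the eigenvector of the smallest eigenvalue of a 3×3 covariance; the 2×2 case used here has a closed form. Neighbour search is brute force, and the function name is illustrative. The resulting normals are unoriented: their sign is arbitrary and must be made consistent before Poisson reconstruction, for example by pointing them toward the sensor.

```python
import math


def estimate_normals_2d(points, k):
    """Estimate an unoriented unit normal for every 2D point by local PCA.

    For each point, take its k nearest neighbours (brute force, including the
    point itself), form their 2x2 covariance, and return the direction
    perpendicular to the major axis, i.e. the direction of least spread.
    """
    normals = []
    for p in points:
        nbrs = sorted(points, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)[:k]
        mx = sum(q[0] for q in nbrs) / len(nbrs)
        my = sum(q[1] for q in nbrs) / len(nbrs)
        sxx = sum((q[0] - mx) ** 2 for q in nbrs)
        syy = sum((q[1] - my) ** 2 for q in nbrs)
        sxy = sum((q[0] - mx) * (q[1] - my) for q in nbrs)
        # Angle of the major axis (the local tangent) of the covariance.
        theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
        normals.append((-math.sin(theta), math.cos(theta)))
    return normals


# Points on the line y = 0.5 x: every normal should be perpendicular to (1, 0.5).
pts = [(0.1 * i, 0.05 * i) for i in range(10)]
normals = estimate_normals_2d(pts, k=4)
```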
Volumetric methods represent 3D space on a regular grid and extract the surface as the boundary between "inside" and "outside." This is the most topologically flexible approach: it handles arbitrary genus and merges overlapping scans naturally, and the extracted surface cannot self-intersect.
| Method | Implicit Function | Key Idea |
|---|---|---|
| TSDF fusion | Averaged signed distance | Average multiple depth observations. Fast, incremental (KinectFusion). |
| Poisson recon | Indicator function | Solve a Poisson equation using oriented normals. Produces smooth, watertight meshes (Kazhdan et al., 2006). |
| Level sets | Evolving SDF | Surface evolves under PDE forces (curvature, data fit). Topology changes handled naturally. |
| Space carving | Occupancy | Start with a full volume. Remove voxels inconsistent with silhouettes or photo-consistency. |
| Neural implicit (DeepSDF) | Learned SDF | A neural network predicts signed distance for any query point. Continuous, compact, learned from data. |
Multiple depth observations are fused into a signed distance field. Each scan updates the volume, and the zero-crossing converges to the true surface.
A 2D slice of a volume. Blue = positive (outside), red = negative (inside). The white zero-crossing is the surface.
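TSDF fusion along a single camera ray can be sketched in a few lines. This is a 1D simplification with the following assumptions:

- voxels lie along one ray, and each observation is a depth along that ray;
- signed distances are truncated at `trunc`;
- voxels far behind the observed surface are left unchanged;
- every observation gets equal weight.

A full implementation projects each voxel of a 3D grid into each depth image. The function names here are illustrative.

```python
def make_volume(n_voxels, spacing):
    """A 1D 'volume' of voxels along one camera ray: positions, distances, weights."""
    xs = [i * spacing for i in range(n_voxels)]
    return xs, [0.0] * n_voxels, [0.0] * n_voxels


def integrate(xs, tsdf, weights, measured_depth, trunc, w_new=1.0):
    """Fuse one depth measurement into each voxel's running weighted average."""
    for i, x in enumerate(xs):
        sdf = measured_depth - x  # > 0 in front of the surface (outside)
        if sdf < -trunc:
            continue  # far behind the observed surface: leave unchanged
        d = min(sdf, trunc)  # truncate large positive distances
        tsdf[i] = (weights[i] * tsdf[i] + w_new * d) / (weights[i] + w_new)
        weights[i] += w_new


def zero_crossing(xs, tsdf, weights):
    """Return the interpolated position where the fused distance changes sign."""
    for i in range(len(xs) - 1):
        if weights[i] > 0 and weights[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return xs[i] + t * (xs[i + 1] - xs[i])
    return None


xs, tsdf, weights = make_volume(81, 0.05)  # voxels from 0 to 4 along the ray
for depth in (2.0, 2.1, 1.9):  # three noisy observations of one surface
    integrate(xs, tsdf, weights, depth, trunc=0.3)
surface = zero_crossing(xs, tsdf, weights)
```

Each voxel stores a running weighted average. As a result, the three noisy depths (1.9, 2.0, 2.1) pull the zero-crossing to their mean; this is the noise-diluting effect of averaging.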
| Concept | Used In |
|---|---|
| Shape from shading / photometric stereo | High-detail scanning, industrial inspection, combining with stereo (Ch 12) |
| ICP registration | SLAM (Ch 11), scan alignment, autonomous driving (LiDAR odometry) |
| TSDF fusion | KinectFusion, real-time 3D reconstruction, robotics mapping |
| Marching Cubes | Medical imaging (CT/MRI surfaces), volumetric rendering, neural implicits |
| Neural implicit surfaces | NeRF (Ch 14), DeepSDF, Occupancy Networks, generative 3D models |
| Point cloud processing | LiDAR perception, autonomous driving, PointNet/PointNet++ |
| Surface representations | Game assets, VFX, 3D printing, CAD/CAM |