NeRF / 3D Gaussian Splatting
Reconstructing 3D worlds from 2D photographs — from implicit neural radiance fields to explicit, real-time Gaussian representations.
What Is It?
The goal is deceptively simple: given a handful of ordinary photographs taken from different viewpoints, reconstruct the full 3D scene so you can render it from any new angle. Two paradigms dominate.
NeRF (Neural Radiance Fields) uses an implicit neural representation: a small MLP takes a 3D coordinate (x, y, z) and a viewing direction (θ, φ) and outputs a color (r, g, b) and a volume density σ. The scene lives entirely inside the network weights — there is no explicit geometry.
3D Gaussian Splatting (3DGS) takes the opposite approach: an explicit representation consisting of millions of colored, semi-transparent 3D Gaussians. Each Gaussian has a position, covariance matrix (shape/orientation), color (via spherical harmonics), and opacity. These are rasterized in real-time using a differentiable tile-based renderer — no neural network at render time.
NeRF Architecture
For every pixel in the target image, NeRF casts a ray through the scene, samples points along it, queries the MLP at each sample, and composites the results via volume rendering.
The pipeline: Ray → Sample Points → Positional Encoding → MLP(x,y,z,θ,φ) → (color, σ) → Volume Rendering Integral → Pixel Color. Training minimizes the photometric loss between rendered and ground-truth pixels across all training views.
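The first step of that pipeline, casting a ray per pixel, can be sketched as follows. This assumes a pinhole camera model and a 3×4 camera-to-world matrix with x right, y down, z forward; the function name and conventions are illustrative, not from any particular codebase.

```python
import numpy as np

def get_ray(px, py, width, height, focal, c2w):
    """Generate a world-space ray for pixel (px, py) under a pinhole
    camera model. c2w is a 3x4 camera-to-world matrix (assumed
    convention: x right, y down, z forward)."""
    # Direction in camera coordinates, pointing through the pixel center.
    d_cam = np.array([(px + 0.5 - width / 2) / focal,
                      (py + 0.5 - height / 2) / focal,
                      1.0])
    # Rotate into world space; the camera position is the ray origin.
    d_world = c2w[:3, :3] @ d_cam
    origin = c2w[:3, 3]
    return origin, d_world / np.linalg.norm(d_world)

# Example: identity camera at the origin, looking down +z.
origin, direction = get_ray(32, 32, 64, 64, focal=50.0, c2w=np.eye(3, 4))
```

Every training pixel yields one such ray; the sample points fed to the MLP are then origin + t · direction for increasing t.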
3D Gaussian Splatting Architecture
3DGS starts from a sparse point cloud produced by Structure from Motion (SfM), then optimizes each point into a full 3D Gaussian with learnable position, covariance, color (spherical harmonics coefficients), and opacity.
The pipeline: SfM Points → Initialize Gaussians → Differentiable Rasterization → Rendered Image → Photometric Loss → Gradient Update. Adaptive density control splits large Gaussians and clones under-reconstructed ones, while pruning transparent or redundant splats.
NeRF vs 3D Gaussian Splatting

                    NeRF                                      3D Gaussian Splatting
Representation      Implicit (scene stored in MLP weights)    Explicit (millions of 3D Gaussians)
Rendering           Ray marching + volume integration         Differentiable tile-based rasterization
Render-time cost    MLP query per ray sample (slow)           No neural network at render time; real-time
Initialization      Randomly initialized network              SfM sparse point cloud
Core Concepts
Volume Rendering
Integrate color and density along a ray: C(r) = ∫ T(t) · σ(t) · c(t) dt, where T(t) is the accumulated transmittance. The foundation of NeRF's differentiable rendering.
Positional Encoding
NeRF maps (x,y,z) through sin/cos at multiple frequencies before feeding the MLP, enabling the network to represent high-frequency scene detail that a plain MLP would smooth away.
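A minimal sketch of that encoding, following the γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2ᴸ⁻¹πp), cos(2ᴸ⁻¹πp)) mapping from the NeRF paper:

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map each coordinate through sin/cos at num_freqs octaves
    (2^0 ... 2^(L-1)). x: array of shape (..., 3);
    returns shape (..., 3 * 2 * num_freqs)."""
    freqs = 2.0 ** np.arange(num_freqs)             # (L,)
    scaled = x[..., None] * freqs * np.pi           # (..., 3, L)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

p = np.array([0.5, -0.2, 0.1])
print(positional_encoding(p).shape)   # (60,)
```

With L = 10 frequencies, a 3D point becomes a 60-dimensional input, which lets the MLP fit sharp edges and fine texture.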
Spherical Harmonics
3DGS encodes view-dependent color using spherical harmonic coefficients per Gaussian. This enables specular highlights and reflections without any neural network evaluation.
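As a sketch, evaluating degree-0 and degree-1 spherical harmonics for one Gaussian looks like this. The constants are the standard real-SH normalization factors; the sign pattern and the +0.5 offset follow one common convention (real implementations vary):

```python
import numpy as np

SH_C0 = 0.28209479177387814   # degree-0 real SH constant
SH_C1 = 0.4886025119029199    # degree-1 real SH constant

def sh_to_color(coeffs, view_dir):
    """Evaluate degree-1 real spherical harmonics for one Gaussian.
    coeffs: (4, 3) SH coefficients per RGB channel; view_dir: unit vector.
    Sign convention follows one common real-SH basis."""
    x, y, z = view_dir
    rgb = (SH_C0 * coeffs[0]
           - SH_C1 * y * coeffs[1]
           + SH_C1 * z * coeffs[2]
           - SH_C1 * x * coeffs[3])
    return np.clip(rgb + 0.5, 0.0, 1.0)   # offset and clamp to valid color range
```

Because the evaluation is a handful of multiply-adds per Gaussian, view-dependent color costs almost nothing at render time.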
Differentiable Rasterization
3DGS projects 3D Gaussians onto the image plane via a tile-based rasterizer. Crucially, the entire forward pass is differentiable, enabling gradient-based optimization of every Gaussian parameter.
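The key geometric step of that projection is flattening each 3D covariance to a 2D image-space covariance via the EWA splatting approximation Σ' = J Σ Jᵀ, where J is the Jacobian of the perspective projection linearized at the Gaussian's mean. The sketch below assumes the covariance is already expressed in camera coordinates (the full pipeline also applies the world-to-camera rotation):

```python
import numpy as np

def project_covariance(cov3d, t, focal):
    """Project a 3D Gaussian covariance to a 2D image-space covariance
    (EWA splatting approximation). t: Gaussian mean in camera coordinates
    (assumed already rotated into the camera frame); focal: focal length
    in pixels."""
    tx, ty, tz = t
    # Jacobian of the perspective projection, linearized at the mean.
    J = np.array([[focal / tz, 0.0, -focal * tx / tz**2],
                  [0.0, focal / tz, -focal * ty / tz**2]])
    return J @ cov3d @ J.T   # 2x2 covariance on the image plane
```

The resulting 2×2 covariance defines the elliptical footprint each splat covers on screen, which the tile-based rasterizer then alpha-blends in depth order.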
Multi-View Supervision
Both methods train on photometric loss (L1 + SSIM) across many calibrated viewpoints. No 3D ground truth is needed — the 3D structure emerges purely from multi-view consistency.
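A sketch of that combined objective, using the L = (1 − λ)·L1 + λ·(1 − SSIM) mix with λ = 0.2 from the 3DGS paper. The SSIM here is a simplified global variant computed over the whole image; the real metric uses local Gaussian windows:

```python
import numpy as np

def ssim_global(a, b, c1=0.01**2, c2=0.03**2):
    """Simplified SSIM over the whole image (the standard metric
    averages SSIM over local Gaussian-weighted windows)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2) /
            ((mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)))

def photometric_loss(rendered, target, lam=0.2):
    """L = (1 - lam) * L1 + lam * (1 - SSIM); lam = 0.2 as in 3DGS."""
    l1 = np.abs(rendered - target).mean()
    return (1 - lam) * l1 + lam * (1 - ssim_global(rendered, target))
```

The loss is zero only when the render matches the photo, so gradients push every Gaussian (or every MLP weight) toward multi-view agreement.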
Structure from Motion
SfM (e.g., COLMAP) estimates camera poses and a sparse 3D point cloud from input photos. This provides both the camera calibration and the initialization for 3DGS Gaussians.
Volume Rendering Integral
The mathematical heart of NeRF. For a ray r(t) = o + t·d with near and far bounds t_n and t_f, the expected color is:

C(r) = ∫ from t_n to t_f of T(t) · σ(r(t)) · c(r(t), d) dt,  where T(t) = exp(−∫ from t_n to t of σ(r(s)) ds)

is the accumulated transmittance — the probability that the ray reaches depth t without being absorbed.
In practice this integral is approximated by quadrature: sample N points along the ray, evaluate the MLP at each, then alpha-composite front-to-back. NeRF uses a coarse-to-fine (hierarchical) sampling strategy to concentrate samples where density is high.
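The quadrature step can be sketched directly: convert each sampled density to an opacity αᵢ = 1 − exp(−σᵢ·δᵢ), accumulate transmittance as a running product, and weight each sample's color accordingly.

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Numerical quadrature of the volume rendering integral.
    sigmas: (N,) densities at samples along the ray; colors: (N, 3);
    deltas: (N,) distances between consecutive samples."""
    alphas = 1.0 - np.exp(-sigmas * deltas)          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # T_i
    weights = trans * alphas                         # contribution of sample i
    return (weights[:, None] * colors).sum(axis=0)   # expected pixel color
```

Because every operation is differentiable, the photometric loss on the output pixel backpropagates to the MLP through this compositing.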
Generative 3D
Moving beyond reconstruction: generating novel 3D content from minimal input — a single image, or even just text.
Zero-1-to-3
Generates novel views of an object from a single input image using a fine-tuned diffusion model conditioned on relative camera pose. Enables single-image 3D reconstruction.
LGM (Large Gaussian Model)
A feed-forward transformer that directly predicts 3D Gaussians from multi-view images in a single pass — no per-scene optimization needed. Sub-second 3D generation.
InstantMesh
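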
Combines multi-view diffusion with a sparse-view reconstruction model (LRM) to produce textured meshes from a single image in seconds. Bridges 2D generation and 3D assets.
DreamFusion
Text-to-3D via Score Distillation Sampling (SDS): uses a pretrained 2D diffusion model as a critic to optimize a NeRF so that every rendered view looks plausible given the text prompt.
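The SDS update can be sketched as follows. The denoiser below is a stand-in placeholder (a real setup would query a frozen text-conditioned diffusion U-Net); the weighting w(t) = 1 − ᾱ is one common choice. The core trick is that the gradient w(t)·(ε̂ − ε) is injected directly into the rendered pixels, skipping backpropagation through the diffusion model entirely:

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_denoiser(noisy, t, prompt):
    """Stand-in for a pretrained diffusion model's noise prediction
    (hypothetical placeholder, not a real model)."""
    return noisy * 0.1

def sds_gradient(rendered, t=500, alpha_bar=0.5, prompt="a photo of a chair"):
    """One SDS step (sketch): noise the rendered view, ask the frozen
    diffusion model to predict the noise, and use w(t) * (eps_pred - eps)
    as the gradient w.r.t. the rendered pixels."""
    eps = rng.standard_normal(rendered.shape)
    noisy = np.sqrt(alpha_bar) * rendered + np.sqrt(1 - alpha_bar) * eps
    eps_pred = fake_denoiser(noisy, t, prompt)
    w = 1.0 - alpha_bar                      # one common weighting choice
    return w * (eps_pred - eps)              # flows back into the NeRF parameters
```

Repeating this for random viewpoints and timesteps gradually shapes the NeRF so that all of its renders score well under the 2D prior.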
SDF Representations
A complementary paradigm: instead of radiance fields or Gaussians, represent geometry as a Signed Distance Function (SDF) — for every point in space, store the distance to the nearest surface (positive outside, negative inside). The zero-level set defines a watertight mesh.
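The simplest concrete example is the SDF of a sphere, which makes the sign convention and the zero-level set tangible:

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: positive outside, negative inside,
    zero exactly on the surface (the zero-level set)."""
    return np.linalg.norm(p - center) - radius

c = np.zeros(3)
sphere_sdf(np.array([2.0, 0.0, 0.0]), c, 1.0)   # 1.0  (outside)
sphere_sdf(np.array([0.5, 0.0, 0.0]), c, 1.0)   # -0.5 (inside)
```

Extracting the zero-level set (e.g. with marching cubes) yields a watertight mesh, which is why SDFs are the representation of choice when clean geometry matters more than view synthesis.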
DeepSDF
An auto-decoder that maps a latent code + 3D coordinate to an SDF value. Learns a shape prior across a category (e.g., chairs, cars). Pioneered neural implicit geometry.
NeuS
Replaces NeRF's density field with an SDF, using a novel volume rendering formulation that converts SDF values to density via a logistic function. Produces high-quality watertight meshes while retaining NeRF's multi-view training approach. NeuS2 and Neuralangelo push this further with hash-grid encodings for room-scale reconstruction.