
NeRF / 3D Gaussian Splatting

Reconstructing 3D worlds from 2D photographs — from implicit neural radiance fields to explicit, real-time Gaussian representations.

Year: 2020 / 2023
Key Works: Mildenhall et al. / Kerbl et al.
Category: 3D Reconstruction

What Is It?

The goal is deceptively simple: given a handful of ordinary photographs taken from different viewpoints, reconstruct the full 3D scene so you can render it from any new angle. Two paradigms dominate.

NeRF (Neural Radiance Fields) uses an implicit neural representation. A small MLP takes a 3D coordinate (x, y, z) and a viewing direction (θ, φ) and outputs a color (r, g, b) and a volume density σ. The scene lives entirely inside the network weights — there is no explicit geometry.
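To make the input/output contract concrete, here is a minimal sketch of such a radiance field as a tiny two-layer network with random weights. The layer sizes and activations are illustrative only; the actual NeRF MLP is roughly eight layers of width 256 with skip connections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a toy 2-layer field (not the paper's architecture).
W1 = rng.normal(0, 0.1, (5, 64))   # input: (x, y, z, theta, phi)
W2 = rng.normal(0, 0.1, (64, 4))   # output: (r, g, b, sigma)

def radiance_field(xyz, view_dir):
    """Map a 3D point + viewing direction to (rgb, density)."""
    h = np.tanh(np.concatenate([xyz, view_dir]) @ W1)
    out = h @ W2
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))   # sigmoid keeps colors in [0, 1]
    sigma = np.log1p(np.exp(out[3]))       # softplus keeps density >= 0
    return rgb, sigma
```

The key design point survives the simplification: color depends on both position and viewing direction, while density depends only on position in the real model, which is what makes view-dependent effects like specularity possible.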

3D Gaussian Splatting (3DGS) takes the opposite approach: an explicit representation consisting of millions of colored, semi-transparent 3D Gaussians. Each Gaussian has a position, covariance matrix (shape/orientation), color (via spherical harmonics), and opacity. These are rasterized in real-time using a differentiable tile-based renderer — no neural network at render time.
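The per-Gaussian parameter set can be sketched as a small data structure. Field names here are ours, not the reference implementation's API; the covariance construction Σ = R·S·Sᵀ·Rᵀ, however, is the standard trick that keeps Σ positive semi-definite during optimization.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    position: np.ndarray      # (3,) center in world space
    log_scale: np.ndarray     # (3,) per-axis scale, stored in log space
    rotation: np.ndarray      # (4,) unit quaternion (w, x, y, z)
    sh_coeffs: np.ndarray     # (16, 3) degree-3 spherical-harmonic color coefficients
    opacity_logit: float      # opacity stored pre-sigmoid

    def covariance(self):
        """Sigma = R S S^T R^T — positive semi-definite by construction."""
        w, x, y, z = self.rotation / np.linalg.norm(self.rotation)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(np.exp(self.log_scale))
        return R @ S @ S.T @ R.T
```

Storing scale in log space and opacity as a logit means gradient updates can never push a Gaussian into an invalid state (negative scale, opacity outside [0, 1]).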


NeRF Architecture

For every pixel in the target image, NeRF casts a ray through the scene, samples points along it, queries the MLP at each sample, and composites the results via volume rendering.

[Interactive demo: Ray Marching & Volume Rendering]

The pipeline: Ray → Sample Points → Positional Encoding → MLP(x,y,z,θ,φ) → (color, σ) → Volume Rendering Integral → Pixel Color. Training minimizes the photometric loss between rendered and ground-truth pixels across all training views.


3D Gaussian Splatting Architecture

3DGS starts from a sparse point cloud produced by Structure from Motion (SfM), then optimizes each point into a full 3D Gaussian with learnable position, covariance, color (spherical harmonics coefficients), and opacity.

[Interactive demo: Gaussian Splatting Pipeline (SfM initialization)]

The pipeline: SfM Points → Initialize Gaussians → Differentiable Rasterization → Rendered Image → Photometric Loss → Gradient Update. Adaptive density control splits large Gaussians and clones under-reconstructed ones, while pruning transparent or redundant splats.
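The adaptive density control loop can be sketched over a list of Gaussians. The thresholds below are illustrative placeholders, not the paper's tuned values, and each Gaussian is reduced to the three fields the decision needs.

```python
# Toy adaptive density control: split large high-gradient Gaussians,
# clone small high-gradient ones, prune near-transparent ones.
# All thresholds are illustrative, not the reference implementation's.
def densify_and_prune(gaussians, grad_thresh=0.0002, scale_thresh=0.05,
                      min_opacity=0.005):
    out = []
    for g in gaussians:
        if g["opacity"] < min_opacity:
            continue                              # prune: contributes almost nothing
        if g["view_grad"] > grad_thresh:          # region is under-reconstructed
            if g["scale"] > scale_thresh:         # split a large Gaussian in two
                child = dict(g, scale=g["scale"] / 1.6)
                out += [dict(child), dict(child)]
            else:                                 # clone a small Gaussian
                out += [dict(g), dict(g)]
            continue
        out.append(g)
    return out
```

In the real system the gradient signal is the accumulated view-space positional gradient, and split children are also resampled to new positions; this sketch keeps only the branching logic.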


NeRF vs 3D Gaussian Splatting

Side-by-Side Comparison

NeRF

Representation: Implicit (MLP)
Training: Hours
Rendering: Seconds per frame
Quality: Smooth, continuous
Editability: Difficult
Memory: Low (weights only)

3D Gaussian Splatting

Representation: Explicit (Gaussians)
Training: Minutes
Rendering: 100+ FPS, real-time
Quality: Sharp, detailed
Editability: Direct manipulation
Memory: High (millions of splats)

Core Concepts

Volume Rendering

Integrate color and density along a ray: C(r) = ∫ T(t) · σ(t) · c(t) dt, where T(t) is the accumulated transmittance. The foundation of NeRF's differentiable rendering.

Positional Encoding

NeRF maps (x,y,z) through sin/cos at multiple frequencies before feeding the MLP, enabling the network to represent high-frequency scene detail that a plain MLP would smooth away.
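The encoding γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2^(L−1)πp), cos(2^(L−1)πp)) is a few lines of NumPy. NeRF uses L = 10 frequencies for positions and L = 4 for directions; the interleaving order below is one reasonable layout, not necessarily the reference code's.

```python
import numpy as np

def positional_encoding(p, num_freqs=10):
    """Map a D-dim point to 2 * num_freqs * D sin/cos features."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # pi * (1, 2, 4, ..., 2^(L-1))
    angles = np.outer(freqs, p)                     # (L, D) phase matrix
    return np.concatenate([np.sin(angles), np.cos(angles)]).ravel()
```

A 3D point with L = 10 thus becomes a 60-dimensional feature vector, giving the MLP access to many octaves of spatial frequency.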

Spherical Harmonics

3DGS encodes view-dependent color using spherical harmonic coefficients per Gaussian. This enables specular highlights and reflections without any neural network evaluation.
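Evaluating view-dependent color is a dot product between the real spherical-harmonic basis (evaluated at the view direction) and the stored coefficients. The sketch below stops at degree 1 for brevity; the constants are the standard real-SH ones, though sign conventions and the +0.5 offset vary between implementations (the offset follows common 3DGS code, which stores colors centered at zero).

```python
import numpy as np

SH_C0 = 0.28209479177387814   # degree-0 real SH constant
SH_C1 = 0.4886025119029199    # degree-1 real SH constant

def sh_color(coeffs, d):
    """coeffs: (4, 3) degree-0/1 SH coefficients; d: view direction."""
    x, y, z = d / np.linalg.norm(d)
    basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])
    return np.clip(basis @ coeffs + 0.5, 0.0, 1.0)
```

With only the degree-0 (DC) coefficient set, the color is identical from every direction; the higher-order terms add the view-dependent variation that produces highlights.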

Differentiable Rasterization

3DGS projects 3D Gaussians onto the image plane via a tile-based rasterizer. Crucially, the entire forward pass is differentiable, enabling gradient-based optimization of every Gaussian parameter.
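The core of the projection step is mapping each 3D covariance to a 2D image-plane covariance via the Jacobian J of the perspective projection: Σ₂D = J·Σ₃D·Jᵀ. This sketch assumes the Gaussian center is already in camera coordinates and that the world-to-camera rotation has been folded into Σ₃D; the full pipeline applies it explicitly.

```python
import numpy as np

def project_covariance(cov3d, mean_cam, focal):
    """2D image-plane covariance of a 3D Gaussian under perspective projection.

    mean_cam: (3,) Gaussian center in camera coordinates (z > 0).
    """
    x, y, z = mean_cam
    # Jacobian of (x, y, z) -> (f*x/z, f*y/z), linearized at the Gaussian center.
    J = np.array([
        [focal / z, 0.0,       -focal * x / z**2],
        [0.0,       focal / z, -focal * y / z**2],
    ])
    return J @ cov3d @ J.T   # (2, 2) covariance of the projected splat
```

The resulting 2×2 covariance defines the elliptical footprint the tile-based rasterizer blends; because every operation above is a matrix product, gradients flow back to all 3D Gaussian parameters.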

Multi-View Supervision

Both methods train on photometric loss (L1 + SSIM) across many calibrated viewpoints. No 3D ground truth is needed — the 3D structure emerges purely from multi-view consistency.
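A common weighting is loss = (1 − λ)·L1 + λ·(1 − SSIM) with λ = 0.2, as in the 3DGS paper. The sketch below uses a simplified global SSIM (whole-image statistics rather than a sliding Gaussian window) to stay self-contained.

```python
import numpy as np

def l1_ssim_loss(pred, gt, lam=0.2, c1=0.01**2, c2=0.03**2):
    """Photometric loss: (1 - lam) * L1 + lam * (1 - SSIM).

    Simplified: SSIM computed from global image statistics,
    not the usual windowed version.
    """
    l1 = np.abs(pred - gt).mean()
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(), gt.var()
    cov = ((pred - mu_p) * (gt - mu_g)).mean()
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_p**2 + mu_g**2 + c1) * (var_p + var_g + c2))
    return (1 - lam) * l1 + lam * (1 - ssim)
```

The loss is zero when rendered and ground-truth images match exactly, and it is this scalar, summed over many calibrated views, that drives all 3D structure.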

Structure from Motion

SfM (e.g., COLMAP) estimates camera poses and a sparse 3D point cloud from input photos. This provides both the camera calibration and the initialization for 3DGS Gaussians.


Volume Rendering Integral

The mathematical heart of NeRF. For a ray r(t) = o + td, the expected color is:

C(r) = ∫[t_n → t_f] T(t) · σ(r(t)) · c(r(t), d) dt,   where   T(t) = exp(−∫[t_n → t] σ(r(s)) ds)
[Interactive demo: Volume Rendering Along a Ray (transmittance & density integration)]

In practice this integral is approximated by quadrature: sample N points along the ray, evaluate the MLP at each, then alpha-composite front-to-back. NeRF uses a coarse-to-fine (hierarchical) sampling strategy to concentrate samples where density is high.
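The discrete compositing step is short: αᵢ = 1 − exp(−σᵢ·δᵢ), Tᵢ = Π_{j<i} (1 − αⱼ), and each sample contributes weight wᵢ = Tᵢ·αᵢ. A minimal sketch of that quadrature, with function and variable names of our choosing:

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Front-to-back alpha compositing of N samples along one ray.

    sigmas: (N,) densities, colors: (N, 3), deltas: (N,) segment lengths.
    Returns (pixel_color, per-sample weights).
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                      # opacity per segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # T_i before sample i
    weights = trans * alphas                                     # w_i = T_i * alpha_i
    return weights @ colors, weights
```

The weights sum to at most 1 (the remainder is light that passes through the ray unoccluded), and a very dense first sample occludes everything behind it, exactly as the transmittance term dictates.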


Generative 3D

Moving beyond reconstruction: generating novel 3D content from minimal input — a single image, or even just text.

Zero-1-to-3

Generates novel views of an object from a single input image using a fine-tuned diffusion model conditioned on relative camera pose. Enables single-image 3D reconstruction.

LGM (Large Gaussian Model)

A feed-forward transformer that directly predicts 3D Gaussians from multi-view images in a single pass — no per-scene optimization needed. Sub-second 3D generation.

InstantMesh

Combines multi-view diffusion with a sparse-view reconstruction model (LRM) to produce textured meshes from a single image in seconds. Bridges 2D generation and 3D assets.

DreamFusion

Text-to-3D via Score Distillation Sampling (SDS): uses a pretrained 2D diffusion model as a critic to optimize a NeRF so that every rendered view looks plausible given the text prompt.


SDF Representations

A complementary paradigm: instead of radiance fields or Gaussians, represent geometry as a Signed Distance Function (SDF) — for every point in space, store the distance to the nearest surface (positive outside, negative inside). The zero-level set defines a watertight mesh.
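The definition is easy to make concrete for an analytic shape. A sphere's SDF is just the distance to its center minus the radius; the sign convention and zero-level-set surface follow directly:

```python
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere: positive outside, negative inside,
    exactly zero on the surface (the zero-level set)."""
    return np.linalg.norm(p - center) - radius
```

Methods like DeepSDF and NeuS learn exactly this kind of function with a network, for arbitrary shapes instead of a sphere; the mesh is then extracted from the zero-level set (e.g. with marching cubes).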

DeepSDF

An auto-decoder that maps a latent code + 3D coordinate to an SDF value. Learns a shape prior across a category (e.g., chairs, cars). Pioneered neural implicit geometry.

NeuS

Replaces NeRF's density field with an SDF, using a novel volume rendering formulation that converts SDF values to density via a logistic function. Produces high-quality watertight meshes while retaining NeRF's multi-view training approach. NeuS2 and Neuralangelo push this further with hash-grid encodings for room-scale reconstruction.


Applications

VR / AR: Photorealistic scene capture for immersive experiences. 3DGS enables real-time headset rendering at the frame rates VR demands.
Robotics: Scene understanding and manipulation planning. Robots use NeRF/3DGS scene representations to reason about geometry, plan grasps, and navigate.
Autonomous Driving: Reconstructing driving scenes for simulation, sensor modeling, and closed-loop testing. LiDAR + camera fusion with neural fields.
Digital Twins: Photogrammetric capture of buildings, factories, and infrastructure. Rapid scan-to-model pipelines powered by 3DGS for inspection and monitoring.