Letting the model think longer: self-consistency, best-of-N & verifiers, chain-of-thought as compute, tree search, o1-style reasoning RL, the train-vs-test tradeoff, and the overthinking trap.

10 chapters · 9+ sims

NEW

U-Net

Seeing every pixel: encoder-decoder, the bottleneck blur, skip connections (the one big idea), upsampling & checkerboards, Dice loss, and why it became the backbone of diffusion models.

10 chapters · 9+ sims

NEW

Linear Attention & RWKV

Escaping the quadratic wall: where the n² comes from, the kernel/associativity trick, the recurrent dual form, the recall catch, RWKV’s decay & gating, the Mamba/RetNet family, and hybrid models.

10 chapters · 9+ sims

NEW

The denoising that paints images can drive a robot. Instead of regressing one action (and averaging multimodal demos into a crash), it generates an action chunk from noise — committing to a mode, conditioned on observations, replanned in a closed loop. The action head behind modern VLAs.

Point Clouds

Deep learning on unordered 3D point sets: permutation invariance via max pooling (PointNet), hierarchical local features (PointNet++), point transformers, and ICP registration. Where convolutions can’t go.

10 chapters · 7+ sims

NEW

Graph Neural Networks

Learning on nodes and edges via message passing: each node gathers neighbor messages, aggregates symmetrically, and updates. GCN, GAT, over-smoothing, and the unifying view (a transformer is a GNN on a full graph).

10 chapters · 8+ sims

NEW

Time-Series Forecasting

Predicting the future with its uncertainty: lookback→horizon, trend/seasonality decomposition, probabilistic (quantile) forecasts, global models, N-BEATS, the Temporal Fusion Transformer, PatchTST, and zero-shot foundation models.

10 chapters · 9+ sims

NEW

Gaussian Processes

Fit a distribution over functions, not one curve — getting calibrated uncertainty for free. Kernels, conditioning, error bars, the n³ wall, and the broader uncertainty toolkit (deep ensembles, MC dropout, conformal prediction).

10 chapters · 9+ sims

NEW

Recommender Systems

The engines behind Netflix/Amazon/YouTube: matrix factorization, user & item embeddings, the two-tower retrieval model, the retrieve-then-rank funnel, DLRM, and hard problems (cold start, feedback loops). Personalization at billion-item scale.

10 chapters · 8+ sims

LLM Inference & Adaptation (3)

NEW

Tokenization

How text becomes the integers a model sees: the word-vs-character trade-off, byte-pair encoding built from scratch, encoding by merge-replay, WordPiece & Unigram, byte-level BPE (no OOV ever), and SentencePiece’s ▁ spaces. Why models can’t count the r’s in “strawberry.”

10 chapters · 9+ sims

NEW

Sampling & Decoding

The model outputs a distribution, not a word — decoding turns it into text. Logits & softmax, greedy’s repetition trap, temperature, top-k, top-p (nucleus), beam search, and the modern penalty pipeline. The dial between robotic and unhinged.

10 chapters · 8+ sims
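
To make the decoding steps named in this lesson concrete, here is a minimal Python sketch of temperature scaling followed by top-p (nucleus) filtering. The function names `softmax` and `top_p_filter`, the example logits, and the chosen temperature and threshold are assumptions for illustration, not taken from the lesson itself.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
probs = softmax(logits, temperature=0.7)
nucleus = top_p_filter(probs, p=0.9)
```

Sampling would then draw a token index from `nucleus`, for example with `random.choices(list(nucleus), weights=list(nucleus.values()))`. Greedy decoding is the limiting case that always picks the single most likely token.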

NEW

LoRA & PEFT

Co-adaptation, the mask & inverted-dropout scaling, the ensemble view, spatial dropout, DropConnect, stochastic depth/DropPath, DropBlock — breaking your network on purpose to make it generalize.

Updating belief with evidence, from zero. The disease-test trap (a positive test means only a 50% chance of disease, not 80%), the base-rate effect, and a live posterior Code Lab.
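
The base-rate effect can be checked with a few lines of arithmetic. The sketch below applies Bayes' rule to one assumed set of numbers (20% base rate, 80% sensitivity, 20% false-positive rate) chosen because they reproduce a fifty-fifty posterior. The lesson's own figures are not shown here, so treat these values and the function name `posterior` as illustrative.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' rule."""
    true_pos = sensitivity * prior                     # P(positive and diseased)
    false_pos = false_positive_rate * (1.0 - prior)    # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

# Assumed numbers, not taken from the lesson.
p = posterior(prior=0.20, sensitivity=0.80, false_positive_rate=0.20)
print(f"P(disease | positive) = {p:.2f}")
```

Lowering `prior` while holding the test fixed pushes the posterior down further, which is the base-rate effect in action.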

Independent Component Analysis

Unmix what was blended — the cocktail party problem. The mixing model x=As, finding the unmixing W, independence as the criterion, why ICA needs non-Gaussian sources, a live un-mixing lab, and how ICA differs from PCA (independence beats uncorrelatedness).

Flow & Diffusion Models

Turn noise into data by simulating a differential equation: vector fields, flows, Euler's method, Brownian motion, SDEs and the Ornstein–Uhlenbeck process — built and run by hand.

10 chapters · 4 sims · 3 labs

Lec 2

Flow Matching

Where do the arrows come from? Train the vector field — conditional & marginal probability paths, the marginalization trick, and the simulation-free flow matching loss.

10 chapters · 4 sims · 4 labs

Lec 3

Score Matching & Guidance

The score bridges the ODE and SDE samplers, gives denoising score matching, and powers classifier-free guidance — how a model is made to obey a prompt.

10 chapters · 4 sims · 5 labs

Lec 4

Architectures & Latent Space

The vector field is a neural net: sinusoidal time embeddings, U-Nets, Diffusion Transformers, and diffusing in a VAE latent space — Stable Diffusion 3 and Movie Gen.

11 chapters · 4 sims · 3 labs

Lec 5

Discrete Diffusion

Text is discrete, so Gaussian noise won't do: continuous-time Markov chains, masking/absorbing states, and the parallel un-masking sampler behind diffusion language models.

10 chapters · 3 sims · 3 labs

Notes

The Lecture Notes (Veanor)

The complete rigorous companion — the full 84-page 6.S184 notes as one interactive reference: every definition, theorem and derivation across all five lectures.

14 chapters · 3 sims · 3 labs

Introduction to Robot Learning (25)

Lec 1

What Is Robot Learning?

Sequential decisions in the physical world: the closed loop, the compounding-error trap that breaks naive imitation, the robot data problem, the four pillars, and a preview of the whole course.

10 chapters · 6 sims · 3 labs

Lec 2

Robot Learning: An Overview

The agent–environment loop and the reward hypothesis, a policy as a state→action map, the ladder of methods from imitation to RL, known vs learned worlds, and high-level plans over low-level skills.

9 chapters · 5 sims · 3 labs

Lec 3

ML / DL Refresher, Part 1

Supervised learning, loss & empirical risk, linear and logistic regression by hand, gradient descent and the learning rate, overfitting, the bias–variance tradeoff, and regularization.

10 chapters · 5 sims · 3 labs

Lec 4

ML / DL Refresher, Part 2

Why depth and nonlinearity, the MLP and backprop by hand, optimizers from SGD to Adam, CNNs, attention and Transformers, generative models, and knowing when the model is uncertain.

10 chapters · 8 sims · 3 labs

Lec 5

MDP Basics & Imitation Learning, Part 1

The MDP tuple, policies and trajectories, discounted return and value functions, the Bellman idea, and behavior cloning — plus why copying an expert is not standard supervised learning.

10 chapters · 5 sims · 3 labs

Lec 6

Imitation Learning, Part 2

DAgger and how it beats behavior cloning's quadratic (in the horizon) error cost, privileged teachers, GAIL as distribution matching with a discriminator, and Diffusion Policy for multimodal demonstrations.

10 chapters · 7 sims · 3 labs

Lec 7

RL Basics: Value & Policy Iteration

Value and Q functions, the Bellman expectation and optimality equations derived, dynamic programming — policy evaluation, policy improvement, policy iteration and value iteration — worked by hand on a gridworld.

10 chapters · 6 sims · 3 labs

Lec 8

Q-Learning & Variants

Model-free control from samples: Monte Carlo vs TD, SARSA vs Q-learning and the off-policy max, ε-greedy exploration, the cliff, and DQN's replay buffer and target network that tame the deadly triad.

10 chapters · 7 sims · 3 labs
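
The SARSA vs Q-learning distinction comes down to which next-state value the update bootstraps from. The following is a minimal tabular sketch, assuming a dictionary of next-state Q-values. The function names, the example values, and the step sizes are hypothetical and only illustrate the two standard targets.

```python
def sarsa_target(reward, q_next, next_action, gamma=0.99):
    """On-policy target: bootstrap from the action the behavior policy actually took."""
    return reward + gamma * q_next[next_action]

def q_learning_target(reward, q_next, gamma=0.99):
    """Off-policy target: bootstrap from the greedy (max) action, whatever was taken."""
    return reward + gamma * max(q_next.values())

def td_update(q_value, target, alpha=0.1):
    """Move the current estimate a fraction alpha toward the target."""
    return q_value + alpha * (target - q_value)

q_next = {"left": 1.0, "right": 3.0}  # hypothetical Q-values at the next state
s = sarsa_target(reward=0.0, q_next=q_next, next_action="left", gamma=0.9)
q = q_learning_target(reward=0.0, q_next=q_next, gamma=0.9)
```

When an exploratory ε-greedy step picks the worse action, the SARSA target reflects that choice while the Q-learning target ignores it, so the Q-learning target is never smaller for the same transition.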

Lec 9

Policy Gradient Methods

Optimize the policy directly: the log-derivative trick, the policy gradient theorem, REINFORCE, why the estimator is unbiased but high-variance, and reward-to-go and baselines that calm it down.

10 chapters · 5 sims · 3 labs

Lec 10

Actor-Critic Methods

A learned critic cuts policy-gradient variance: the advantage function, the TD error, two networks in one loop, and the n-step / GAE dial that trades bias against variance.

10 chapters · 6 sims · 3 labs

Lec 11

Advanced RL: TRPO, PPO, DDPG & SAC

The biggest safe step: natural gradients and the KL trust region, PPO's clipped surrogate, deterministic off-policy control with DDPG/TD3, and maximum-entropy RL with SAC — mapped on one 2×2 grid.

10 chapters · 6 sims · 3 labs

Lec 12

Model-Based Control Basics

When you know the physics: state-space models, feedback, PID, stability from eigenvalues, and LQR — the optimal linear controller — with the Riccati recursion worked by hand.

10 chapters · 5 sims · 3 labs

Lec 13

Optimal Control & Planning, Part 1

Plan a whole control sequence: shooting vs collocation, the LQR backward pass derived line by line, and iLQR / DDP — linearize, solve LQR, repeat — plus sequential convex programming.

10 chapters · 5 sims · 3 labs

Lec 14

Optimal Control & Planning, Part 2

When you can't write the equations: MPC and replanning, random shooting, the cross-entropy method and MPPI, and learned-model RL — PILCO, PETS ensembles, and MBPO.

11 chapters · 5 sims · 3 labs

Lec 15

Deep Model-Based RL: Dreamer & TD-MPC

Learn a latent world model from pixels and plan inside it: Dreamer's learning-in-imagination, TD-MPC's plan-in-latent with a learned value, and how model error compounds over a rollout.

10 chapters · 3 sims · 3 labs

Lec 16

Learning Structured World Models

Guest lecture: model dough, beans, and fluids as particles plus relations and learn the dynamics with a graph network — message passing, permutation-equivariance, and planning with a learned simulator.

11 chapters · 3 sims · 3 labs

Lec 17

Offline Reinforcement Learning

Guest lecture: learn from a fixed log with no new interaction — the out-of-distribution overestimation death spiral, and the fixes: policy constraints, conservatism (CQL), in-sample IQL, and Diffuser.

10 chapters · 6 sims · 3 labs

Lec 18

Inverse Reinforcement Learning

Recover the reward from demonstrations: why it's ambiguous, feature matching, max-margin, and the maximum-entropy principle — the least-committal reward that explains the expert.

10 chapters · 6 sims · 3 labs

Lec 19

Bandits & Preference-Based Learning

The cleanest explore–exploit problem: regret, ε-greedy, UCB optimism, Thompson sampling, and dueling bandits / preference learning — the foundation under RLHF.

11 chapters · 5 sims · 3 labs

Lec 20

Exploration in Reinforcement Learning

When rewards are sparse and random actions never stumble onto them: count-based bonuses, curiosity as prediction error, the noisy-TV trap, and Random Network Distillation.

11 chapters · 3 sims · 3 labs

Lec 21

Robot Simulation & Sim2Real

The reality gap: why a sim-perfect policy falls on the real robot, domain randomization (reality is just sim #10,001), system identification, and champion-level drone racing.

10 chapters · 5 sims · 3 labs

Lec 22

Safe RL & Safe Robot Learning

When you can't learn from the crash: constrained MDPs, safe exploration, control barrier functions, and safety filters that project any unsafe action onto the nearest safe one.

10 chapters · 3 sims · 3 labs

Lec 23

Multi-Task, Adaptive & Transferable Learning

Adapt within a step: teacher–student distillation, RMA inferring terrain from recent motion, and Neural-Fly's online residual adaptation for agile flight in wind.

11 chapters · 4 sims · 3 labs

Lec 24

Foundation Models in Robotics

Guest lecture: borrow the web's common sense — SayCan's say×can grounding, CLIPort, RT-1's tokenized actions, Code-as-Policies, and the move toward vision-language-action models.

10 chapters · 5 sims · 3 labs

Lec 25

Course Summary: The Robot-Learning Spine

Debugging & Consistency

The capstone skill: how to know your fusion works. NEES/NIS chi-square tests, innovation whiteness, and robust costs (Huber, DCS, switchable constraints) that catch the silent killer — overconfidence.

12 chapters · 5 sims · 3 labs

Atlas

The Sensor Fusion Atlas

The ultimate map: an interactive concept graph wiring all 21 methods together, a 1960–2024 historical timeline, and the full curated reference library. One page for the whole journey.

The whole arc — Kalman filters to robot foundation models — plus a build ladder of 8 runnable in-browser Python labs: implement a Kalman filter, EKF Jacobian, unscented transform, particle filter, pose-graph SLAM, VIO bias estimation, Q-learning vs SARSA, and a diffusion policy yourself.

Model Compression

INT8/INT4 quantization, pruning, GPTQ/AWQ, profiling, post-training vs QAT, compression pipelines.

10 chapters · 8 sims

Efficient Architectures

Depthwise separable convs, MobileNet, EfficientNet, NAS, hardware co-design, mobile deployment.

10 chapters · 9 sims

LLM Inference Optimization

KV cache, Flash/MQA/GQA attention, continuous batching, speculative decoding, TP/PP parallelism.

10 chapters · 8 sims

LLM Inference: Tokens to Production

The complete inference stack: tokenization, prefill/decode, GPU hardware, metrics, batching, FlashAttention, speculative decoding, quantization, parallelism, production serving.

13 chapters · 16 sims

Systems & Hardware (1)

LESSON

GPU Kernel Landscape 2021–2026

CUDA kernels, Triton, FlashAttention, fused ops, warp scheduling, memory hierarchy, kernel profiling, the evolution of GPU programming.

11 chapters · 6 sims

LESSON

Thinking in JAX: Functional Array Computing

Pure functions, jit, vmap, grad, pytrees, XLA compilation, PRNG, sharding — the mental model shift from PyTorch to JAX.

11 chapters · 11 sims

Probability Distributions (5)

Continuous Univariate

Gaussian, Student-t, Laplace, Beta, Gamma, Exponential, Weibull, Chi-Squared, Von Mises + 5 more.

16 chapters · 14 distributions

Discrete

Bernoulli, Categorical, Binomial, Poisson, Geometric, Hypergeometric, Multinomial + 2 more.

11 chapters · 9 distributions

Multivariate

MVN, Dirichlet, Wishart, Gaussian Process, Dirichlet Process, Von Mises-Fisher + 4 more.

12 chapters · 10 distributions

Bayesian & Inference

GMM, Particle, Horseshoe, Spike-and-Slab, Gumbel, Kumaraswamy + 4 more.

12 chapters · 10 distributions

Specialized & Advanced

Bayesian Networks

Variables with dependencies — DAGs, d-separation, message passing.

10 chapters

Robotics & Perception (5)

NEW

Inverse Kinematics

2R/6R robot arms, geometric subproblems, 3D IK solvers, singularities, workspace — heavy 3D & 2D sims.

12 chapters12 sims

LESSON 21

Classical SLAM

The chicken-and-egg problem — EKF-SLAM, particle filters, graph-based.

11 chapters

LESSON 22

Classical VIO

IMU + camera fusion — preintegration, MSCKF, tightly-coupled.

10 chapters

LESSON 23

Modern SLAM

Deep features, neural implicit maps, Gaussian splatting SLAM.

9 chapters

LESSON 24

Modern VIO

Learned inertial models, transformer odometry, foundation models.

8 chapters

Decision & Control (3)

LESSON 25

MDP

States, actions, rewards — the formal language of sequential decisions.

9 chapters

LESSON 26

POMDP

When you can't see the full state — belief-space planning.

9 chapters

LESSON 27

RL Algorithms

Sim2Real Robot Learning

Crossing the reality gap: domain randomization, privileged teacher–student adaptation, real2sim residuals & actuator nets, learning from human data, and the Sim2Real 1.0→4.0 map.

11 sections · CS 224R

RL for Robot Foundation Models

Pushing a vision-language-action model past the 80% imitation plateau with RL: offline-RL-as-supervised-learning, diffusion steering, and small edit policies. When the model is too big, too weird, too expensive for textbook RL.

10 sections · CS 224R

Frontiers of Deep RL & How to Do Research

LoRA Without Regret

When LoRA fails, regularization in PEFT, merging adapters, QLoRA, DoRA, future of adaptation.

8 chapters · CS224N

Lec 19

Open Questions in NLP 2026

Reasoning, grounding, efficiency, safety, multimodal frontier, interactive research map.

GANs, rectified flow, classifier-free guidance, latent diffusion, DiT, text-to-image & video generation.

10 chapters

System Design (1)

Research-backed system design lessons. Real architectures, real numbers, real tradeoffs — with interactive Canvas simulations that trace requests, visualize scale, and simulate failures.