Feynman-Level Interactive Lessons

Gleams

Build intuition from absolute zero. Every concept from first principles, with interactive simulations, step-by-step math, and quizzes. No prerequisites beyond curiosity.

149 Lessons · 1256+ Chapters · 826+ Simulations
AI Architectures (31)
LESSON 01

microGPT

From absolute zero to understanding every line of Karpathy's 243-line GPT.

11 chapters · 15+ sims
LESSON 02

Transformer

Self-attention, multi-head, KV cache, MoE — the architecture behind everything.

11 chapters
NEW

Vision Transformer (ViT)

Split an image into patches, treat them as tokens, run a Transformer. How ViT unified vision and language architectures.

10 chapters · 10+ sims
NEW

Universal Architecture

Why one design rules everything — retrofitting patterns, cross-attention, conditioning zoo, composition.

10 chapters · 12+ sims
LESSON 03

SSM / Mamba

Linear recurrence, selective scan — the O(n) alternative to attention.

10 chapters
LESSON 04

Diffusion Models

Add noise, learn to reverse it. The dominant generative paradigm.

10 chapters
LESSON 05

Flow Matching

Straight paths from noise to data — simpler, faster than diffusion.

9 chapters
NEW

Diffusion Transformer (DiT)

Replace the U-Net with a Transformer. adaLN-Zero conditioning, scaling laws, and how DiT powers SD3, FLUX, and Sora.

10 chapters · 10+ sims
LESSON 06

VAE / VQ-VAE

The secret plumbing — variational inference, codebooks, tokenization.

10 chapters
LESSON 07

GAN

Generator vs discriminator — the adversarial game.

9 chapters
LESSON 08

Contrastive / CLIP

The glue connecting images and text in a shared space.

10 chapters
LESSON 09

VLM

Teaching language models to see — vision encoder + LLM fusion.

10 chapters
LESSON 10

VLA

Foundation models that physically act in the world.

10 chapters
LESSON 11

World Models

Learning to imagine before acting — prediction as intelligence.

9 chapters
LESSON 12

NeRF / 3DGS

Reconstructing 3D worlds from 2D photographs.

10 chapters
LESSON 13

Reward / Alignment

RLHF, DPO, Constitutional AI — making AI do what we want.

10 chapters
LESSON 14

Agent Evaluation

From vibe checks to rigorous testing. Graders, pass@K/pass^K, τ-bench, Terminal-Bench, the Swiss cheese model.

12 chapters · 12+ sims
NEW

Mixture of Experts

Scaling without paying for it: experts & routing, top-k gating, the collapse problem & load-balancing loss, capacity/token-dropping, Switch, Mixtral & DeepSeek, expert parallelism.

10 chapters · 9+ sims
NEW

Test-Time Compute

Letting the model think longer: self-consistency, best-of-N & verifiers, chain-of-thought as compute, tree search, o1-style reasoning RL, the train-vs-test tradeoff, and the overthinking trap.

10 chapters · 9+ sims
NEW

U-Net

Seeing every pixel: encoder-decoder, the bottleneck blur, skip connections (the one big idea), upsampling & checkerboards, Dice loss, and why it became the backbone of diffusion models.

10 chapters · 9+ sims
NEW

Linear Attention & RWKV

Escaping the quadratic wall: where the n² comes from, the kernel/associativity trick, the recurrent dual form, the recall catch, RWKV’s decay & gating, the Mamba/RetNet family, and hybrid models.

10 chapters · 9+ sims
NEW

xLSTM

The LSTM reborn: the two fatal flaws, saturating sigmoids, exponential gating + normalizer (sLSTM), matrix memory (mLSTM), parallelizability, the revision lab, the block, and the recurrent-revival family.

10 chapters · 9+ sims
NEW

RetNet

The impossible triangle: retention (attention minus softmax plus decay), and one mechanism computed three equivalent ways — parallel to train, recurrent to deploy, chunkwise to scale. Multi-scale heads & the family.

10 chapters · 9+ sims
NEW

Hyena & Long Convolutions

Attention by convolution: a sequence-length filter, made practical by implicit filters (any length, few params), the FFT (n log n), and data-controlled gating — plus the recurrence=convolution duality.

10 chapters · 9+ sims
NEW

Jamba

The hybrid that works: interleave mostly-Mamba layers with a few attention layers (for recall) plus MoE (for cheap capacity) — why a few attention layers suffice, the KV-cache memory win, and the hybrid era.

10 chapters · 9+ sims
NEW

Griffin & Hawk

Google DeepMind’s RNN-speed transformers: a gated linear recurrence (RG-LRU) for cheap global memory + local attention for sharp recall. Linear recurrence → parallel scan → stability → gating → the hybrid stack → constant-size inference.

10 chapters · 9+ sims
NEW

JEPA

Yann LeCun’s Joint Embedding Predictive Architecture: stop predicting pixels, predict meaning. Why pixel loss blurs → latent prediction → collapse & the EMA fix → predictor + mask tokens → masking → the full I-JEPA loop → world models.

10 chapters · 9+ sims
NEW

Perceiver & Perceiver IO

One architecture for any modality: funnel a huge input through a small latent bottleneck with cross-attention. The n² wall → latent workspace → cross-attention → latent thinking → Fourier position → iteration → arbitrary outputs via query arrays.

10 chapters · 9+ sims
NEW

Neural ODEs

Networks with infinite, continuous depth. A residual block is one Euler step — take the limit and depth becomes a flow field solved by an ODE solver: adaptive computation, O(1)-memory adjoint gradients, and the continuous-normalizing-flow ancestor of diffusion.

10 chapters · 9+ sims
NEW

KANs

Kolmogorov–Arnold Networks: flip the MLP — put learnable 1-D functions (splines) on the edges, let nodes just sum. The result is a network you can read, prune, refine, and turn into a symbolic formula.

10 chapters · 9+ sims
NEW

Liquid Neural Networks

MIT’s tiny, robust brains: continuous-time neurons whose response speed adapts to the input. Leaky neuron → liquid time constant → bounded stability → robustness → neural circuit policies → closed-form CfC. 19 neurons can drive a car.

10 chapters · 9+ sims
NEW

Whisper & Audio Transformers

How machines learned to hear: turn sound into a log-mel picture, run a vanilla transformer encoder-decoder over it, control the task with prompt tokens, and train on 680,000 hours of the messy internet. The audio front-end behind all modern speech models.

10 chapters · 9+ sims
NEW

Video Generation

Sora-style spacetime diffusion: compress video into a spacetime latent, chop it into patches spanning space and time, and denoise the whole clip with a diffusion transformer whose attention crosses frames — coherence by construction, plus emergent world-model behavior.

10 chapters · 9+ sims
NEW

Diffusion Policy

The denoising that paints images can drive a robot. Instead of regressing one action (and averaging multimodal demos into a crash), it generates an action chunk from noise — committing to a mode, conditioned on observations, replanned in a closed loop. The action head behind modern VLAs.

10 chapters · 9+ sims
Audio & Speech (5)
Modalities & Methods (5)
LLM Inference & Adaptation (3)
Training Foundations (18)
NEW

Loss Functions

MSE, cross-entropy, KL divergence, Huber, contrastive, triplet, InfoNCE — every loss derived from scratch with interactive comparisons.

10 chapters · 12+ sims
NEW

Optimizers

SGD, momentum, AdaGrad, RMSProp, Adam, AdamW, Lion, Sophia — every update rule derived, hand-computed, and raced head-to-head.

10 chapters · 14+ sims
NEW

Normalization

BatchNorm, LayerNorm, RMSNorm, GroupNorm, InstanceNorm, Pre-LN vs Post-LN — every variant derived, hand-computed, and raced in the Arena.

10 chapters · 12+ sims
NEW

Attention Variants

MHA, MQA, GQA, sliding window, linear attention, FlashAttention — every variant derived with memory arithmetic and raced in the Arena.

11 chapters · 14+ sims
NEW

Positional Encoding

Sinusoidal, learned, RoPE, ALiBi, NTK scaling — why transformers need position and how rotation won, with length extrapolation arena.

10 chapters · 12+ sims
NEW

Activation Functions

Sigmoid, ReLU, GELU, SiLU/Swish, SwiGLU, Mish — every nonlinearity derived with dead neuron analysis and racing arena.

10 chapters · 12+ sims
NEW

Learning Rate Schedules

Warmup, step decay, cosine annealing, 1cycle, WSD — every schedule derived with loss landscape simulations and racing arena.

10 chapters · 12+ sims
NEW

Initialization

Xavier, He/Kaiming, orthogonal, transformer recipes — every method derived from variance preservation with deep network explorer.

10 chapters · 12+ sims
NEW

Gradient Flow

Vanishing/exploding gradients, clipping, accumulation, mixed precision, loss scaling, checkpointing — the complete stability toolkit.

10 chapters · 12+ sims
NEW

Skip Connections

ResNet residuals, Pre-LN vs Post-LN, DenseNet, Highway — why every modern architecture uses shortcuts and how to choose.

10 chapters · 12+ sims
NEW

Pooling & Aggregation

Max, average, GAP, CLS, mean, attention, GeM pooling — how to collapse features into fixed-size vectors for any task.

10 chapters · 12+ sims
NEW

Embedding Layers

Token, position, segment, patch embeddings — tied weights, scaling, subword effects, and the lookup tables behind every model.

10 chapters · 12+ sims
NEW

Training Loop Mechanics

Epochs, batches, DataLoaders, shuffling, the 6-step training loop, eval mode, common bugs — the complete anatomy of training.

10 chapters · 12+ sims
NEW

Data Augmentation

Flips, crops, color jitter, RandAugment, Mixup, CutMix, text augmentation, TTA — synthetic variations that prevent overfitting.

10 chapters · 12+ sims
NEW

Curriculum Learning

Difficulty scoring, Bengio’s classical curriculum, pacing functions, self-paced learning, teacher-student bandits, DoReMi data mixing — and when ordering doesn’t help.

10 chapters · 9+ sims
NEW

Contrastive Learning

Self-supervised vision without labels: InfoNCE, temperature, the projection head you throw away, MoCo’s queue, and how BYOL & DINO dodge collapse with no negatives.

10 chapters · 9+ sims
NEW

Knowledge Distillation

Teaching a small model to think like a giant: dark knowledge, temperature softening, the KD loss, feature & attention matching, self-distillation, DistilBERT, and the capacity-gap trap.

10 chapters · 9+ sims
NEW

Dropout Variants

Co-adaptation, the mask & inverted-dropout scaling, the ensemble view, spatial dropout, DropConnect, stochastic depth/DropPath, DropBlock — breaking your network on purpose to make it generalize.

10 chapters · 9+ sims
AI & Search (2)
Mathematics Foundations (1)
EE269 — Signal Processing for ML (31 Lectures)
Lec 1

Introduction

SP+ML overview, applications, course roadmap, full pipeline demo.

8 chapters · Pilanci
Lec 2

Discrete Signals

x[n], δ[n], periodicity theorem, Nyquist sampling, aliasing.

8 chapters · Pilanci
Lec 3

Quantization Noise

Bennett’s theorem, SQNR, 6 dB/bit rule, proof sketch.

8 chapters · Pilanci
Lec 4

Lloyd-Max Quantizers

Non-uniform, centroid/boundary conditions, 1/3 power law, NF4 for LLMs.

10 chapters · Pilanci
Lec 5

Dithering & Stochastic Rounding

Linearization, subtractive dither, NVFP4, gradient accumulation.

8 chapters · Pilanci
Lec 6

DFT

Fourier basis, orthogonality, DFT/IDFT, FFT butterfly.

8 chapters · Pilanci
Lec 7

Spectral Descriptors

Centroid, spread, kurtosis, entropy, flatness, flux.

8 chapters · Pilanci
Lec 8

STFT

Windowed DFT, spectrogram, uncertainty principle, overlap-add.

8 chapters · Pilanci
Lec 9

Distance Classification

NN classifier, Hilbert spaces, Parseval, template matching.

8 chapters · Pilanci
Lec 10

Wavelets

CWT, DWT filter bank, Haar, Daubechies, multi-resolution.

8 chapters · Pilanci
Lec 11

Wavelet Applications

Denoising, JPEG2000, wavelet families, Fourier vs wavelet.

8 chapters · Pilanci
Lec 12

Linear Systems

LTI, convolution theorem, eigenvectors, circulant matrices.

8 chapters · Pilanci
Lec 13

Cepstrum & MFCC

Log trick, mel scale, mel filter bank, MFCC pipeline.

8 chapters · Pilanci
Lec 14

Bayes Classifiers

Bayes risk, likelihood ratio, ROC curves, Neyman-Pearson.

8 chapters · Pilanci
Lec 15

LDA & QDA

Stationarity, autocorrelation, Gaussian classification, LDA/QDA.

8 chapters · Pilanci
Lec 16

Fisher Discriminant

J(w) criterion, scatter matrices, simultaneous diagonalization.

8 chapters · Pilanci
Lec 17

SVM

Maximum margin, hard/soft margin, slack variables, multi-class.

8 chapters · Pilanci
Lec 18

Convex Duality

Lagrangian, KKT, primal→dual derivation, dual SVM.

8 chapters · Pilanci
Lec 19

Kernels

Feature maps, kernel trick, RBF, Mercer, representer theorem.

8 chapters · Pilanci
Lec 20

Regression & AR

Least squares, ridge, LASSO, autoregressive models, Yule-Walker.

8 chapters · Pilanci
Lec 21

RKHS

Bayesian regression, kernel regression, Gaussian processes.

8 chapters · Pilanci
Lec 22

Adaptive Filters

Wiener filter, LMS algorithm, noise cancellation, echo cancel.

8 chapters · Pilanci
Lec 23

Neural Networks

Hidden layers, activations, backprop, universal approximation.

8 chapters · Pilanci
Lec 24

Deep Learning & CNNs

Convolutional layers, spectrograms+CNN, BatchNorm, ResNets.

8 chapters · Pilanci
Lec 25

Attention

QKV, scaled dot-product, multi-head, transformer blocks.

8 chapters · Pilanci
Lec 26

Transformers for Signals

Signal tokenization, causal masking, GPT-style prediction.

8 chapters · Pilanci
Lec 27

Diffusion Models

Forward/reverse process, noise prediction, WaveGrad.

8 chapters · Pilanci
Lec 28

Convex Neural Networks

Pilanci’s reformulation, group LASSO, double descent.

8 chapters · Pilanci
Lec 29

Autoencoders & RPCA

VAE, nuclear norm, Robust PCA, matrix separation.

8 chapters · Pilanci
Lec 30

NMF & Clustering

Multiplicative updates, source separation, deep MF.

8 chapters · Pilanci
Lec 31

Dictionary Learning

K-SVD, OMP, matching pursuit, LASSO, sparse coding.

8 chapters · Pilanci
Embedded Systems & IoT (7)
Model Optimization (4)
Systems & Hardware (1)
CS 229s — Systems for Machine Learning (9 Lectures)

Stanford CS 229s Fall 2023. Hardware-aware algorithm design, transformer efficiency, FlashAttention, sparsity & quantization, finetuning, parallelism, efficient architectures, cluster scheduling.

PyTorch Deep Dive (5)
Tools & Frameworks (1)
Probability Distributions — 55 Distributions (5)
Estimation & Probabilistic Models (10)
NEW

Particle Filter

Estimation by a cloud of weighted guesses — for multimodal, nonlinear beliefs the Kalman filter can’t represent. Predict, weight, resample; Monte Carlo Localization; the curse of dimensionality.

10 chapters · 8+ sims
NEW

Factor Graphs

The modern SLAM back-end: optimize the whole trajectory + map at once as a sparse least-squares problem. Smoothing vs filtering, variables & factors, Gauss-Newton, the loop closure that snaps a drifted map straight, and bundle adjustment.

10 chapters · 9+ sims
NEW

Robust Estimation

One bad measurement wrecks least squares. The two great cures: down-weight outliers (M-estimators, Huber, IRLS) and vote them out (RANSAC). Influence functions, breakdown points, and the robust kernels behind every SLAM back-end and SfM pipeline.

10 chapters · 9+ sims
LESSON 14

Bayes Filter

The mother of all filters — recursive belief update from first principles.

9 chapters
LESSON 15

Kalman Filter

Track a moving object through noise — the most elegant algorithm in engineering.

11 chapters
LESSON 16

Extended Kalman Filter

When the world is nonlinear — linearize with Jacobians.

9 chapters
LESSON 17

Unscented Kalman Filter

Beyond linearization — sigma points capture nonlinearity directly.

9 chapters
LESSON 18

Hidden Markov Model

Sequences with hidden causes — Forward, Viterbi, Baum-Welch.

10 chapters
LESSON 19

Bayes Estimation

Prior × likelihood = posterior. The foundation of all inference.

9 chapters
LESSON 20

Bayesian Networks

Variables with dependencies — DAGs, d-separation, message passing.

10 chapters
Robotics & Perception (5)
Decision & Control (3)
Deep Reinforcement Learning (15)
43

Policy Gradients: The Complete Guide

Every symbol explained, every step derived. From REINFORCE to off-policy IS to KL constraints → PPO.

12 chapters · CS 224R
44

Actor-Critic Methods

Learn to estimate what’s good vs bad. MC, bootstrapping, N-step returns, PPO, SAC — the complete guide.

12 chapters · CS 224R
45

Off-Policy Actor-Critic: PPO & SAC

The practical algorithms. Clipping, GAE, replay buffers, Q-functions — from theory to real robots.

11 chapters · CS 224R
47

Offline RL: Learning from Fixed Data

Train policies without environment interaction. AWR, AWAC, IQL — data stitching, expectile regression, and implicit policy improvement.

10 chapters · CS 224R
48

Reward Learning

Where do rewards come from? Goal classifiers, adversarial training, human preferences, Bradley-Terry, RLHF, and Constitutional AI.

10 chapters · CS 224R
46

Q-Learning & DQN

No policy needed. Bellman optimality, target networks, double Q-learning, N-step returns — value-based RL.

11 chapters · CS 224R
47

Imitation Learning

Behavioral cloning, expressive policies, diffusion policies, compounding errors, DAgger — learning from demos.

11 chapters · CS 224R
48

Deep RL: The Complete Theory

Every derivation, every proof. MDPs → Policy Gradients → TRPO → PPO → SAC → DQN → Offline RL → DPO.

12 chapters · CS 224R · CS 234 · CS 285
49

Model-Based RL

Learn a simulator of the world, then practice inside it. Dyna, MBPO, MPC, CEM, value-augmented planning, AlphaGo as MBRL.

10 chapters · CS 224R
50

Multi-Task & Goal-Conditioned RL

One policy, many tasks. Task conditioning, goal-conditioned rewards, HER — turning failures into free training data.

10 chapters · CS 224R
51

CS 224R Exam Review

Complete midterm prep: every algorithm, every equation, 36+ practice questions, interactive exam simulator, cheat sheet.

12 chapters · CS 224R
52

RL for LLM Reasoning

Noam Brown’s approach: search, self-play, and RL for superhuman reasoning in language models.

CS 224R
53

RLHF & DPO

The post-training frontier: reward models, PPO alignment, direct preference optimization, and beyond.

CS 224R
54

Meta-Reinforcement Learning

Learning to learn new tasks from a handful of episodes. Black-box architectures, exploration strategies, task inference as POMDP.

12 chapters · CS 224R
55

Hierarchy in IL & RL

Decompose long-horizon tasks into subtask hierarchies. HL/LL policies, goal representations, DAgger for hierarchy, HIRO, skill discovery.

12 chapters · Deep RL
cs224n — NLP with Deep Learning (19)
Lec 1

History of Language AI

Four eras of NLP: rule-based translation, hand-built AI, statistical methods, neural revolution. Understanding vs pattern matching.

10 chapters · CS224N
Lec 2

Word Vectors

One-hot limitations, distributional hypothesis, Word2Vec (CBOW & Skip-gram), GloVe, negative sampling, word analogies, embedding bias.

10 chapters · CS224N
Lec 3

Backpropagation & Neural Nets

Neurons, activations, forward pass, loss functions, chain rule, backprop algorithm, gradient descent playground, practical tips.

10 chapters · CS224N
Lec 4

Language Models & RNNs

N-grams, neural LMs, recurrent networks, BPTT, vanishing gradients, LSTM & GRU, live text generation, attention preview.

10 chapters · CS224N
Lec 5

The Transformer

Self-attention, scaled dot-product, multi-head attention, positional encoding, encoder-decoder, build a Transformer piece by piece.

10 chapters · CS224N
Lec 6

Practical Methodology

Debugging, hyperparameter tuning, regularization, training diagnostics dashboard, evaluation & error analysis.

8 chapters · CS224N
Lec 7

Pre-training at Scale

BERT vs GPT paradigms, masked LM, autoregressive LM, data pipelines, scaling laws, Chinchilla, Llama 3 architecture.

10 chapters · CS224N
Lec 8

Post-training: RLHF & DPO

SFT, reward modeling, PPO with KL constraint, DPO, alignment data, evaluation, safety guardrails.

10 chapters · CS224N
Lec 9

Efficient Adaptation

In-context learning, prompt engineering, chain-of-thought, lottery tickets, adapters, LoRA, PEFT playground.

10 chapters · CS224N
Lec 10

Agents, Tools & RAG

Retrieval-augmented generation, dense retrieval, ReAct reasoning loops, Toolformer, agent simulator.

10 chapters · CS224N
Lec 11

Benchmarking & Evaluation

MMLU, HELM, LLM-as-judge, benchmark explorer dashboard, data contamination, metric saturation.

8 chapters · CS224N
Lec 12

Reasoning Part 1

Chain-of-thought, self-consistency, zero-shot CoT, process rewards, DeepSeek-R1, GRPO & DAPO.

10 chapters · CS224N
Lec 13

Reasoning Part 2

Process reward models, best-of-N, speculative decoding, RoPE, context extension, test-time compute scaling.

10 chapters · CS224N
Lec 14

Tokenization & Multilinguality

BPE, WordPiece, SentencePiece, multilingual fertility, XLM-R cross-lingual transfer, tokenizer explorer.

8 chapters · CS224N
Lec 15

Interpretability

Linear probes, attention visualization, sparse autoencoders, agentic interpretability, concept discovery.

8 chapters · CS224N
Lec 16

Social & Broader Impacts

Bias, toxicity, misinformation, privacy, environmental cost, governance & policy.

8 chapters · CS224N
Lec 17

Multimodality

Vision encoders, early/late fusion, Chameleon, Transfusion, scaling mixed-modal, retrieval-augmented multimodal.

10 chapters · CS224N
Lec 18

LoRA Without Regret

When LoRA fails, regularization in PEFT, merging adapters, QLoRA, DoRA, future of adaptation.

8 chapters · CS224N
Lec 19

Open Questions in NLP 2026

Reasoning, grounding, efficiency, safety, multimodal frontier, interactive research map.

8 chapters · CS224N
cs224w — Machine Learning with Graphs (19)
Lec 1

Introduction to Graph ML

Why graphs? Node/edge/graph-level tasks. AlphaFold, PinSage, drug interactions, antibiotic discovery.

10 chapters · CS224W
Lec 2

Node Embeddings

Encoder-decoder framework, DeepWalk, node2vec (biased walks with p,q), negative sampling, matrix factorization view.

10 chapters · CS224W
Lec 3

Graph Neural Networks

Message passing, computation graphs, GCN (mean aggregation), GNNs generalize CNNs, transformers as GNNs.

10 chapters · CS224W
Lec 4

GNN Design Space

Message + aggregation framework. GCN vs GraphSAGE vs GAT. Multi-head attention. Skip connections, over-smoothing.

10 chapters · CS224W
Lec 5

GNN Augmentation & Training

Feature/structure augmentation, virtual nodes, neighbor sampling, prediction heads, loss functions, dataset splitting.

10 chapters · CS224W
Lec 6

Theory of GNNs

How powerful are GNNs? The WL test, multiset injectivity, why GCN/GraphSAGE fall short, and how GIN matches the WL test's power, the maximum for message-passing GNNs.

10 chapters · CS224W
Lec 7

Powerful Graph Encoders

Beyond WL: Laplacian eigenvectors, structural features, position-aware GNNs, anchor sets, higher-order k-WL.

10 chapters · CS224W
Lec 8

Graph Transformers

Self-attention on graphs, positional encodings (Laplacian, random walk), Graphormer, GPS hybrid architecture.

10 chapters · CS224W
Lec 9

Heterogeneous Graphs

Multiple node/edge types. RGCN (relation-specific weights), basis decomposition, HGT, metapaths.

10 chapters · CS224W
Lec 10

Knowledge Graphs

TransE, TransR, DistMult, ComplEx, RotatE. Relation patterns: symmetry, composition, 1-to-N. KG completion.

10 chapters · CS224W
Lec 11

GNNs for RecSys

Bipartite graphs, NGCF, LightGCN, BPR loss, PinSage scalability, cold start, GNN vs LLM.

10 chapters · CS224W
Lec 12

Relational Deep Learning

Databases ARE graphs. Foreign keys as edges, GNNs on relational data, RelBench, vs XGBoost.

10 chapters · CS224W
Lec 13

Advanced RDL

Temporal message passing, multi-hop aggregation, Griffin universal encoder, schema-specific vs universal.

10 chapters · CS224W
Lec 14

Advanced GNN Topics

Graph foundation models, pre-training, explainability, equivariant GNNs, dynamic graphs, practical tips.

10 chapters · CS224W
Lec 15

KG Foundation Models

Inductive KG reasoning, ULTRA, cross-schema transfer, foundation models for knowledge graphs.

10 chapters · CS224W
Lec 16

LLM + GNN

LLMs as feature encoders, GNNs as structure encoders, joint pipelines, graph-augmented LLMs.

10 chapters · CS224W
Lec 17

Agents + Graphs

ReAct, Reflexion, graph-grounded agents traversing KGs step by step. Tool use, PPO/DPO optimization, STaRK benchmarks.

10 chapters · CS224W
Lec 18

Deep Generative Models for Graphs

GraphRNN two-RNN sequential generation, GCPN RL-guided design, JT-VAE motifs, diffusion on graphs, molecular benchmarks.

10 chapters · CS224W
Lec 19

Conclusion & The Road Ahead

Full arc from node embeddings to graph foundation models. Open problems, research frontiers, practical advice for GNN practitioners.

8 chapters · CS224W
cs231n — Deep Learning for Computer Vision (28)
28

Image Classification

k-NN, L1/L2 distance, hyperparameter tuning, cross-validation.

10 chapters
29

Linear Classification

Score function Wx+b, hinge loss, cross-entropy, regularization.

10 chapters
30

Loss Functions

MSE, cross-entropy, KL divergence, Huber, contrastive, triplet, InfoNCE — every loss derived from scratch with interactive comparisons.

10 chapters · 12+ sims
30

Optimization & Backprop

Gradients, SGD, chain rule, computation graphs, gate patterns.

10 chapters
31

Neural Networks: Architecture

Neurons, activation functions, layers, representational power.

10 chapters
32

Neural Networks: Setup

Data preprocessing, weight init, batch norm, dropout.

10 chapters
33

Neural Networks: Training

Loss curves, learning rates, momentum, Adam, hyperparameter search.

10 chapters
34

NN Case Study: Spirals

Build a 2-layer net from scratch. Live training visualization.

10 chapters
35

Convolutional Networks

Convolution, filters, pooling, LeNet to ResNet.

10 chapters
36

Transfer Learning

Feature extraction vs fine-tuning, freezing layers.

9 chapters
37

Recurrent Networks

RNNs, LSTM, BPTT, vanishing gradients, char-level LM.

9 chapters
38

Robot Learning

Perception-action loop, model-based planning, imitation learning, diffusion policies, VLAs & foundation models.

10 chapters
39

Multi-Modal Foundation Models

CLIP, LLaVA, Flamingo, Molmo, SAM — contrastive learning, VLMs, promptable segmentation, model chaining.

10 chapters
40

3D Vision

Point clouds, meshes, SDFs, PointNet, NeRF, 3D Gaussian Splatting — from representations to neural rendering.

10 chapters
38

Regularization & Optimization

L1/L2, cross-entropy, SGD, momentum, Adam, learning rate schedules, weight initialization.

10 chapters
39

Neural Networks & Backprop

Computational graphs, chain rule, gate patterns, Jacobians — backprop from first principles.

10 chapters
40

CNNs for Classification

Convolution operation, stride, padding, pooling, receptive field, depthwise separable convolutions.

10 chapters
41

CNN Training & Architectures

Data augmentation, batch norm, dropout, AlexNet → VGG → ResNet → EfficientNet → ConvNeXt.

10 chapters
42

Recurrent Networks (2025)

Vanilla RNN, BPTT, vanishing gradients, LSTM gates, GRU, char-level LM, image captioning.

10 chapters
39

Attention & Transformers

Bahdanau attention, self-attention, multi-head, transformer blocks, positional encoding, ViT.

10 chapters
40

Detection & Segmentation

FCN, U-Net, R-CNN family, YOLO, Mask R-CNN, GradCAM, adversarial examples.

10 chapters
41

Video Understanding

Two-stream networks, 3D convolutions, I3D, SlowFast, TSM, video transformers, VideoMAE.

10 chapters
42

Action Recognition

Single-frame to SlowFast to video transformers. Two-stream, 3D conv, skeleton-based, temporal detection.

10 chapters
43

Optical Flow

Brightness constancy, Lucas-Kanade, Horn-Schunck, FlowNet, RAFT. Dense motion estimation from zero.

10 chapters
44

Distributed Training

GPU hardware, data/pipeline/tensor parallelism, FSDP, ring all-reduce, 3D parallelism — training at scale.

10 chapters
43

Self-Supervised Learning

Pretext tasks, MAE, InfoNCE, SimCLR, MoCo, CPC, DINO — learning without labels.

10 chapters
44

Generative Models I

Density functions, MLE, autoregressive models, PixelCNN, autoencoders, VAEs — the full ELBO derivation.

10 chapters
45

Generative Models II

GANs, rectified flow, classifier-free guidance, latent diffusion, DiT, text-to-image & video generation.

10 chapters
System Design (1)

Research-backed system design lessons. Real architectures, real numbers, real tradeoffs — with interactive Canvas simulations that trace requests, visualize scale, and simulate failures.

Distributed Systems (16)

Practical distributed systems engineering — networking, consensus, CRDTs, transactions, caching, load balancing, resiliency patterns, observability, and deployment. Interview-grade depth with interactive simulations.

Part I

Network Foundations

TCP, TLS, flow control, congestion control, QUIC — the networking layer.

10 chapters · 8 sims
Part I

Service Communication

DNS, REST APIs, gRPC, idempotency — how services find and talk to each other.

10 chapters · 8 sims
Part II

Failure & Time

Failure detection, physical/logical/vector clocks, HLC — time in distributed systems.

10 chapters · 9 sims
Part II

Consensus & Replication

Raft, state machine replication, consistency models, chain replication.

10 chapters · 10 sims
Part II

Coordination Avoidance

CRDTs, gossip protocols, Dynamo, CALM theorem, causal consistency.

10 chapters · 10 sims
Part II

Distributed Transactions

ACID, isolation levels, 2PC, sagas, outbox pattern.

10 chapters · 10 sims
Part III

Caching & CDNs

HTTP caching, reverse proxies, CDN architecture, cache invalidation.

10 chapters · 10 sims
Part III

Partitioning & Storage

Range/hash partitioning, consistent hashing, blob storage.

10 chapters · 10 sims
Part III

Load Balancing

DNS, L4, L7 load balancing — distributing traffic across servers.

10 chapters · 9 sims
Part III

Data Storage & Caching

Replication, NoSQL taxonomy, caching patterns, eviction policies.

10 chapters · 9 sims
Part III

Service Architecture

Microservices, API gateways, service mesh, control/data planes.

10 chapters · 9 sims
Part III

Messaging & Events

Message queues, pub/sub, Kafka, exactly-once, backpressure.

10 chapters · 9 sims
Part IV

Failure Modes & Isolation

Cascading failures, redundancy, shuffle sharding, cellular architecture.

10 chapters · 9 sims
Part IV

Resiliency Patterns

Timeouts, retries, circuit breakers, rate limiting, load shedding.

10 chapters · 9 sims
Part V

Testing & Deployment

Test pyramid, chaos engineering, CI/CD, canary deploys, rollbacks.

10 chapters · 9 sims
Part V

Observability & Operations

Metrics, SLIs/SLOs, alerting, logs, distributed tracing, dashboards.

10 chapters · 9 sims
Seminal Papers (2)

Deep interactive lessons from foundational distributed systems papers. Every definition derived, every algorithm implemented, every proof traced.

Introduction to Algorithms (32)

Chapter-by-chapter deep dive into the classic algorithms textbook. Each chapter is a standalone interactive lesson with interview-grade depth, Canvas simulations, coding drills, and mastery challenges.

Chapter 2

Getting Started

Insertion sort, merge sort, algorithm analysis — the foundations of algorithmic thinking.

10 chapters · 10 sims
Chapter 3

Growth of Functions

Big-O, Omega, Theta — the language of algorithm efficiency.

10 chapters · 10 sims
Chapter 4

Divide & Conquer

Max subarray, Strassen, Master theorem — breaking problems into pieces.

10 chapters · 10 sims
Chapter 6

Heapsort & Priority Queues

Binary heaps, heapsort, priority queues — scheduling and selection.

10 chapters · 10 sims
Chapter 7

Quicksort

Partition, randomize, conquer — often the fastest comparison sort in practice.

10 chapters · 9 sims
Chapter 8

Sorting in Linear Time

Counting sort, radix sort, bucket sort — beating the Ω(n log n) lower bound for comparison sorts.

10 chapters · 8 sims
Chapter 11

Hash Tables

Hashing, chaining, open addressing, perfect hashing — expected O(1) lookup.

10 chapters · 10 sims
Chapter 12

Binary Search Trees

Search, insert, delete, traverse — ordered data made fast.

10 chapters · 9 sims
Chapter 13

Red-Black Trees

Self-balancing BSTs — guaranteed O(log n) with coloring rules.

10 chapters · 9 sims
Chapter 15

Dynamic Programming

Rod cutting, LCS, edit distance, knapsack — the paradigm that dominates interviews.

11 chapters · 9 sims
Chapter 16

Greedy Algorithms

Activity selection, Huffman, fractional knapsack — when locally optimal choices add up to a global optimum.

10 chapters · 9 sims
Chapter 17

Amortized Analysis

Aggregate, accounting, potential — the true cost of operation sequences.

10 chapters · 9 sims
Chapter 22

Graph Algorithms

BFS, DFS, topological sort, SCCs — the foundation of every graph question.

11 chapters · 9 sims
Chapter 23

Minimum Spanning Trees

Kruskal, Prim, Union-Find — connecting everything at minimum cost.

10 chapters · 9 sims
Chapter 24

Shortest Paths

Dijkstra, Bellman-Ford, DAG shortest paths — optimal routes through graphs.

10 chapters · 11 sims
Chapter 26

Maximum Flow

Ford-Fulkerson, max-flow min-cut, bipartite matching — optimizing network flow.

10 chapters, 9 sims
Chapter 9

Medians & Order Statistics

Quickselect, median of medians — finding the k-th element in linear time.

10 chapters, 9 sims
Chapter 18

B-Trees

Disk-optimized search trees — billions of keys with just 2-3 disk reads.

10 chapters, 8 sims
Chapter 19

Fibonacci Heaps

Amortized O(1) decrease-key — the theoretical champion for graph algorithms.

10 chapters, 9 sims
Chapter 20

van Emde Boas Trees

O(log log u) predecessor queries — when the universe is your friend.

10 chapters, 8 sims
Chapter 21

Disjoint Sets

Union-Find: nearly O(1) amortized connectivity queries with union by rank and path compression.

10 chapters, 10 sims
Chapter 25

All-Pairs Shortest Paths

Floyd-Warshall, Johnson's — shortest paths between every pair of vertices.

10 chapters, 9 sims
Chapter 27

Multithreaded Algorithms

Fork-join, work/span, parallel merge sort — harnessing multiple cores.

10 chapters, 8 sims
Chapter 28

Matrix Operations

LU decomposition, least squares, Cholesky — the linear algebra engine.

10 chapters, 9 sims
Chapter 29

Linear Programming

Simplex, duality, LP reductions — the universal optimization framework.

10 chapters, 8 sims
Chapter 30

Polynomials & the FFT

DFT, FFT, convolution — O(n log n) polynomial multiplication.

10 chapters, 12 sims
Chapter 31

Number-Theoretic Algorithms

GCD, modular arithmetic, RSA, primality — the math behind cryptography.

10 chapters, 8 sims
Chapter 32

String Matching

KMP, Rabin-Karp, finite automata — finding patterns in text efficiently.

10 chapters, 8 sims
Chapter 33

Computational Geometry

Convex hull, closest pair, line intersection — algorithms in 2D space.

10 chapters, 8 sims
Chapter 34

NP-Completeness

P vs NP, reductions, SAT, TSP — the limits of efficient computation.

11 chapters, 11 sims
Chapter 35

Approximation Algorithms

Vertex cover, TSP, set cover — good-enough solutions with guarantees.

10 chapters, 9 sims
Capstone

Algorithms in Modern CS

Databases, ML, compilers, networking, crypto, graphics — where every CLRS chapter shows up.

12 chapters, 11 sims
Designing Data-Intensive Applications (13)

Chapter-by-chapter deep dive into Martin Kleppmann's DDIA. Each chapter is a standalone interactive lesson with interview-grade depth, Canvas simulations, design challenges, debug scenarios, and mastery components.

Chapter 1

Trade-Offs in Data Systems

Reliability, scalability, maintainability — the vocabulary of systems design interviews.

11 chapters, 9 sims
Chapter 2

Nonfunctional Requirements

SLAs, percentiles, capacity planning, load testing — quantifying system quality.

9 chapters, 7 sims
Chapter 3

Data Models & Query Languages

Relational, document, graph — choosing how to structure and query your data.

11 chapters, 10 sims
Chapter 4

Storage & Retrieval

B-trees, LSM-trees, column stores — how databases actually store and find your data.

11 chapters, 9 sims
Chapter 5

Encoding & Evolution

JSON, Protobuf, Avro, schema evolution — how data survives change.

11 chapters, 9 sims
Chapter 6

Database Replication

Leader-follower, multi-leader, leaderless — keeping copies consistent across machines.

11 chapters, 9 sims
Chapter 7

Data Sharding

Hash partitioning, range partitioning, rebalancing — splitting data across machines.

11 chapters, 9 sims
Chapter 8

Database Transactions

ACID, isolation levels, MVCC, serializability — the guarantees that keep your data correct.

11 chapters, 11 sims
Chapter 9

The Trouble with Distributed Systems

Network faults, clock drift, process pauses — everything that can go wrong, will.

11 chapters, 9 sims
Chapter 10

Consistency & Consensus

Linearizability, Raft, Paxos — how distributed nodes agree on the truth.

11 chapters, 11 sims
Chapter 11

Batch Processing

MapReduce, Spark, dataflow engines — processing massive datasets efficiently.

11 chapters, 10 sims
Chapter 12

Stream Processing

Kafka, Flink, event sourcing, windowing — processing data as it arrives.

11 chapters, 9 sims
Chapters 13-14

Philosophy & Ethics

Streaming philosophy, bias, privacy, responsibility — the human side of data systems.

8 chapters, 6 sims
Exercises & Workbooks (25)
WORKBOOK

Scaling Book Workbook: Interactive Exercises

54 hands-on exercises covering scaling laws, Chinchilla, compute-optimal training, loss prediction, data mixing, and emergent abilities.

12 chapters, 54 exercises
WORKBOOK

Transformer Math Workbook

52 exercises across 5 modes: derive, trace, build, design, debug. Parameter counts, attention FLOPs, memory budgets, KV caches, throughput estimation.

10 chapters, 52 exercises, 5 exercise types
WORKBOOK

Training & Backprop Workbook

Chain rule by hand, gradient shapes, cross-entropy, Adam optimizer, batch norm, learning rate schedules, mixed precision, training diagnostics.

10 chapters, 54 exercises
WORKBOOK

Probability & Bayes Workbook

Bayes rule, distributions, MLE, MAP estimation, Kalman filter math, HMM forward/Viterbi, information theory, sensor fusion.

10 chapters, 57 exercises
WORKBOOK

RL Fundamentals Workbook

Bellman equations, value iteration, Q-learning updates, policy gradients, advantage estimation, PPO clipping, exploration strategies.

10 chapters, 56 exercises
WORKBOOK

Diffusion & Flow Matching Workbook

Forward process, noise schedules, DDPM loss, score functions, sampling, classifier-free guidance, latent diffusion, flow matching, ODE/SDE.

10 chapters, 55 exercises
WORKBOOK

Systems & Serving Workbook

Tensor/pipeline/data parallelism, continuous batching, PagedAttention, speculative decoding, quantization, cost optimization.

10 chapters, 58 exercises
WORKBOOK

Robotics Math Workbook

Rotation matrices, homogeneous transforms, EKF predict/update, SLAM graphs, inverse kinematics, Jacobians, PID control.

10 chapters, 56 exercises
WORKBOOK

Code from Scratch Workbook

Implement softmax, linear layers, attention, layer norm, positional encoding, BPE tokenizer, cross-entropy, Adam, KV cache, sampling.

10 chapters, 52 exercises
WORKBOOK

CLRS Algorithms Workbook

Asymptotic analysis, recurrences, sorting, hash tables, BSTs, dynamic programming, greedy, graph algorithms, shortest paths, MST.

10 chapters, 56 exercises
WORKBOOK

State Estimation (Advanced) Workbook

Multi-dimensional KF, EKF Jacobians, range-bearing updates, UKF sigma points, particle filters, sensor fusion, observability, noise tuning.

10 chapters, 56 exercises
WORKBOOK

SLAM & Navigation Workbook

EKF-SLAM, landmark observation, data association, loop closure, graph SLAM, visual odometry, RANSAC, factor graphs, occupancy grids.

10 chapters, 58 exercises
WORKBOOK

Graph Neural Networks Workbook

Graph basics, node embeddings, GCN/SAGE/GAT, over-smoothing, spectral theory, knowledge graphs, link prediction, graph generation.

10 chapters, 56 exercises
WORKBOOK

NLP Fundamentals Workbook

Word vectors, Word2Vec gradients, dependency parsing, perplexity, attention, subword tokenization, pretraining, PEFT, evaluation metrics.

10 chapters, 57 exercises
WORKBOOK

Signal Processing Workbook

Discrete signals, DFT by hand, quantization, Lloyd-Max, STFT, wavelets, MFCC pipeline, Bayes classifiers, SVM margins.

10 chapters, 60 exercises
WORKBOOK

Distributed Training Workbook

Ring AllReduce, ZeRO stages, gradient accumulation, LR scaling, pipeline parallelism, tensor parallelism, checkpointing, mixed precision.

10 chapters, 58 exercises
WORKBOOK

Computer Vision Workbook

Convolution math, receptive fields, pooling, anchors/IoU, NMS, ResNet, segmentation, stereo vision, homography.

10 chapters, 53 exercises
WORKBOOK

Linear Algebra for ML Workbook

Matrix operations, Gaussian elimination, eigenvalues, SVD, PCA, matrix calculus, least squares, norms, positive definiteness.

10 chapters, 59 exercises
WORKBOOK

Numerical Methods Workbook

Floating point, catastrophic cancellation, condition numbers, iterative methods, Newton's method, gradient descent, integration, sparse matrices.

10 chapters, 55 exercises
WORKBOOK

Information Theory Workbook

Entropy, joint/conditional entropy, KL divergence, cross-entropy loss, source coding, channel capacity, rate-distortion, VAE connection.

10 chapters, 59 exercises
WORKBOOK

Optimization Theory Workbook

Convexity, gradient descent, SGD variants, constrained optimization, KKT conditions, duality, proximal operators, convergence rates, Newton's method.

10 chapters, 57 exercises
WORKBOOK

RL Theory (CS224R) Workbook

Reward modeling, RLHF objective, PPO for RLHF, DPO loss, CQL, IQL, model-based RL, reward shaping, multi-agent RL.

10 chapters, 57 exercises
WORKBOOK

Deep RL Exam Prep Workbook

Mode collapse, flow matching arithmetic, REINFORCE credit assignment, actor-critic n-step returns, Q-learning TD targets, offline RL, IQL expectile loss, goal-conditioned RL.

10 chapters, 50 exercises
WORKBOOK

Decision Making Under Uncertainty Workbook

Bayesian networks, Dirichlet inference, value of information, Bellman backups, kernel smoothing, MCTS/UCB1, policy gradients, POMDPs, alpha vectors.

10 chapters, 52 exercises
WORKBOOK

Model Compression Workbook

Quantization (PTQ, QAT, STE), pruning (magnitude, structured, lottery ticket), knowledge distillation, LoRA, mixed precision, compression pipelines.

10 chapters, 58 exercises
Day in the Life of an Engineer (15)

Deep-dive lessons built around real engineering roles. Classical foundations, modern tools, real papers, interactive sims. Built for interview prep with real depth.

Role Guide

Forward Deployed Engineer

Customer discovery, rapid prototyping, production deployment at customer sites, SDK integration, debugging in customer environments, demo engineering, incident response.

13 chapters, 12+ sims
Role Guide

Applied AI Engineer

RAG architecture, agent design, fine-tuning, prompt engineering, evaluation, streaming, production AI infrastructure, safety & guardrails.

13 chapters, 13+ sims
Role Guide

Web-Scale AI & Search Engineer

Information retrieval, dense embeddings, learning to rank, recommendation systems, serving a web index, the convergence of search + recs + transformers.

13 chapters, 12+ sims
Role Guide

Backend & API Engineer at Scale

API design, request lifecycle, database optimization, caching, rate limiting, auth, observability, scaling patterns, developer experience.

13 chapters, 12+ sims
Role Guide

Infrastructure & LLM Scaling Engineer

GPU clusters, LLM serving, distributed systems, cost optimization, reliability engineering, CI/CD, monitoring — keeping AI systems running at scale.

13 chapters, 12+ sims
Role Guide

Developer Relations Engineer

Docs, SDKs, community, content strategy, conference talks, launches, feedback loops, measuring impact — with AI company DevRel woven throughout.

13 chapters, 13 sims
Role Guide

AI/GenAI Scrum Master

Experiment-driven sprints, eval-based acceptance criteria, data pipeline management, research-to-production handoffs, agentic AI project management.

13 chapters, 13 sims
Role Guide

Lead QE & Validation Engineer

DVA model testing, sim-to-real validation, safety compliance, CI/CD for robotics, 20 interview questions + cheat sheet.

12 chapters, 20 Q&A
Role Guide

3D & Multi-View Geometry Engineer

Camera models, epipolar geometry, SfM, bundle adjustment, SLAM, dense reconstruction, point tracking, DUSt3R/VGGT — the full stack for robotics perception.

13 chapters, 8+ sims
Role Guide

ML Performance Optimization Engineer

GPU architecture, profiling, mixed precision, distributed training, model compression, TensorRT, serving at scale, real-time autonomous driving stacks.

12 chapters, 11 sims
Role Guide

Robotics Simulation & Software Engineer

Physics simulation, contact models, MuJoCo/Isaac Sim, kinematics, motion planning, control, sim-to-real, RL, ROS2 — the full robotics stack.

13 chapters, 13 sims
Role Guide

ML Inference & Performance Engineer

Quantization, CUDA kernels, TensorRT, FlashAttention, KV-cache, distributed training, BEV perception, edge deployment, VLA — the full AV inference stack.

17 chapters, 16+ sims
Role Guide

Software Testing & Reliability

Test design, automation, reliability engineering, incident response, safety-critical testing, debugging, observability — onsite interview prep for robotics QE.

16 chapters, 16 sims
Role Guide

Agentic Engineer at Sierra

Agent SDK, RAG, evaluation, data pipelines, inference serving, monitoring, security, distributed systems — the full agentic AI platform stack.

20 chapters, 17+ sims
Role Guide

Frontier Lab Engineer: Day in the Life

Pre-training, RLHF, evals, safety, scaling laws, infra, on-call, paper reading, experiment tracking — what a frontier lab researcher actually does.

17 chapters, 16+ sims
AI Harness Engineering (12)

The practical toolkit for building with AI models. Embeddings, retrieval, RAG, agents, evals, safety — everything between "I have a model" and "I have a production app."

HARNESS 01

Vector Embeddings

What embeddings are, how they encode meaning, distance in high-D space, embedding models, multimodal embeddings.

10 chapters, 10 sims
HARNESS 02

Similarity Metrics

Cosine, dot product, Euclidean, Jaccard, learned metrics — measuring "how alike" in vector space.

9 chapters, 9 sims
HARNESS 03

Vector Databases

FAISS, HNSW, IVF, product quantization — storing and searching millions of vectors in milliseconds.

10 chapters, 10 sims
HARNESS 04

Text Chunking Strategies

Fixed, sentence, recursive, semantic, structure-aware — how to split documents for embedding and retrieval.

9 chapters, 9 sims
HARNESS 05

RAG

The full pipeline: chunk → embed → store → retrieve → rerank → generate. Eval, failure modes, advanced patterns.

11 chapters, 11 sims
HARNESS 06

Multimodal RAG

Images, tables, PDFs in RAG. Vision embeddings, ColPali, document parsing, hybrid retrieval.

9 chapters, 9 sims
HARNESS 07

MCP (Model Context Protocol)

USB-C for AI: tools, resources, prompts. Build MCP servers and clients. The standard for AI integrations.

10 chapters, 10 sims
HARNESS 08

Prompt Engineering

System prompts, few-shot, chain-of-thought, structured output, testing — systematic, not vibes-based.

10 chapters, 10 sims
HARNESS 09

Evaluation & Evals

LLM-as-judge, human eval, automated metrics, regression detection, A/B testing for AI systems.

10 chapters, 10 sims
HARNESS 10

AI Safety & Guardrails

Content filtering, jailbreak prevention, PII detection, output validation, red teaming, defense in depth.

10 chapters, 10 sims
HARNESS 11

Fine-Tuning in Practice

When to fine-tune vs prompt, data prep, LoRA/QLoRA, eval pipelines, deployment, cost analysis.

10 chapters, 10 sims
HARNESS 12

Agents & Tool Use

Function calling, ReAct, multi-step agents, state management, error recovery, guardrails for agents.

10 chapters, 10 sims
Teaching Sessions (1)

Record yourself teaching on a whiteboard with optional camera. Export as MP4. The ultimate Feynman test: if you can teach it, you understand it.