A bird's-eye map of every major AI architecture — what it is, how it works, how it's trained, and how it's used. Interactive diagrams, intuitive explanations, and the research context you need to navigate the field. Not a textbook. Not a tutorial. A compass.
Transformer
Self-attention, KV caching, MoE, LoRA, RLHF, speculative decoding — the architecture that powers GPT, Claude, and Gemini.
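To make the KV-cache idea concrete, here is a minimal sketch (PyTorch; all shapes and names are illustrative, not any particular model's API): at each decoding step the new token's key/value pair is appended to a cache, so attention is recomputed only against stored keys rather than re-encoding the whole prefix.

```python
# Minimal causal self-attention with a KV cache (illustrative, not optimized).
# Shapes: (batch, heads, seq, head_dim).
import math
import torch

def attend(q, k, v):
    # Scaled dot-product attention over the cached keys/values.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

# Decoding one token at a time: append the new key/value to the cache.
# The query only ever sees the prefix plus itself, so no mask is needed.
cache_k, cache_v = [], []
for step in range(4):
    q = torch.randn(1, 8, 1, 64)   # query for the current token
    k = torch.randn(1, 8, 1, 64)   # key for the current token
    v = torch.randn(1, 8, 1, 64)
    cache_k.append(k); cache_v.append(v)
    out = attend(q, torch.cat(cache_k, dim=2), torch.cat(cache_v, dim=2))
```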
State Space Models
Selective scan, linear recurrence, and hybrid SSM-attention architectures like Jamba and Zamba. The O(n) alternative.
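The core trick in one toy sketch (PyTorch; parameter names are ours, not Mamba's API): a per-channel linear recurrence carries a fixed-size state through the sequence, so cost grows linearly with length. Selective-scan models make the parameters input-dependent; this version keeps them constant for clarity.

```python
# Toy linear recurrence at the heart of an SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
import torch

def ssm_scan(x, a, b, c):
    # x: (seq, dim); a, b, c: (dim,) per-channel parameters.
    h = torch.zeros(x.size(1))
    ys = []
    for x_t in x:                 # O(n) in sequence length, O(1) state
        h = a * h + b * x_t       # state update (the "linear recurrence")
        ys.append(c * h)          # readout
    return torch.stack(ys)

y = ssm_scan(torch.randn(16, 4), torch.full((4,), 0.9),
             torch.ones(4), torch.ones(4))
```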
Diffusion Models
DDPM, DDIM, latent diffusion, classifier-free guidance, ControlNet, consistency models — the dominant generative paradigm.
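A minimal sketch of the DDPM training step (PyTorch; `model` and the beta schedule are placeholders): sample a random timestep, noise the clean sample with the closed-form q(x_t | x_0), and regress the injected noise.

```python
# DDPM training objective: predict the noise that was added.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    t = torch.randint(0, T, (x0.size(0),))
    eps = torch.randn_like(x0)
    ab = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # q(x_t | x_0) in closed form
    return torch.mean((model(x_t, t) - eps) ** 2)  # epsilon-prediction loss
```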
Flow Matching
Continuous normalizing flows, optimal transport, conditional flow matching — straighter paths, fewer steps. Behind SD3 and Flux.
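Conditional flow matching in a few lines (PyTorch; `model` is a placeholder velocity network): with the straight-line path x_t = (1 - t) x0 + t x1, the regression target is simply the constant velocity x1 - x0, which is what makes few-step sampling possible.

```python
# Conditional flow matching with the straight-line (optimal-transport) path.
import torch

def cfm_loss(model, x1):
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.size(0)).view(-1, *([1] * (x1.dim() - 1)))
    x_t = (1 - t) * x0 + t * x1                    # point on the straight path
    target_v = x1 - x0                             # constant target velocity
    return torch.mean((model(x_t, t) - target_v) ** 2)
```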
VAEs & VQ-VAEs
Variational inference, ELBO, codebook learning, FSQ — the secret plumbing behind nearly every modern generative system.
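The ELBO as it actually appears in training code (PyTorch; `encoder` and `decoder` are placeholder networks): a reconstruction term plus the closed-form KL between the Gaussian posterior and a unit-Gaussian prior, made differentiable by the reparameterization trick.

```python
# Negative ELBO for a Gaussian VAE.
import torch

def vae_loss(encoder, decoder, x):
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = torch.mean((decoder(z) - x) ** 2)             # reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())
    return recon + kl
```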
GANs
Adversarial training, StyleGAN, spectral normalization, PatchGAN — the OG generative model, still alive at the edges.
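Adversarial training in its smallest form (PyTorch; `D` and the samples are placeholders): the discriminator learns to separate real from generated data, while the generator is trained with the non-saturating loss to make its samples score as real.

```python
# The two losses of adversarial training (non-saturating variant).
import torch
import torch.nn.functional as F

def d_loss(D, real, fake):
    # Discriminator: real -> 1, fake -> 0.
    return (F.binary_cross_entropy_with_logits(D(real), torch.ones_like(D(real)))
          + F.binary_cross_entropy_with_logits(D(fake), torch.zeros_like(D(fake))))

def g_loss(D, fake):
    # Generator: maximize log D(G(z)) instead of minimizing log(1 - D(G(z))).
    return F.binary_cross_entropy_with_logits(D(fake), torch.ones_like(D(fake)))
```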
Contrastive & Self-Supervised Learning
CLIP, SimCLR, DINO, MAE, SigLIP — the glue that makes multimodal AI possible.
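The contrastive objective behind CLIP-style models, sketched in PyTorch (a symmetric InfoNCE loss; the fixed temperature is a simplification, CLIP learns it): matched image-text pairs form the diagonal of a similarity matrix, and each row and column becomes a softmax classification over the batch.

```python
# Symmetric InfoNCE over a batch of paired image/text embeddings.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature       # (batch, batch) cosine similarities
    labels = torch.arange(logits.size(0))      # diagonal entries are positives
    return (F.cross_entropy(logits, labels)
          + F.cross_entropy(logits.t(), labels)) / 2
```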
Vision-Language Models
Visual encoder + LLM fusion, instruction tuning, grounding, OCR-free document understanding — GPT-4V and beyond.
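The simplest fusion recipe, sketched with placeholder modules (the `embed`/`inputs_embeds` names are assumptions in the style of common LLM APIs, not any specific library): encode the image into patch features, project them into the LLM's embedding space, and prepend them to the text tokens as "visual tokens".

```python
# Visual-encoder + LLM fusion via a linear projector (all modules hypothetical).
import torch

def fuse(vision_encoder, projector, llm, image, text_tokens):
    patches = vision_encoder(image)          # (batch, n_patches, vis_dim)
    vis_tokens = projector(patches)          # (batch, n_patches, llm_dim)
    txt_embeds = llm.embed(text_tokens)      # (batch, n_text, llm_dim)
    inputs = torch.cat([vis_tokens, txt_embeds], dim=1)
    return llm(inputs_embeds=inputs)         # decode conditioned on the image
```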
Vision-Language-Action Models
Action heads, diffusion policies, action chunking, cross-embodiment training — foundation models that physically act.
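Action chunking in sketch form (PyTorch; dimensions and the backbone are placeholders): rather than emitting one action per forward pass, the head predicts a short sequence of future actions, which smooths control and cuts inference frequency.

```python
# A simple action-chunking head on top of a vision-language backbone.
import torch
import torch.nn as nn

class ActionChunkHead(nn.Module):
    def __init__(self, feat_dim=512, action_dim=7, chunk=16):
        super().__init__()
        self.chunk, self.action_dim = chunk, action_dim
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 512), nn.GELU(),
                                 nn.Linear(512, chunk * action_dim))

    def forward(self, features):
        # features: (batch, feat_dim) pooled from the backbone.
        out = self.mlp(features)
        return out.view(-1, self.chunk, self.action_dim)  # (batch, chunk, act)
```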
World Models
JEPA, Dreamer, Genie, UniSim — learned dynamics, latent imagination, and video prediction as world simulation.
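Latent imagination in a toy sketch (all modules are placeholders): encode an observation once, then roll the learned dynamics forward entirely in latent space under a sequence of actions, never decoding back to pixels.

```python
# Latent rollout: imagine futures in representation space, not pixel space.
import torch

def imagine(encoder, dynamics, obs, actions):
    z = encoder(obs)                  # initial latent state
    trajectory = [z]
    for a in actions:                 # rollout happens entirely in latent space
        z = dynamics(z, a)            # predict the next latent state
        trajectory.append(z)
    return torch.stack(trajectory)
```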
Neural Fields & 3D
Neural radiance fields, 3DGS, generative 3D, SDF representations — 3D understanding from 2D images.
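The volume-rendering core of NeRF, sketched in PyTorch (`field` stands in for the MLP; uniform sampling for brevity): sample points along a camera ray, query density and color, and alpha-composite front to back.

```python
# Render one ray by alpha-compositing field queries along it.
import torch

def render_ray(field, origin, direction, near=0.1, far=4.0, n=64):
    t = torch.linspace(near, far, n)
    pts = origin + t[:, None] * direction          # (n, 3) samples on the ray
    density, color = field(pts)                    # (n,), (n, 3) from the MLP
    delta = t[1] - t[0]
    alpha = 1 - torch.exp(-density * delta)        # opacity of each segment
    trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alpha[:-1]]), dim=0)
    weights = trans * alpha                        # compositing weights
    return (weights[:, None] * color).sum(dim=0)   # rendered RGB
```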
Alignment
RLHF, DPO, KTO, Constitutional AI, process reward models — making AI systems do what we actually want.
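As one concrete example from this family, the DPO loss in sketch form (PyTorch; the arguments are summed per-response log-probabilities): it pushes the policy's implicit reward margin, measured against a frozen reference model, toward preferring the chosen response.

```python
# Direct Preference Optimization loss over (chosen, rejected) response pairs.
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Each argument: summed log-probability of that response, shape (batch,).
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```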