Research Papers, Rebuilt

Veanors

Research papers rebuilt as interactive experiences. Every equation derived, every method visualized, every result interactive.

150 Papers · 1452+ Chapters · 600+ Simulations
Physical Intelligence — VLA Research (14)
2024

π0: VLA Flow Model for General Robot Control

The foundation model — flow matching for continuous robot actions across 7 robot types.

10 chapters · Physical Intelligence
2025

π0.5: Open-World Generalization

First VLA to clean kitchens in new homes via heterogeneous co-training.

11 chapters · Physical Intelligence
2025

Knowledge Insulating VLAs

Train fast, run fast, generalize better — decoupling VLM knowledge from action learning.

9 chapters · Physical Intelligence
2025

π*0.6: Learning From Experience

The first VLA that improves from its own real-world experience via RL.

9 chapters · Physical Intelligence
2025

Helix: Training-Time Action Conditioning

Efficient real-time chunking via training-time action conditioning.

8 chapters · Physical Intelligence
2025

Real-Time Chunking for Flow Policies

Making action chunking flow policies run in real-time.

8 chapters · Physical Intelligence
2025

FAST: Efficient Action Tokenization

5-10x fewer action tokens via DCT-based compression.

9 chapters · Physical Intelligence
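
The DCT step behind FAST can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's tokenizer: it assumes a (T, D) chunk of already-normalized actions and an arbitrary quantization `scale`, the helper names are hypothetical, and the lossless compression FAST applies to the quantized coefficients afterwards (byte-pair encoding) is omitted. Smooth trajectories concentrate their energy in a few low-frequency coefficients, so most quantized entries round to zero.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (row k = frequency k)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)  # rescale the DC row so the basis is orthonormal
    return m

def compress_chunk(actions, scale=10.0):
    """(T, D) action chunk -> (T, D) integer DCT coefficients, one transform per action dimension."""
    coeffs = dct_matrix(actions.shape[0]) @ actions  # DCT along the time axis
    return np.round(coeffs * scale).astype(int)      # lossy step: scale and round

def decompress_chunk(q, scale=10.0):
    """Inverse DCT; the basis is orthonormal, so its inverse is its transpose."""
    return dct_matrix(q.shape[0]).T @ (q / scale)

# Hypothetical smooth 50-step chunk for a 2-DoF arm.
t = np.linspace(0.0, 1.0, 50)
actions = np.stack([np.sin(np.pi * t), 2.0 * t - 1.0], axis=1)
q = compress_chunk(actions)
recon = decompress_chunk(q)
print(np.count_nonzero(q), "of", q.size, "coefficients are nonzero")
print("max reconstruction error:", np.abs(recon - actions).max())
```

Raising `scale` keeps more nonzero coefficients and lowers reconstruction error, which is the compression/fidelity knob.
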
2025

MEM: Multi-Scale Embodied Memory

Giving VLAs memory across multiple time scales for long-horizon tasks.

9 chapters · Physical Intelligence
2025

Human to Robot Transfer

How internet video of humans performing tasks transfers to robot control.

9 chapters · Physical Intelligence
2025

RL Token: Bootstrapping RL with VLAs

Fine-tuning VLAs with RL by treating reward as a special token.

8 chapters · Physical Intelligence
2025

π0.7: Steerable Generalist Foundation Model

Diversified prompts unlock emergent dexterity, cross-embodiment transfer, and compositional generalization.

10 chapters · Physical Intelligence
2604.19728 · 2026

VLA Foundry: Unified LLM→VLM→VLA Training

TRI’s open-source framework unifying the full pipeline. Qwen3-VL backbone beats LBM by 23 pp on bimanual manipulation.

10 chapters · Mercat, Keh et al. (TRI)
2510.13054 · 2025

VLA-0: Zero-Modification VLAs

Predict robot actions as plain text — no new tokens, no action heads, no architectural changes. Surprisingly SOTA on LIBERO.

10 chapters · Goyal et al. (NVIDIA)
2602.04215 · 2026

OAT: Ordered Action Tokenization

Three desiderata for action tokens: compression, decodability, ordering. Nested dropout + FSQ = anytime coarse-to-fine robot actions.

10 chapters · Liu, Han et al. (Harvard + Stanford)
Thinking Machines Lab — Foundation Techniques (2)
Ilya 30u30 — Foundational Reading List (16)
2004

Minimum Description Length

Learning as data compression. Kolmogorov complexity, crude vs refined MDL.

10 ch · Grünwald
2016

Order Matters: Seq2Seq for Sets

Read-Process-Write for set inputs; how input and output ordering affects the chain-rule factorization.

10 ch · Vinyals et al.
2019

GPipe: Pipeline Parallelism

Micro-batch splitting, bubble time analysis, scaling to 557M params.

10 ch · Huang et al.
2016

Deep Residual Learning (ResNet)

Skip connections, degradation problem, 152-layer networks.

10 ch · He et al.
2016

Dilated Convolutions

Exponential receptive fields without pooling for dense prediction.

10 ch · Yu & Koltun
2017

Neural Message Passing

MPNN unifying framework for graph neural networks.

10 ch · Gilmer et al.
2017

Attention Is All You Need

THE Transformer paper — self-attention, multi-head, positional encoding.

11 ch · Vaswani et al.
2015

Bahdanau Attention

Original attention mechanism for neural machine translation.

10 ch · Bahdanau et al.
2016

Identity Mappings (ResNet v2)

Pre-activation BN-ReLU, 1001-layer networks.

10 ch · He et al.
2017

Relation Networks

Pairwise relational reasoning for visual QA.

10 ch · Santoro et al.
2017

Variational Lossy Autoencoder

VAE posterior collapse, bits-back coding, information flow control.

10 ch · Chen et al.
2018

Relational Recurrent Neural Networks

Attention over memory slots, memory-memory interactions.

10 ch · Santoro et al.
2014

Coffee Automaton Complexity

Rise and fall of complexity in closed systems.

10 ch · Aaronson et al.
2014

Neural Turing Machines

External memory, differentiable addressing, copy/sort tasks.

10 ch · Graves et al.
2016

Deep Speech 2

End-to-end speech recognition, CTC loss, English + Mandarin.

10 ch · Amodei et al.
2020

Scaling Laws for Neural LMs

Power law scaling over 7 orders of magnitude.

10 ch · Kaplan et al.
Deep Reinforcement Learning (23)
2303.04137 · 2023

Diffusion Policy

Visuomotor policy learning via action diffusion — 46.9% average improvement across 15 tasks.

10 chapters · Chi, Xu et al.
2304.13705 · 2023

ACT: Action Chunking with Transformers

Low-cost bimanual manipulation via CVAE-based action chunking — 80-90% on fine tasks from 10 min of demos.

10 chapters · Zhao et al.
1992

REINFORCE

The foundational policy gradient — statistical gradient-following via the log-derivative trick.

10 chapters · Williams
NeurIPS 1999

Policy Gradient Theorem

The theorem that enabled actor-critic methods — policy gradients with function approximation.

10 chapters · Sutton et al.
1707.06347 · 2017

Proximal Policy Optimization (PPO)

The workhorse of modern RL — clipped surrogate objective for stable, efficient policy learning.

10 chapters · Schulman et al.
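
The clipped surrogate objective, min(r·A, clip(r, 1 - ε, 1 + ε)·A), can be sketched per sample as below. This is a minimal illustration, assuming the probability ratio r = π_new(a|s) / π_old(a|s) and an advantage estimate A are already computed; the function name and the default ε = 0.2 are illustrative choices. In practice the objective is averaged over a minibatch and maximized by gradient ascent.

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO-Clip surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the min makes the objective a pessimistic bound on the unclipped one.
    return min(ratio * advantage, clipped_ratio * advantage)

# Positive advantage: the gain is capped once the ratio exceeds 1 + eps.
print(ppo_clip_objective(1.5, 2.0))   # 2.4
# Negative advantage: the unclipped, more negative term is kept.
print(ppo_clip_objective(1.5, -2.0))  # -3.0
```
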
1506.02438 · ICLR 2016

Generalized Advantage Estimation (GAE)

The exponentially-weighted advantage estimator that controls bias-variance tradeoff — used by PPO and every modern actor-critic.

10 chapters · Schulman et al.
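
The backward recursion that computes GAE can be sketched as follows. This is a minimal sketch, assuming a single trajectory stored as Python lists, where `values` carries one extra bootstrap entry V(s_T) (0.0 if the episode ended); the γ and λ defaults are common choices, not prescribed values.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    `values` has len(rewards) + 1 entries; the last one is the bootstrap value V(s_T).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

print(gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0))  # [3.0, 2.0, 1.0]
```

Setting λ = 0 recovers the one-step TD residual (low variance, more bias); λ = 1 recovers the Monte Carlo return minus the value baseline (unbiased, high variance).
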
1312.5602 · 2013

DQN: Playing Atari with Deep RL

CNN + Q-learning + experience replay — the paper that launched deep RL from raw pixels.

10 chapters · Mnih et al.
1509.06461 · 2015

Double DQN

One-line fix to Q-learning's overestimation bias — decouple selection from evaluation.

10 chapters · van Hasselt et al.
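
The one-line fix amounts to changing how the bootstrap target is computed. This is a minimal sketch, assuming the next-state Q-values from the online and target networks are given as plain lists; the function names are hypothetical. The online network selects the action and the target network evaluates it, whereas standard DQN lets the target network do both, which biases the max upward.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    # Selection: the online network picks the greedy next action...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...evaluation: the target network scores that action.
    return reward + gamma * next_q_target[best_action]

def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target for comparison: y = r + gamma * max_a Q_target(s', a)."""
    return reward if done else reward + gamma * max(next_q_target)

print(double_dqn_target(0.0, [1.0, 5.0], [4.0, 2.0], gamma=1.0))  # 2.0
print(dqn_target(0.0, [4.0, 2.0], gamma=1.0))                     # 4.0
```
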
1707.06887 · 2017

Distributional RL (C51)

Learn the full return distribution, not just the mean — Wasserstein contraction and 51-atom categories.

10 chapters · Bellemare et al.
2110.06169 · 2021

Implicit Q-Learning (IQL)

Offline RL without querying OOD actions — expectile regression on the value function for implicit policy improvement.

10 chapters · Kostrikov et al.
1706.03741 · 2017

Deep RL from Human Preferences (RLHF)

Learn reward from pairwise human comparisons — the paper that launched RLHF and made ChatGPT possible.

10 chapters · Christiano et al.
2305.18290 · 2023

Direct Preference Optimization (DPO)

Your language model is secretly a reward model — skip the RL loop and align with a simple classification loss.

10 chapters · Rafailov et al.
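
The DPO objective for one preference pair can be sketched from summed sequence log-probabilities: L = -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). This is a minimal sketch, assuming the four log-probabilities are already computed; the function name and the β = 0.1 default are illustrative.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probs.

    logp_*     : log pi_theta(y | x) under the policy being trained
    ref_logp_* : log pi_ref(y | x) under the frozen reference model
    """
    # Implicit rewards r(x, y) = beta * log(pi_theta(y|x) / pi_ref(y|x))
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable form
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # log(2): policy equals reference
print(dpo_loss(-1.0, -9.0, -5.0, -5.0))  # smaller: chosen response up-weighted
```

Because the reward is implicit in the log-ratio, no separate reward model or RL rollout is needed; the loss is a logistic classification loss on the margin.
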
ICLR 2026 Oral

Why DPO Is a Misspecified Estimator

DPO projects rewards onto a low-dimensional manifold — causing preference reversal and reward degradation. AuxDPO fixes it with nullspace degrees of freedom.

10 chapters · Gopalan, Chowdhury & Banerjee
2502.16182 · 2025

Implicit Preference Optimization (IPO)

Your language model is secretly a preference classifier — extract P("Yes")/P("No") to self-judge and self-improve via DPO.

10 chapters · Garg et al.
1906.08253 · 2019

MBPO: When to Trust Your Model

Short model rollouts branched from real data — 10x sample efficiency of model-free methods with matching asymptotic performance.

10 chapters · Janner et al.
1707.01495 · 2017

Hindsight Experience Replay (HER)

Failed episode? Pretend the achieved state was the goal — sparse rewards solved via implicit curriculum.

10 chapters · Andrychowicz et al.
1611.02779 · 2016

RL²: Fast RL via Slow RL

Learn the RL algorithm itself — encode it in an RNN's weights via meta-learning across tasks.

10 chapters · Duan et al.
2008.02790 · 2021

DREAM: Decoupled Exploration & Exploitation

Break meta-RL's chicken-and-egg problem by separating exploration and exploitation objectives.

10 chapters · Liu et al.
2204.01691 · 2022

SayCan: Grounding Language in Affordances

LLMs know what to do, robots know what they can do — multiply the two for grounded planning.

10 chapters · Ahn et al.
1804.10332 · 2018

Sim-to-Real: Agile Quadruped Locomotion

Domain randomization + accurate actuator models enable RL policies to transfer from sim to real robots.

10 chapters · Tan et al.
2402.10329 · 2024

UMI: Universal Manipulation Interface

In-the-wild robot teaching without in-the-wild robots — handheld gripper demos transfer to any robot via relative actions.

10 chapters · Chi, Xu et al.
2211.07819 · 2022

General Intelligence Requires Rethinking Exploration

The bottleneck isn't better models — it's better data. A unified framework for exploration across SL and RL, and the path to open-ended learning.

10 chapters · Jiang, Rocktäschel, Grefenstette
2004.07219 · 2020

D4RL: Datasets for Offline RL

The benchmark that shaped offline RL. Maze2D, AntMaze, Adroit, Kitchen — diverse datasets revealing algorithm deficiencies.

8 chapters · Fu et al.
Spatial Computing (20)
2026

SHARP: Single-Image View Synthesis in <1s

Feedforward single-image to 1.2M 3D Gaussians. Depth Pro backbone, two-layer depth, self-supervised finetuning. 1000x faster than diffusion.

10 chapters · Apple · Mescheder et al.
ICLR 2026

Theory of Space

Can foundation models build spatial beliefs through active exploration? Cognitive map probing reveals belief instability and inertia.

10 chapters · Zhang, Huang, Wang et al.
1803.11288 · 2018

FutureMapping: Spatial AI Systems

The computational structure of Spatial AI — co-designing SLAM, deep learning, and specialized processors for always-on spatial intelligence.

10 chapters · Davison
1910.14139 · 2019

FutureMapping 2: Gaussian Belief Propagation

Local message passing on factor graphs for distributed Spatial AI — matching algorithms to graph processors.

10 chapters · Davison & Ortiz
2401.12168 · 2024

SpatialVLM: Spatial Reasoning for VLMs

Endowing vision-language models with quantitative spatial reasoning via synthetic 3D training data.

10 chapters · Chen, Xu et al.
2604.14144 · 2026

SpatialEvo: Self-Evolving Spatial Intelligence

Deterministic geometric environments as zero-noise oracles for self-evolving 3D spatial reasoning.

10 chapters · Li et al.
2604.05212 · 2026

Boxer: 2D→3D Bounding Box Lifting

Robustly lifting open-world 2D detections into metric 3D bounding boxes with uncertainty.

10 chapters · DeTone et al.
2505.23756 · 2025

Rooms from Motion

Object-centric localization and mapping — replacing points with 3D boxes for un-posed indoor SfM.

10 chapters · Lazarow, Kang & Dehghan
2412.04458 · 2024

Cubify Anything (CuTR + CA-1M)

Scaling indoor 3D detection: 440K exhaustive 3D boxes + a fully transformer detector that beats point methods.

10 chapters · Lazarow et al.
2306.15667 · 2023

PoseDiffusion

Camera pose estimation as diffusion denoising with geometry-guided epipolar sampling.

10 chapters · Wang et al.
2312.04563 · 2023

VGGSfM

Fully differentiable Structure from Motion — deep tracking, joint cameras, differentiable BA.

10 chapters · Wang et al.
2307.10984 · 2023

Metric3D

Zero-shot metric depth from a single image via canonical camera space transformation.

10 chapters · Yin et al.
2404.15506 · 2024

Metric3D v2

Geometric foundation model for joint zero-shot metric depth and surface normal estimation.

10 chapters · Hu, Yin et al.
2403.12013 · 2024

GeoWizard

Unleashing Stable Diffusion priors for joint depth and normal estimation from single images.

10 chapters · Fu, Yin et al.
2307.07635 · 2023

CoTracker

Joint point tracking in video — exploiting inter-track correlations with transformer attention.

10 chapters · Karaev et al.
2410.11831 · 2024

CoTracker3

Pseudo-labeling real videos with cycle consistency for better point tracking.

10 chapters · Karaev et al.
2604.14141 · 2026

LingBot-Map

Streaming 3D reconstruction with geometric context attention — anchor, window, and trajectory memory.

10 chapters · Chen et al.
2601.18692 · 2026

LingBot-VLA

Pragmatic VLA foundation model — 20K hours of real dual-arm data, scaling study, GM-100 benchmark.

10 chapters · Wu et al.
2026

LingBot-Depth

Masked depth modeling — treating RGB-D sensor failures as natural masks for self-supervised depth completion.

10 chapters · Tan et al.
2512.14692 · 2025

TRELLIS.2: Native 3D Latents

O-Voxel sparse representation + 16× compression VAE + 4B flow-matching model for high-quality 3D asset generation.

10 chapters · Xiang et al.
World Models (5)
Agentic Systems (19)
2605.17172 · 2026

OpenJarvis: Personal AI, On Personal Devices

Decompose AI stacks into 5 typed primitives, optimize jointly via LLM-guided spec search. On-device specs within 3.2 pp of cloud at 800× lower cost.

10 chapters · Saad-Falcon, Narayan et al. (Stanford)
2604.05336 · 2026

TRACE: Capability-Targeted Agentic Training

Contrastive trajectory analysis identifies missing capabilities, synthesizes targeted training environments, trains per-capability LoRA adapters via GRPO. +14.1 on τ²-Bench.

10 chapters · Kang, Suresh et al. (Stanford)
2605.18747 · 2026

Code as Agent Harness

Unified framework for executable, verifiable, stateful agent systems — code as the operational substrate for reasoning, acting, memory, and coordination.

11 chapters · Ning, Tieu, Fu et al. (UIUC + Meta + Stanford)
2604.14228 · 2026

Dive into Claude Code

The design space of AI agent systems — permission, compaction, extensibility, delegation, persistence.

10 chapters · Liu et al.
2603.28052 · 2026

Meta-Harness

End-to-end optimization of model harnesses — a coding agent searches over harness code using full filesystem history.

10 chapters · Lee et al.
2508.20033 · 2026

DeepScholar-Bench

Live benchmark for generative research synthesis — no system exceeds 31% geometric mean across knowledge, retrieval, and verifiability.

10 chapters · Patel, Arabzadeh et al.
2503.16416 · 2025

Agent Evaluation Survey

First comprehensive survey of LLM agent evaluation — planning, tool use, web/SWE agents, generalist benchmarks.

10 chapters · Yehudai et al.
2411.00640 · 2024

Error Bars for Evals

Statistical rigor for LLM evaluation — clustered CIs, paired tests, power analysis.

10 chapters · Miller (Anthropic)
2509.06917 · 2025

Paper2Agent

Converting research papers into MCP-based AI agents — auto-generated tools, test-driven refinement.

10 chapters · Miao et al.
2506.07982 · 2025

τ²-Bench

Dual-control agent evaluation — both agent and user use tools in a shared Dec-POMDP environment.

10 chapters · Barres et al.
2410.02089 · 2025

RLEF

RL from execution feedback — teaching code LLMs to read compiler errors and fix iteratively.

10 chapters · Gehring et al.
2310.04406 · 2024

LATS: Language Agent Tree Search

MCTS for LM agents — reasoning + acting + planning unified. 92.7% HumanEval.

10 chapters · Zhou et al.
2311.05772 · 2024

ADAPT

As-needed recursive task decomposition — try first, decompose on failure. +28.3% ALFWorld.

10 chapters · Prasad et al.
2505.22954 · 2026

Darwin Gödel Machine

Self-improving agents — evolve agent code with empirical validation, quality-diversity archive.

10 chapters · Zhang et al.
2408.06292 · 2024

The AI Scientist

Fully automated scientific discovery — ideate, code, experiment, write paper, review. <$15/paper.

10 chapters · Lu et al. (Sakana)
2504.08066 · 2025

AI Scientist v2

Workshop-level discovery via agentic tree search — first AI-authored paper accepted at ICLR workshop.

10 chapters · Yamada et al. (Sakana)
2025

AlphaEvolve

Evolutionary coding agent — first Strassen improvement in 56 years, Google infrastructure optimization.

10 chapters · Novikov et al. (DeepMind)
2026

GIANTS: Insight Anticipation

Predict a downstream paper’s core insight from its parent papers via RL with similarity rewards. 34% improvement over gemini-3-pro.

10 chapters · He-Yueya, Singh et al. (Stanford)
ICLR 2026 Outstanding

LLMs Get Lost In Multi-Turn Conversation

39% performance drop across 15 LLMs when tasks are delivered gradually across turns. Sharded simulation decomposes degradation into aptitude loss and reliability collapse.

10 chapters · Laban, Hayashi, Zhou & Neville
Inference-Time Compute (12)
ICLR 2026 Oral

Energy-Based Transformers

Predict by minimizing energy — 35% faster scaling than Transformer++, 29% better System 2 thinking from unsupervised learning alone.

10 chapters · Gladstone et al.
2507.19457 · 2026

GEPA: Reflective Prompt Evolution

Natural language reflection on execution traces outperforms GRPO RL with 35× fewer rollouts.

10 chapters · Agrawal et al.
2504.12516 · 2025

BrowseComp

Benchmark for browsing agents — accuracy scales smoothly with test-time compute.

10 chapters · Wei et al. (OpenAI)
2407.21787 · 2024

Large Language Monkeys

Inference scaling via repeated sampling — coverage follows exponentiated power law across 4 orders of magnitude.

10 chapters · Brown et al.
2409.15254 · 2025

ARCHON

Architecture search over inference-time technique combinations — outperforms o1 by 15.1%.

10 chapters · Saad-Falcon et al.
2408.03314 · 2024

Scaling Test-Time Compute

Optimal test-time compute beats 14× larger model. Search vs revision, difficulty-dependent allocation.

10 chapters · Snell et al.
2502.17578 · 2025

LLM Power Laws

Heavy-tailed p-distributions transform per-problem exponential scaling into aggregate power laws.

10 chapters · Schaeffer et al.
2506.18203 · 2025

Weaver: Weak Verifier Ensembles

Combine weak verifiers via weak supervision — Llama 70B + Weaver = o3-mini accuracy.

10 chapters · Saad-Falcon et al.
2305.20050 · 2023

Let's Verify Step by Step

Process reward models (PRM800K) outperform outcome supervision — 78% MATH with per-step verification.

10 chapters · Lightman et al. (OpenAI)
2312.08935 · 2024

Math-Shepherd

Automatic process rewards via completion sampling — no human labels needed for step-level supervision.

10 chapters · Wang et al.
2503.04412 · 2025

AB-MCTS: Wider or Deeper?

Adaptive branching tree search — dynamically balance width (diversity) vs depth (refinement).

10 chapters · Inoue et al. (Sakana)
2501.05366 · 2025

Search-o1

Agentic search during reasoning — detect uncertainty, retrieve, reason-in-documents, inject into CoT.

10 chapters · Li et al.
Training & RL for LLMs (10)
2507.20534 · 2025

Kimi K2

1T-param MoE with MuonClip optimizer and agentic RL — SOTA open-source on SWE-Bench and agentic tasks.

10 chapters · Kimi Team
2412.19437 · 2024

DeepSeek-V3

671B MoE — MLA (57× KV compression), aux-loss-free balancing, FP8 training, $5.57M total cost.

10 chapters · DeepSeek-AI
2212.08073 · 2022

Constitutional AI

RLAIF — harmlessness from AI feedback via written principles. Critique, revise, train.

10 chapters · Bai et al. (Anthropic)
2203.14465 · 2022

STaR: Self-Taught Reasoner

Bootstrapping reasoning — generate rationales, filter correct, rationalize wrong, fine-tune, repeat.

10 chapters · Zelikman et al.
2402.03300 · 2024

DeepSeekMath

GRPO — group-relative advantages without critic. 51.7% MATH from 7B model with 120B math tokens.

10 chapters · Shao et al.
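
GRPO's critic-free advantage under outcome supervision normalizes each sampled output's reward against the other outputs for the same prompt: A_i = (r_i - mean(r)) / std(r). This is a minimal sketch, assuming one scalar reward per sampled output; it uses the population standard deviation plus a small ε, and implementations differ on both choices.

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for G sampled outputs of one prompt (outcome supervision)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    # The group mean acts as the baseline, replacing a learned value function (critic).
    return [(r - mean) / (std + eps) for r in rewards]

# Four samples, two judged correct (reward 1) and two incorrect (reward 0).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [1, -1, -1, 1]
```

When every output in a group gets the same reward, all advantages are zero and the group contributes no gradient.
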
2503.14476 · 2025

DAPO

Open-source LLM RL at scale — decoupled clipping, dynamic sampling. 50 points on AIME 2024 with Qwen2.5-32B.

10 chapters · ByteDance Seed
2504.04736 · 2025

SWiRL: Step-Wise RL

Multi-step RL for reasoning + tool use — decompose trajectories into sub-trajectories, DPO on each.

10 chapters · Goldie et al.
2603.29791 · 2026

Simula: Reasoning-Driven Synthetic Data

Seedless agentic framework — taxonomy-driven coverage, double-critic quality, calibrated Elo complexity scoring at scale.

10 chapters · Davidson et al.
2604.17654 · 2026

Poly-EPO: Exploratory Reasoning

Set RL with polychromic objectives — train LLMs to explore diverse reasoning strategies via reward×diversity synergy. Up to 20% pass@k gains.

10 chapters · Orney, Hamid et al. (Stanford)
Code Generation (3)
Action Recognition & Video Understanding (5)
Deep Learning for Computer Vision (32)
1409.1556 · 2014

VGGNet

Depth matters — stacking 3×3 convolutions to 16-19 layers for ImageNet SOTA.

10 ch · Simonyan & Zisserman
1409.4842 · 2014

GoogLeNet / Inception

Multi-scale parallel convolutions in Inception modules — 22 layers, only 5M params.

10 ch · Szegedy et al.
1411.4038 · 2014

FCN: Fully Convolutional Networks

From classification to dense per-pixel prediction — the birth of deep semantic segmentation.

10 ch · Long et al.
1311.2524 · 2013

R-CNN

CNN features for object detection — selective search + AlexNet features + SVM classification.

10 ch · Girshick et al.
1504.08083 · 2015

Fast R-CNN

RoI pooling + multi-task loss — process the image once, 213× faster than R-CNN at test time.

10 ch · Girshick
1506.01497 · 2015

Faster R-CNN

Region Proposal Networks with anchors — learned proposals replace selective search.

10 ch · Ren et al.
1506.02640 · 2015

YOLO: You Only Look Once

Single-shot detection as regression — 45 fps real-time object detection.

10 ch · Redmon et al.
2005.12872 · 2020

DETR

End-to-end detection with transformers — Hungarian matching, no NMS, no anchors.

10 ch · Carion et al.
2104.14294 · 2021

DINO

Self-supervised ViTs with emergent segmentation via self-distillation.

10 ch · Caron et al.
2304.07193 · 2023

DINOv2

All-purpose visual features without supervision — rivaling CLIP without text.

10 ch · Oquab et al.
2204.12484 · NeurIPS 2022

ViTPose

Plain ViT + two deconv layers = SOTA pose estimation. No fancy modules, no domain-specific tricks. Simplicity wins.

10 ch · Xu et al.
2511.09554 · ICLR 2026

RF-DETR

Weight-sharing NAS for real-time detection — train once, search thousands of sub-architectures for free. First real-time detector above 60 AP on COCO.

10 ch · Robinson et al.
2308.04079 · 2023

3D Gaussian Splatting

Real-time radiance field rendering via differentiable Gaussian splatting.

10 ch · Kerbl et al.
2212.09748 · 2023

DiT: Diffusion Transformers

Transformer backbone for diffusion with LLM-style scaling laws.

10 ch · Peebles & Xie
2311.15127 · 2023

Stable Video Diffusion

Three-stage video diffusion — data curation matters more than architecture.

10 ch · Blattmann et al.
2310.03744 · 2023

LLaVA-1.5

Simple MLP connector + good data beats complex VLM engineering.

10 ch · Liu et al.
2312.14238 · CVPR 2024

InternVL

6B vision encoder with progressive LLM alignment for universal multimodal tasks.

10 ch · Chen et al.
2311.16502 · CVPR 2024

MMMU Benchmark

Expert-level multimodal benchmark — GPT-4V gets 56.8%, humans get 88.6%.

10 ch · Yue et al.
2312.14132 · CVPR 2024

DUSt3R

Geometric 3D vision made easy — predict pointmaps directly from image pairs.

10 ch · Wang et al.
2401.10891 · 2024

Depth Anything

Monocular depth foundation model via self-training on 62M unlabeled images.

10 ch · Yang et al.
2402.15391 · ICML 2024

Genie

Generative interactive environments from unlabeled video — ICML Best Paper.

10 ch · Bruce et al.
2403.03206 · 2024

Stable Diffusion 3

Rectified flow + MMDiT for scalable text-to-image synthesis.

10 ch · Esser et al.
2404.02905 · NeurIPS 2024

VAR: Visual Autoregressive Modeling

Next-scale prediction — autoregressive image generation with LLM-style scaling. Best Paper.

10 ch · Tian et al.
2406.09414 · 2024

Depth Anything V2

Synthetic precision + real diversity for state-of-the-art monocular depth.

10 ch · Yang et al.
2406.09756 · 2024

MASt3R

Joint 3D reconstruction + dense matching — grounding image matching in 3D.

10 ch · Leroy et al.
2408.00714 · 2024

SAM 2

Segment anything in images AND videos — streaming memory attention.

10 ch · Ravi et al.
2408.06072 · 2024

CogVideoX

Expert transformer for video generation with 3D VAE compression.

10 ch · Yang et al.
2409.12191 · 2024

Qwen2-VL

Native dynamic resolution VLM with 2D-RoPE — any image, any aspect ratio.

10 ch · Wang et al.
2412.12392 · CVPR 2025

MASt3R-SLAM

Real-time dense SLAM powered by learned 3D reconstruction priors.

10 ch · Murai et al.
2503.11651 · CVPR 2025

VGGT

One transformer, one pass — all 3D geometry from unposed images. CVPR Best Paper.

10 ch · Wang et al.
2601.05246 · ICLR 2026 Oral

Depth Anything 3

One plain transformer recovers the visual space from any views — depth + rays, no architectural specialization.

10 ch · Lin, Chen, Liew et al.
CS231n · 2024

GAS: Semantic Mapping from Video

SLAM + detection + segmentation + projection — monocular video to semantic floorplans.

10 ch · Kuznetsov, Bhutra, Pal
2025 (Reimagined)

GAS v2: Modern Semantic Mapping

Rebuilding the semantic mapping pipeline with 2025 foundation models — zero fine-tuning, open vocabulary, no calibration.

12 ch · Kuznetsov, Bhutra, Pal
2511.16719 · 2025

SAM 3: Segment Anything with Concepts

Unified detect + segment + track for open-vocabulary concepts — text or exemplar prompts find ALL instances. Doubles prior accuracy.

11 ch · Carion, Gustafson, Hu et al.
2604.20329 · 2026

Vision Banana: Generators are Vision Learners

Instruction-tune an image generator to output decodable RGB — beats SAM 3 on segmentation and Depth Anything 3 on depth with one model.

10 ch · Gabeur, Long, Peng et al. (Google)
2604.21681 · ICLR 2026

Sapiens 2: Human-Centric Vision

0.4B–5B ViTs pretrained on 1B human images with unified MAE + contrastive learning. SOTA on pose, segmentation, normals, pointmaps, albedo.

10 ch · Khirodkar et al. (Meta Reality Labs)
Continual Learning (3)
Human-Computer Interaction (3)
Training & RL for LLMs (2)
Interpretability (2)
Quantization & Compression (5)
Attention & GPU Kernels (2)
KV Cache & Inference Efficiency (1)
LLM-Guided Search & AI for Science (2)
cs224n — NLP with Deep Learning (53)
2013

Word2Vec

CBOW and Skip-gram: learning word vectors from context windows.

10 ch · Mikolov et al.
NeurIPS 2013

Word2Vec: Negative Sampling

NCE, subsampling frequent words, phrase detection.

10 ch · Mikolov et al.
EMNLP 2014

GloVe

Co-occurrence matrix + log-bilinear model. Count meets prediction.

10 ch · Pennington et al.
TACL 2015

It's the Tuning, Not the Model

PPMI+SVD matches Skip-gram when hyperparameters align.

8 ch · Levy et al.
EMNLP 2015

Evaluating Word Embeddings

Intrinsic vs extrinsic evaluation. Word intrusion coherence test.

8 ch · Schnabel et al.
TACL 2016

Why Word2Vec Works

Random walk model explains PMI, analogies, and low dimensionality.

8 ch · Arora et al.
TACL 2018

Polysemy as Superposition

Polysemous word vectors are linear combinations of sense vectors.

8 ch · Arora et al.
NeurIPS 2018

Optimal Embedding Dimensions

Bias-variance tradeoff for embedding dimension. PIP loss framework.

8 ch · Yin & Shen
Nature 1986

Backpropagation

The paper that made backprop famous. Generalized delta rule.

10 ch · Rumelhart et al.
2016

Yes You Should Understand Backprop

Sigmoid saturation, dead ReLUs, gradient checking, practical debugging.

8 ch · Karpathy
JMLR 2011

NLP (Almost) from Scratch

Unified neural architecture for POS, NER, chunking, SRL.

8 ch · Collobert et al.
IEEE 1994

Vanishing Gradients in RNNs

Formal proof: gradient decays as O(λᵗ). Why RNNs forget.

8 ch · Bengio et al.
ICML 2013

Difficulty Training RNNs

Gradient clipping, the cliff phenomenon, practical training recipes.

8 ch · Pascanu et al.
NeurIPS 2017

Attention Is All You Need

The Transformer. Self-attention, multi-head, positional encoding.

10 ch · Vaswani et al.
2016

Layer Normalization

Normalizes across features within each example. Pre-LN vs Post-LN. RMSNorm.

8 ch · Ba et al.
ICML 2018

Image Transformer

Self-attention for autoregressive image generation. Local attention.

8 ch · Parmar et al.
ICLR 2019

Music Transformer

Relative position for music. The skewing trick. Long-range structure.

8 ch · Huang et al.
NAACL 2019

BERT

Bidirectional pre-training via masked LM and next sentence prediction.

10 ch · Devlin et al.
2019

Contextual Word Representations

From static to contextual: ELMo, GPT, BERT survey.

8 ch · Smith
2024

Llama 3

Open-weight 8B-405B. GQA, SwiGLU, RoPE, 15T tokens.

10 ch · Meta AI
2022

Scaling Instruction Tuning

Flan-PaLM/T5: more diverse tasks = better zero/few-shot.

8 ch · Chung et al.
NeurIPS 2023

AlpacaFarm

Simulated human feedback for RLHF research at 1/100th cost.

8 ch · Dubois et al.
NeurIPS 2023

How Far Can Camels Go

Open instruction tuning: data quality > quantity.

8 ch · Wang et al.
NeurIPS 2020

GPT-3

175B params. Few-shot in-context learning without gradient updates.

10 ch · Brown et al.
NeurIPS 2022

Chain-of-Thought

Few-shot exemplars with step-by-step rationales. PaLM 540B: 18% → 57% on GSM8K.

8 ch · Wei et al.
ICLR 2019

Lottery Ticket Hypothesis

Sparse subnetworks match dense performance from original init.

8 ch · Frankle & Carbin
ICLR 2022

LoRA

Low-rank adaptation: ΔW = BA. 10,000x fewer trainable params than full fine-tuning of GPT-3 175B.

8 ch · Hu et al.
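
The reparameterization h = W₀x + (α/r)·BAx can be sketched with NumPy. This is a minimal sketch, assuming illustrative layer sizes, rank, and α rather than any configuration from the paper; as in LoRA, A is randomly initialized and B starts at zero, so the adapted layer initially matches the frozen one. Only A and B would be trained, which is where the savings come from: r·(d_in + d_out) parameters instead of d_in·d_out.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8  # illustrative sizes, not from the paper

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight W0
A = 0.01 * rng.standard_normal((r, d_in))  # trainable down-projection, random init
B = np.zeros((d_out, r))                   # trainable up-projection, zero init => BA = 0

def lora_forward(x, W, A, B, alpha, r):
    """h = W x + (alpha / r) * B (A x); the low-rank path never forms the d_out x d_in product."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
full_params = d_out * d_in        # 4096 trainable params for full fine-tuning of this layer
lora_params = r * (d_in + d_out)  # 512 trainable params with LoRA
print(full_params, lora_params)
```

After training, BA can be merged into W for inference, so the adapter adds no latency.
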
ICML 2019

Adapter Modules

Bottleneck adapters between frozen layers. ~3% params per task.

8 ch · Houlsby et al.
NeurIPS 2020

RAG

Retrieval-augmented generation. DPR + BART for knowledge-intensive NLP.

8 ch · Lewis et al.
2023

Toolformer

Self-supervised tool learning. LLM teaches itself to call APIs.

8 ch · Schick et al.
ICLR 2021

MMLU

57-subject multiple-choice benchmark. Knowledge breadth test.

8 ch · Hendrycks et al.
2022

HELM

Holistic evaluation: 42 scenarios, 7 metrics. Beyond accuracy.

8 ch · Liang et al.
ICLR 2023

Self-Consistency

Sample multiple CoT paths, majority vote. Diversity beats single-shot.

8 ch · Wang et al.
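
The aggregation step can be sketched as a majority vote over the final answers extracted from independently sampled reasoning paths. This is a minimal sketch, assuming answer extraction has already happened and answers are hashable strings; the function name is hypothetical, and ties go to the answer seen first, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter

def self_consistency(final_answers):
    """Return the most frequent final answer across sampled chain-of-thought paths."""
    answer, _count = Counter(final_answers).most_common(1)[0]
    return answer

# Five sampled reasoning paths; three of them agree on "18".
print(self_consistency(["18", "26", "18", "18", "9"]))  # 18
```
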
2025

DeepSeek-R1

RL-only training produces emergent reasoning. GRPO algorithm.

8 ch · DeepSeek
2025

DAPO

Decoupled Clip and Dynamic sAmpling Policy Optimization. Scales RL to 32B.

8 ch · ByteDance
2023

Let's Verify Step by Step

Process reward models score each reasoning step, not just final answer.

8 ch · Lightman et al.
ICML 2023

Speculative Decoding

Draft model + verify in parallel. Lossless 2-3x speedup.

8 ch · Leviathan et al.
2024

Scaling Test-Time Compute

More inference compute can beat a larger model. Compute-optimal frontier.

8 ch · Snell et al.
2021

RoPE

Rotary position embedding. Position as rotation in 2D subspaces.

8 ch · Su et al.
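
The rotation can be sketched for a single vector as below. This is a minimal sketch, assuming an even dimension d, adjacent-pair grouping (x[2i], x[2i+1]), and the common base of 10000 with θ_i = base^(-2i/d); some implementations pair dimensions differently. What makes RoPE a relative encoding is that the dot product of a query rotated to position m and a key rotated to position n depends only on m - n.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    x = np.asarray(x, dtype=float)
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE needs an even dimension"
    theta = base ** (-np.arange(0, d, 2) / d)  # one frequency per 2D subspace
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin     # standard 2D rotation, per pair
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
# Same offset (m - n = 2) at different absolute positions gives the same score.
print(np.dot(rope(q, 5), rope(k, 3)), np.dot(rope(q, 12), rope(k, 10)))
```
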
ACL 2016

BPE for Neural MT

Byte Pair Encoding for subword segmentation. Handles OOV via decomposition.

8 ch · Sennrich et al.
ACL 2020

XLM-RoBERTa

Cross-lingual masked LM on 100 languages. Zero-shot transfer.

8 ch · Conneau et al.
EMNLP 2023

Tokenization Cost Across Languages

English-centric tokenizers make API costs 2-15x higher for other languages.

8 ch · Ahia et al.
2025

Agentic Interpretability

LLM agents automate interpretability research at scale.

8 ch · Kim et al.
PNAS 2024

Concept Discovery in AlphaZero

Extract human-understandable chess concepts that transfer to human players.

8 ch · Schut et al.
2025

AI Needs New Vocabulary

Human concepts are insufficient to describe AI representations.

8 ch · Kim et al.
2025

Neologism Learning

Train models to create new words for internal concepts.

8 ch · Kim et al.
2024

Chameleon

Mixed-modal early fusion. All modalities as unified tokens.

8 ch · Meta AI
2024

Transfusion

AR text + diffusion images in one model. Hybrid objectives.

8 ch · Meta AI
2024

Mixture-of-Transformers

Modality-specific experts with shared attention. Sparse routing.

8 ch · Liang et al.
2023

Scaling Laws for Mixed-Modal

Power law scaling for text+image models. Optimal mixing ratios.

8 ch · Aghajanyan et al.
2023

CM3Leon

Autoregressive multimodal with retrieval augmentation.

8 ch · Yu et al.
2022

RA-CM3

Retrieval-augmented multimodal language modeling.

8 ch · Yasunaga et al.
2024

LMFusion

Adapting text-only LLMs for multimodal generation.

8 ch · Shi et al.
2025

OneFlow

Concurrent mixed-modal generation with edit flows.

8 ch · Tay et al.
2025

Multimodal RewardBench

Evaluating reward models for vision-language models.

8 ch · Chen et al.
2025

Reconstruction Alignment

Reconstruction objectives improve multimodal model alignment.

8 ch · Lu et al.
cs224w — Graph ML Papers (31)
KDD 2014

DeepWalk

Random walks as graph "sentences" + Word2Vec = node embeddings.

8 ch · Perozzi et al.
WWW 2015

LINE

First + second-order proximity. Edge sampling for scalability.

8 ch · Tang et al.
KDD 2016

node2vec

Biased walks with p,q: BFS (homophily) vs DFS (structural).

10 ch · Grover & Leskovec
ICLR 2017

GCN

Semi-supervised classification with spectral graph convolutions.

10 ch · Kipf & Welling
NeurIPS 2017

GraphSAGE

Sample + aggregate. Inductive. Mean/pool/LSTM aggregators.

8 ch · Hamilton et al.
ICLR 2018

GAT

Learned attention weights. Multi-head. Inductive on PPI.

10 ch · Veličković et al.
ESWC 2018

R-GCN

Relation-specific weights for heterogeneous graphs + basis decomposition.

8 ch · Schlichtkrull et al.
KDD 2018

PinSage

3B nodes, 18B edges. Random-walk neighborhoods at Pinterest scale.

10 ch · Ying et al.
ICML 2018

JK-Net

Jumping knowledge: combine EVERY layer's representation (concat, max-pool, or LSTM-attention) at the output.

8 ch · Xu et al.
ICLR 2019

GIN

Expressiveness is bounded by the 1-WL test. Sum + MLP is injective, so GIN reaches that bound.

10 ch · Xu et al.
ICML 2019

SGC

Collapse GCN layers into one linear operation. Simpler, competitive.

8 ch · Wu et al.
NeurIPS 2020

Design Space of GNNs

A 315K-configuration design space evaluated across 32 tasks. Best GNN depends on the task.

8 ch · You et al.
WWW 2020

HGT

Heterogeneous Graph Transformer: type-specific Q/K/V projections.

8 ch · Hu et al.
AAAI 2021

ID-GNN

Identity-aware coloring via ego-network extraction. Beyond GIN.

8 ch · You et al.
NeurIPS 2020

Distance Encoding

Augment nodes with distances to targets. Provably beyond 1-WL.

8 ch · Li et al.
NeurIPS 2021

Graphormer

Centrality + spatial + edge encodings. Won OGB-LSC 2021.

10 ch · Ying et al.
NeurIPS 2022

GPS

MPNN + Transformer + PE hybrid. General, powerful, scalable.

8 ch · Rampasek et al.
NeurIPS 2013

TransE

h + r ≈ t. Relations as translations. Simple, elegant, limited.

8 ch · Bordes et al.
ICLR 2015

DistMult

Bilinear KG scoring. Symmetric by construction.

8 ch · Yang et al.
ICML 2016

ComplEx

Complex embeddings break symmetry via conjugation.

8 ch · Trouillon et al.
ICLR 2019

RotatE

Relations as rotations in complex space. Models symmetry, antisymmetry, inversion, and composition.

8 ch · Sun et al.
SIGIR 2019

NGCF

Neural graph collaborative filtering on bipartite interaction graphs.

8 ch · Wang et al.
SIGIR 2020

LightGCN

Simplified GCN for CF: no nonlinearities, no feature transforms. Outperforms NGCF.

8 ch · He et al.
2023

Relational Deep Learning

Databases ARE graphs. GNNs replace feature engineering. RelBench.

8 ch · Fey et al.
2018

Polypharmacy Side Effects

Decagon: a multirelational GCN predicts side effects of drug-drug combinations.

8 ch · Zitnik et al.
ICML 2020

Learning to Simulate Physics

Particles as nodes. GNN predicts next state. Generalizes across materials.

8 ch · Sanchez-Gonzalez et al.
ICML 2018

GraphRNN

Autoregressive graph generation. Two-level RNN: nodes then edges.

8 ch · You et al.
NeurIPS 2018

GCPN

RL-guided molecular generation. GCN policy + PPO optimization.

8 ch · You et al.
ICML 2018

JT-VAE

Junction tree VAE: hierarchical molecule generation. Always valid.

8 ch · Jin et al.
2022

ReAct

Thought-Action-Observation loop for LLM agents.

8 ch · Yao et al.
2023

Reflexion

Verbal self-reflection as reinforcement learning. No weight updates.

8 ch · Shinn et al.
Foundational Papers (8)
2015

Knowledge Distillation

Soft targets, temperature scaling, dark knowledge from teacher networks.

9 ch · Hinton et al.
2604.00626 · 2026

On-Policy Distillation Survey

Why students must practice on their own mistakes: f-divergence framework, exposure bias O(εT²)→O(εT), and the RL equivalence.

10 ch · Song & Zheng
2021

Vision Transformer (ViT)

An image is worth 16x16 words — patches as tokens.

9 ch · Dosovitskiy et al.
ICLR 2026 Outstanding

Transformers are Inherently Succinct

Transformers encode concepts exponentially more compactly than RNNs and doubly exponentially more than automata — making verification EXPSPACE-complete.

10 chapters · Bergsträßer, Cotterell & Lin
ICLR 2026 Oral

Mamba-3: Improved Sequence Modeling

Exponential-trapezoidal discretization, complex-valued states, and MIMO — three SSM-principled upgrades that push the Pareto frontier of quality vs. inference speed.

10 chapters · Lahoti, Li, Chen, Wang, Bick, Kolter, Dao & Gu
2605.06216 · 2026

TIDE: Token Identity Delivered Everywhere

K independent MemoryBlocks inject token identity at every layer, amplifying rare-token gradients K-fold and bypassing contextual collapse.

10 chapters · Jaiswal et al. (Apple)
2605.06614 · 2026

SkillOS: Self-Evolving Agents

RL-trained skill curator learns when to insert, update, and delete reusable Markdown skills. 8B curator beats Gemini-2.5-Pro.

10 chapters · Ouyang et al. (Google)
ICML 2025

Distillation Scaling Laws

Power-law scaling for knowledge distillation — optimal teacher-student size ratios, capacity gaps, and when distillation beats pre-training compute-for-compute.

10 chapters · Apple