8-Part Technical Series

Vision-Language-Action Models

Where perception meets physical action — from the fundamentals of imitation learning to the foundation models teaching robots to see, understand language, and act in the real world. RT-2, OpenVLA, and the path to generalist embodied intelligence. Built for engineers who want to understand, not just use.

8 Articles

~280 Min total read

30+ Interactive demos

The Series

Foundations

Embodied AI Foundations

The robot learning problem from first principles: observation and action spaces, MDPs, simulation environments, the reality gap, and why embodied intelligence is fundamentally different from language modeling.

~30 min read Read →

Core Theory

Imitation Learning & Behavioral Cloning

Learning from demonstrations, the behavioral cloning objective, distribution shift and compounding errors, DAgger, and when imitation beats reinforcement learning.

~35 min read Read →

Architecture

Vision Encoders for Robotics

Spatial representations for manipulation — depth sensing, point clouds, 3D scene understanding, pretrained visual features, and what robots need to see that ImageNet doesn't teach.

~35 min read Read →

Core Theory

Language-Conditioned Policies

Task specification through natural language, grounding instructions to motor commands, language embeddings as goal representations, and multi-task policies.

~35 min read Read →

Architecture

VLA Architectures

RT-1's tokenized actions, RT-2's vision-language-action transfer, Octo's modular design, OpenVLA's open-source recipe, and how foundation models become robot policies.

~40 min read Read →

Training

Training Data & Pipelines

Open X-Embodiment and DROID datasets, teleoperation and data collection, sim-to-real transfer, domain randomization, data scaling laws, and cross-embodiment generalization.

~35 min read Read →

Capabilities

Planning & Reasoning

SayCan and affordance-based planning, Inner Monologue, Code as Policies, LLM-driven task decomposition, world models, and chain-of-thought reasoning for robots.

~35 min read Read →

Applications

Deployment & Frontiers

Real-world robustness and safety, dexterous manipulation, humanoid robots, generalist policies, multi-robot coordination, and the path to foundation models that act.

~35 min read Read →

Tutorial

Robot Learning: A Tutorial

HuggingFace/LeRobot's complete tutorial: classical robotics, RL, imitation learning (ACT, Diffusion Policy), and VLAs (π0, SmolVLA). All with runnable code.

~45 min read Read →

Deep Dive · Deployment

Scaling, Optimizing & Deploying Foundation Models

VLMs, VLAs, and World Models — from architecture to quantization to production deployment. Includes autonomous driving perception/planning and portfolio projects.

~55 min read Read →