Embodied AI Foundations
Where perception meets physical action — from the fundamentals of imitation learning to the foundation models teaching robots to see, understand language, and act in the real world. RT-2, OpenVLA, and the path to generalist embodied intelligence. Built for engineers who want to understand these systems, not just use them.
The robot learning problem from first principles — observation and action spaces, MDPs, simulation environments, the reality gap, and why embodied intelligence is fundamentally different from language modeling.
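To make the observation and action space framing concrete, here is a minimal sketch of a manipulation MDP exposed as a Gym-style environment. The `gymnasium` package, the dict observation (end-effector position plus goal), and the bounded delta-position action are illustrative assumptions, not any particular benchmark's API.

```python
# A minimal sketch of a manipulation MDP as a Gym-style environment.
# Assumes the `gymnasium` and `numpy` packages; the spaces are illustrative.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyReachEnv(gym.Env):
    """Hypothetical reaching task: move a point end-effector to a goal."""

    def __init__(self):
        # Observation: proprioception (3-D end-effector position) plus the goal.
        self.observation_space = spaces.Dict({
            "ee_pos": spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32),
            "goal": spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32),
        })
        # Action: a bounded 3-D delta-position command (no gripper here).
        self.action_space = spaces.Box(-0.05, 0.05, shape=(3,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._ee = self.np_random.uniform(-1, 1, size=3).astype(np.float32)
        self._goal = self.np_random.uniform(-1, 1, size=3).astype(np.float32)
        return self._obs(), {}

    def step(self, action):
        self._ee = np.clip(self._ee + action, -1.0, 1.0).astype(np.float32)
        dist = float(np.linalg.norm(self._ee - self._goal))
        reward = -dist                   # dense shaping reward
        terminated = dist < 0.05         # success region around the goal
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return {"ee_pos": self._ee.copy(), "goal": self._goal.copy()}
```

Simulators such as MuJoCo or Isaac Sim expose essentially this interface; the reality gap is the mismatch between their step dynamics and the physical robot's.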
Learning from demonstrations, the behavioral cloning objective, distribution shift and compounding errors, DAgger, and when imitation beats reinforcement learning.
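As a rough sketch of the two core ideas here: behavioral cloning is supervised regression from observations to expert actions, and DAgger grows the dataset by rolling out the current policy and having the expert relabel the states it visits. PyTorch is assumed; `env`, `expert_policy`, and the 6-D observation / 3-D action sizes are hypothetical placeholders.

```python
# Sketch of behavioral cloning and a DAgger-style data-aggregation loop.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)


def bc_update(obs, expert_actions):
    """One gradient step on the BC objective: minimize ||pi_theta(o) - a*||^2."""
    loss = nn.functional.mse_loss(policy(obs), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def dagger(env, expert_policy, iterations=10, rollout_len=100, epochs=5):
    """Grow the dataset with states the learner visits, labeled by the expert."""
    obs_buffer, act_buffer = [], []
    for _ in range(iterations):
        obs, _ = env.reset()
        for _ in range(rollout_len):
            o = torch.as_tensor(obs, dtype=torch.float32)
            obs_buffer.append(o)
            # The expert relabels the learner's states; this is what counters
            # the distribution shift that plain behavioral cloning suffers from.
            act_buffer.append(torch.as_tensor(expert_policy(obs), dtype=torch.float32))
            with torch.no_grad():
                action = policy(o).numpy()   # the learner drives the rollout
            obs, _, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                break
        for _ in range(epochs):              # retrain on the aggregated dataset
            bc_update(torch.stack(obs_buffer), torch.stack(act_buffer))
    return policy
```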
Spatial representations for manipulation — depth sensing, point clouds, 3D scene understanding, pretrained visual features, and what robots need to see that ImageNet doesn't teach.
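One concrete piece of this: the pinhole back-projection that turns a depth image into a point cloud in the camera frame. The sketch below assumes NumPy and made-up camera intrinsics.

```python
# Back-projecting a depth image to a point cloud with pinhole intrinsics.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depths; returns an (N, 3) point cloud in camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx        # standard pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop invalid (zero-depth) pixels

# Example with a synthetic depth map and illustrative intrinsics.
depth = np.full((480, 640), 0.8, dtype=np.float32)
cloud = depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)   # (307200, 3)
```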
Task specification through natural language, grounding instructions to motor commands, language embeddings as goal representations, and multi-task policies.
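One way to read "language embeddings as goal representations" is a policy that fuses a frozen instruction embedding with visual features before predicting an action. The PyTorch sketch below is a minimal illustration under that assumption; the encoders, dimensions, and action size are placeholders.

```python
# Sketch of a language-conditioned policy: a frozen instruction embedding
# serves as the goal, fused with visual features before the action head.
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, img_dim=512, lang_dim=384, act_dim=7):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + lang_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, image_features, instruction_embedding):
        # The instruction embedding acts as a goal vector: the same visual
        # input maps to different actions under different commands.
        return self.fuse(torch.cat([image_features, instruction_embedding], dim=-1))

policy = LanguageConditionedPolicy()
img = torch.randn(1, 512)    # e.g. features from a pretrained vision backbone
lang = torch.randn(1, 384)   # e.g. a frozen embedding of "pick up the red block"
action = policy(img, lang)   # (1, 7) continuous action
```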
RT-1's tokenized actions, RT-2's vision-language-action transfer, Octo's modular design, OpenVLA's open-source recipe, and how foundation models become robot policies.
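RT-1's action representation is simple to illustrate: each continuous action dimension is discretized into 256 bins so a transformer can emit actions as tokens. The bounds and dimension layout below are illustrative, not the paper's exact values.

```python
# Sketch of RT-1-style action tokenization via uniform binning.
import numpy as np

NUM_BINS = 256

def tokenize_action(action, low, high):
    """Map a continuous action vector to integer tokens in [0, NUM_BINS - 1]."""
    normalized = (np.asarray(action) - low) / (high - low)   # scale to [0, 1]
    return np.clip((normalized * (NUM_BINS - 1)).round(), 0, NUM_BINS - 1).astype(np.int64)

def detokenize_action(tokens, low, high):
    """Invert the discretization back to the nearest quantized continuous values."""
    return low + (tokens.astype(np.float64) / (NUM_BINS - 1)) * (high - low)

# Illustrative bounds: xyz deltas, rpy deltas, gripper open fraction.
low = np.array([-0.05, -0.05, -0.05, -0.25, -0.25, -0.25, 0.0])
high = np.array([0.05, 0.05, 0.05, 0.25, 0.25, 0.25, 1.0])
tokens = tokenize_action([0.01, 0.0, -0.02, 0.1, 0.0, 0.0, 1.0], low, high)
recovered = detokenize_action(tokens, low, high)
```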
Open X-Embodiment and DROID datasets, teleoperation and data collection, sim-to-real transfer, domain randomization, data scaling laws, and cross-embodiment generalization.
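Domain randomization, in its simplest form, samples simulator parameters from broad ranges at every episode reset so the policy cannot overfit to one configuration. The parameter names and ranges below are illustrative, not taken from any specific system.

```python
# Sketch of per-episode domain randomization for sim-to-real transfer.
import random

def sample_randomization():
    return {
        "friction": random.uniform(0.5, 1.5),           # contact friction coefficient
        "object_mass_scale": random.uniform(0.8, 1.2),  # +/- 20% mass perturbation
        "actuator_delay_ms": random.uniform(0.0, 40.0),
        "camera_pos_jitter_cm": random.uniform(0.0, 2.0),
        "light_intensity": random.uniform(0.4, 1.6),
        "table_texture_id": random.randrange(100),      # pick one of many textures
    }

# Applied at every reset, e.g.: env.reset(options=sample_randomization())
```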
SayCan and affordance-based planning, Inner Monologue, code as policies, LLM-driven task decomposition, world models, and chain-of-thought reasoning for robots.
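SayCan's core decision rule is easy to sketch: score every available skill by the product of the LLM's estimate that the skill helps the instruction and a learned affordance (value) estimate that it can succeed from the current state, then execute the argmax. The skill names and scores below are made-up stand-ins.

```python
# Sketch of SayCan-style skill selection: LLM task-grounding score times
# learned affordance score, argmax over the skill library.
def select_skill(skills, llm_score, affordance_score, instruction, state):
    scored = {
        skill: llm_score(instruction, skill) * affordance_score(state, skill)
        for skill in skills
    }
    return max(scored, key=scored.get), scored

# Hypothetical usage with hard-coded stand-in scores.
skills = ["pick up sponge", "go to counter", "open drawer"]
best, scores = select_skill(
    skills,
    llm_score=lambda instr, s: {"pick up sponge": 0.7,
                                "go to counter": 0.2,
                                "open drawer": 0.1}[s],
    affordance_score=lambda st, s: {"pick up sponge": 0.9,
                                    "go to counter": 0.8,
                                    "open drawer": 0.3}[s],
    instruction="clean up the spill",
    state=None,
)
print(best)   # "pick up sponge"
```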
Real-world robustness and safety, dexterous manipulation, humanoid robots, generalist policies, multi-robot coordination, and the path to foundation models that act.