You build the simulation platform AND the robot software that runs on it. From rigid body dynamics to sim-to-real transfer, every chapter covers concept, design, code, debugging, and the frontier.
You're a robotics simulation and software engineer at Apple's Special Projects Group. Not a pure researcher who publishes papers. Not a mechanical engineer who designs hardware. You own the full software stack from physics simulation to real-world deployment.
Here is what your day looks like. Morning: a contact model in MuJoCo produces unrealistic friction forces during a grasp sequence — you debug the solver parameters and validate against real force-torque sensor data. Afternoon: the motion planner for a new manipulation task takes 800ms per query — you need it under 50ms. You switch from RRT to a cached PRM with lazy collision checking. Evening: the sim-to-real transfer results come in. The policy that achieved 94% success in simulation drops to 35% on the real robot. You systematically identify that actuator delay is the dominant gap and add a 20ms latency randomization to the sim.
The full system you own:
Real numbers for a manipulation system development cycle:
| Metric | Without Sim | With Full Sim Pipeline |
|---|---|---|
| Time to first working policy | 6 months (real-only) | 3 weeks (sim) + 2 days (transfer) |
| Cost per training run | $50K (hardware wear + engineer time) | $200 (GPU compute) |
| Iterations per day | 2-3 (limited by real robot) | 1000+ (parallelized sim) |
| Safety incidents during dev | ~12 (broken grippers, dropped objects) | 0 in sim, 1 during transfer |
| Sim-to-real success rate | N/A | 60-90% (depends on domain gap) |
The simulation platform is the 10x multiplier. Every hour you invest in sim fidelity saves ten hours on the real robot. But simulation is never perfect — and knowing exactly where it's wrong is the skill that separates senior engineers from juniors.
Click any box to see what data flows in and out. This is the full robotics software stack you own.
Every physics simulator solves the same fundamental problem: given forces acting on objects, compute how they move. The state of a rigid body is fully described by six quantities: position (x, y, z), orientation (how it's rotated), linear velocity (vx, vy, vz), and angular velocity (ωx, ωy, ωz). That's 13 numbers if you use quaternions for orientation (4 numbers instead of 3 Euler angles).
Why quaternions? Euler angles suffer from gimbal lock — a configuration where two rotation axes align and you lose a degree of freedom. Point your arm straight up and try to distinguish "turn left" from "roll" — they become the same motion. Quaternions avoid this entirely at the cost of one extra number and a normalization constraint (qw2 + qx2 + qy2 + qz2 = 1).
The equations of motion are Newton's second law applied to both translation and rotation:
The gyroscopic term ω × (I · ω) is what makes rotation non-intuitive. A spinning top stays upright because of this cross product. A tumbling satellite exhibits bizarre wobbling. This term is what separates "physics for games" (which often ignores it) from "physics for robotics" (which cannot).
The simulation runs in a tight loop at a fixed timestep, typically 1kHz (dt = 0.001s) for robotics. Each step:
Timestep selection is the first design decision. Too large (dt=0.01s) and the simulation becomes unstable — objects fly through each other or energy grows without bound. Too small (dt=0.0001s) and the simulation is 10x slower than needed. The sweet spot for rigid body robotics: 1-4ms (250Hz-1kHz). For soft body or deformable objects, you may need 0.1ms.
Integration method matters enormously. Explicit Euler (v += a*dt, x += v*dt) is simple but adds energy over time — your simulation slowly "explodes." Semi-implicit Euler (update velocity first, then use new velocity for position) is nearly as simple but much more stable. Runge-Kutta 4 (RK4) is accurate but expensive — 4 force evaluations per step. Most physics engines use semi-implicit Euler because the stability-per-cost ratio is unbeatable.
python import numpy as np class RigidBody2D: def __init__(self, mass, inertia, x, y, angle=0): self.mass = mass self.inertia = inertia # scalar for 2D self.pos = np.array([x, y], dtype='float64') self.angle = angle self.vel = np.zeros(2) self.omega = 0.0 # angular velocity (scalar in 2D) self.force = np.zeros(2) self.torque = 0.0 def apply_force(self, f, point=None): self.force += f if point is not None: r = point - self.pos self.torque += r[0]*f[1] - r[1]*f[0] # 2D cross product def step(self, dt, gravity=np.array([0, -9.81])): # Semi-implicit Euler: update velocity first, then position self.force += self.mass * gravity a = self.force / self.mass alpha = self.torque / self.inertia self.vel += a * dt # velocity update self.omega += alpha * dt # angular velocity update self.pos += self.vel * dt # position uses NEW velocity self.angle += self.omega * dt # Reset accumulators self.force = np.zeros(2) self.torque = 0.0 # Ground collision with restitution def ground_collision(body, ground_y=0, restitution=0.6): if body.pos[1] < ground_y: body.pos[1] = ground_y # project out of ground body.vel[1] *= -restitution # bounce with energy loss body.omega *= 0.95 # friction damps rotation # Run simulation box = RigidBody2D(mass=1.0, inertia=0.1, x=0, y=5.0, angle=0.3) dt = 0.001 # 1kHz for _ in range(5000): # 5 seconds ground_collision(box) box.step(dt) # Energy = 0.5*m*v^2 + 0.5*I*w^2 + m*g*h KE = 0.5*box.mass*np.dot(box.vel, box.vel) + 0.5*box.inertia*box.omega**2 PE = box.mass * 9.81 * box.pos[1] # With semi-implicit Euler, total energy decays (stable) # With explicit Euler, it would GROW (unstable)
Let's trace a single bouncing ball through three integration methods to see why the choice matters. Initial state: height = 5m, velocity = 0, mass = 1kg, g = 9.81 m/s2, dt = 0.01s, restitution = 1.0 (perfectly elastic).
Explicit Euler (update position with OLD velocity, then update velocity):
Semi-implicit Euler (update velocity FIRST, then position with NEW velocity):
RK4: E stays within 0.001% of initial after 10 bounces — accurate but costs 4x per step. For robotics at 1kHz, semi-implicit Euler gives the best stability-per-compute ratio.
Differentiable physics engines compute gradients through the simulation. Instead of just running forward simulation, you can backpropagate through the physics to optimize control parameters, robot designs, or sim parameters to match real data. Key projects:
| Engine | Framework | Key Feature |
|---|---|---|
| Brax | JAX | Massively parallel on GPU/TPU, used for RL |
| DiffTaichi | Taichi | Custom DSL for differentiable simulation |
| Warp | NVIDIA/Python | CUDA kernels with autodiff |
| MuJoCo MJX | JAX | MuJoCo's contact model, differentiable on GPU |
| Dojo | Julia | Smooth contact gradients via randomized smoothing |
The practical impact: differentiable physics lets you do gradient-based system identification. Instead of grid-searching over friction values, compute d(sim_trajectory)/d(friction) and use gradient descent to match real trajectories. This converts a combinatorial search (exponential in number of parameters) into a smooth optimization (linear in iterations).
The frontier is learning simulation parameters from real data. Record a real robot dropping an object, then backpropagate through the differentiable sim to find the friction and restitution coefficients that make the sim match reality. This is system identification via gradient descent — dramatically faster than manual tuning.
A box falls under gravity and bounces. Adjust gravity, restitution (bounciness), and timestep. Watch the energy bar — semi-implicit Euler dissipates energy (stable), explicit Euler would add it (unstable).
Contact is the hardest problem in physics simulation. Two objects touching creates a discontinuity — velocity can change instantaneously. The simulation must detect when objects collide, compute the contact forces, and resolve the interpenetration, all within one timestep.
The collision detection pipeline has two phases. Broad phase quickly eliminates pairs that definitely don't collide using cheap bounding volume tests. Narrow phase computes exact contact points for the remaining candidates.
Broad phase uses AABB (axis-aligned bounding boxes). Each object gets a box aligned to the world axes. Two AABBs overlap only if they overlap on ALL three axes — a trivial comparison of six numbers. For N objects, a naive check is O(N2), but a sweep-and-prune algorithm (sort AABBs along each axis) reduces this to O(N log N). A bin of 100 objects has 4,950 potential pairs, but broad phase culls this to ~200.
Narrow phase uses GJK (Gilbert-Johnson-Keerthi) for convex shapes. GJK tests whether the Minkowski difference of two shapes contains the origin. If it does, they intersect. EPA (Expanding Polytope Algorithm) then finds the penetration depth and contact normal. For non-convex shapes (meshes), the engine decomposes them into convex hulls first.
Once we detect a contact, we need to resolve it. There are two fundamentally different approaches:
Penalty methods are the simplest. When two objects overlap by depth d along normal n, apply a spring force F = k · d · n. This pushes them apart. The problem: to prevent visible penetration, you need a very stiff spring (large k), which requires a very small timestep to remain stable. It's a tradeoff between visual quality and simulation speed.
Constraint-based methods (used by MuJoCo, Drake, Bullet) formulate contacts as velocity-level constraints. Instead of "apply a force proportional to penetration," they say "find the minimum force that prevents further penetration." This is a Linear Complementarity Problem (LCP): either the contact force is zero (objects separating) or the gap is zero (objects touching). Never both positive. MuJoCo goes further with a convex optimization formulation that always has a unique solution.
Friction follows the Coulomb model: tangential friction force |ft| ≤ μ · fn, where μ is the friction coefficient and fn is the normal force. The friction force lives inside a cone. Below the cone limit, the contact is sticking (static friction). At the limit, it's sliding (kinetic friction). Simulating this accurately is critical for grasping — without proper friction, a robot can't hold anything.
python import numpy as np def aabb_overlap(a_min, a_max, b_min, b_max): """Check if two AABBs overlap. O(1) per pair.""" return (a_min[0] <= b_max[0] and a_max[0] >= b_min[0] and a_min[1] <= b_max[1] and a_max[1] >= b_min[1]) def circle_collision(p1, r1, p2, r2): """Narrow phase for circles: penetration depth + normal.""" diff = p2 - p1 dist = np.linalg.norm(diff) if dist < r1 + r2 and dist > 1e-8: normal = diff / dist depth = (r1 + r2) - dist return depth, normal return 0, np.zeros(2) def penalty_force(depth, normal, v_rel, k=5000, b=50): """Spring-damper penalty force.""" if depth <= 0: return np.zeros(2) f_spring = k * depth * normal # pushes objects apart f_damp = -b * np.dot(v_rel, normal) * normal # damps oscillation return f_spring + f_damp # Why penalty is bad for robotics: # k=5000 requires dt < 2*sqrt(m/k) ~ 0.03s for stability # k=50000 (for stiff contact) requires dt < 0.009s # k=500000 (realistic steel) requires dt < 0.003s # Real engines use constraint-based methods instead
nconmax (max contacts) and njmax (max constraints). Check that solver iterations is at least 50-100 for complex contact scenes.condim=4 gives full friction cone; condim=1 is frictionless.Differentiable contact is the holy grail. Contact creates discontinuities (velocity jumps), which break gradients. Approaches: (1) Randomized smoothing (Dojo) — average over random perturbations to smooth out the discontinuity. (2) Soft contact models (Drake's hydroelastic) — replace hard contacts with smooth force fields. (3) Learned contact models — train a neural network to predict contact forces from geometry and velocities. The frontier is combining physics-based priors with learned residuals.
Drag the orange circle. When it overlaps the teal circle, see penetration depth, contact normal, and penalty force. Adjust stiffness and friction.
Contact dynamics are formulated as a Linear Complementarity Problem (LCP). The key constraint is: fn ≥ 0, φ ≥ 0, fn · φ = 0, where fn is the normal contact force and φ is the gap distance between objects.
Your task: (1) Explain physically why these three conditions must hold simultaneously (why can't fn > 0 and φ > 0 at the same time?). (2) Derive the Signorini condition from energy principles. (3) Show why this makes the contact problem non-smooth (and thus harder than standard ODE integration).
Signorini conditions from first principles:
1. No interpenetration: φ(q) ≥ 0 (geometric constraint on configuration q)
2. No adhesion: fn ≥ 0 (ground reaction is compressive only)
3. Complementarity: fn · φ(q) = 0 (force only when in contact)
From energy: At time t, the system has velocity v and we seek the force fn that satisfies Newton's law M·a = fext + fn·n while maintaining φ ≥ 0. The velocity-level constraint is: φ̇ = J·v ≥ 0 (gap can't decrease below zero). This gives the LCP: find fn ≥ 0 such that φ̇ = J·M-1(fext + JTfn)·dt + J·v ≥ 0 and fn ⊥ φ̇.
Non-smoothness: The solution map fn(φ) is a step function: fn = max(0, k·(-φ)) for penalty, or fn ∈ NR+(φ) (normal cone) for constraint-based. Either way, it's not differentiable at φ=0. This means standard backpropagation through contact produces zero or undefined gradients. Differentiable simulators handle this via randomized smoothing, soft contacts, or subgradient methods.
The key insight: Contact is not just "hard" — it's mathematically discontinuous. The force jumps from 0 to a large value at the instant of contact. This is why explicit Euler + penalty methods need tiny timesteps (the force is a stiff spring), and why implicit/constraint-based methods can take larger steps (they solve for the consistent force directly).
Consider a simple spring-mass system: m·a = -k·x. With explicit Euler integration (xn+1 = xn + dt·vn, vn+1 = vn + dt·(-k/m)·xn):
Your task: (1) Show that this system is stable only when dt < 2√(m/k). (2) For a robot contact with k = 100,000 N/m and m = 1 kg, what's the max stable timestep? (3) Derive why semi-implicit Euler (symplectic) is unconditionally stable for this system.
Part 1: Explicit Euler matrix: det(A) = 1 + dt2k/m. Since det > 1, the system is volume-expanding in phase space — energy grows each step. For stability, need |λ| ≤ 1: dt2 ≤ 4m/k, so dt ≤ 2√(m/k). ■
Part 2: With k=100,000 N/m, m=1 kg: dt ≤ 2√(1/100000) = 2/316 = 0.0063s = 6.3ms. At 1kHz control (dt=1ms), you're stable. At 100Hz (dt=10ms), the simulation explodes. This is why contact simulation typically runs at 1-10kHz.
Part 3: Semi-implicit: vn+1 = vn - dt(k/m)xn, xn+1 = xn + dt·vn+1. Matrix A has det(A) = 1 always (regardless of dt). The eigenvalues have |λ| = 1, so the system is marginally stable. Energy oscillates slightly but never grows unboundedly. This is why MuJoCo uses semi-implicit Euler as its default integrator — it's stable for arbitrarily stiff contacts with reasonable timesteps.
The key insight: The stability limit scales as √(m/k). Stiff contacts (large k) require tiny timesteps with explicit methods. Semi-implicit methods remove this restriction by preserving the symplectic structure. This is THE reason professional physics engines use symplectic integrators, not Runge-Kutta.
Choosing the right simulator is a high-leverage decision. Each has a philosophy that shapes what it's good at:
MuJoCo (Multi-Joint dynamics with Contact) was created by Emo Todorov at the University of Washington. Its core insight: formulate contact as a convex optimization problem, which always has a unique solution and is fast to solve. MuJoCo runs at 10-50 million timesteps per second on a single CPU thread. It's the default for reinforcement learning research because you need millions of episodes, and speed is everything. Since DeepMind acquired it (now open-source), it has become the standard.
Isaac Sim (NVIDIA) is built on PhysX 5 and Omniverse. Its killer feature: GPU-accelerated parallel simulation. Instead of running one environment fast, run 4,096 environments simultaneously on a single GPU. Combined with photorealistic rendering via ray tracing, it's unbeatable for (1) massive RL training and (2) visual sim-to-real transfer where you need realistic images.
Drake (MIT Robot Locomotion Group) emphasizes mathematical rigor. Every algorithm has correctness guarantees. Its contact model (hydroelastic) produces smooth, physically meaningful contact patches. Drake includes optimization-based planners (SNOPT, IPOPT) and controllers (LQR, MPC) out of the box. Choose Drake when you need provably correct behavior — like designing a controller for a legged robot that must never fall.
PyBullet wraps the Bullet physics engine in Python. Simple API, good documentation, fast to prototype. But its contact model is less accurate than MuJoCo's, and it lacks GPU acceleration. Use PyBullet for quick experiments and educational prototypes, then graduate to MuJoCo or Isaac for serious work.
| Use Case | Best Choice | Why |
|---|---|---|
| RL training (speed) | MuJoCo / Isaac Lab | Need millions of episodes fast |
| RL training (visual) | Isaac Sim | Need photorealistic rendering + parallel envs |
| Manipulation research | Drake / MuJoCo | Need accurate contact for grasping |
| Visual perception | Isaac Sim | Need realistic cameras, lighting, materials |
| Provably safe control | Drake | Has formal verification tools |
| Quick prototype | PyBullet / MuJoCo | Simplest API, fastest to start |
| Deformable / soft objects | Isaac Sim (FEM) / SOFA | GPU-accelerated finite elements |
python import mujoco import numpy as np # Define a simple robot arm in MJCF (MuJoCo's XML format) XML = """ <mujoco> <option timestep="0.002" gravity="0 0 -9.81"/> <worldbody> <light pos="0 0 3"/> <geom type="plane" size="2 2 0.1"/> <body name="link1" pos="0 0 0.5"> <joint name="j1" type="hinge" axis="0 1 0"/> <geom type="capsule" size="0.04" fromto="0 0 0 0 0 0.4" rgba="1 .5 .3 1"/> <body name="link2" pos="0 0 0.4"> <joint name="j2" type="hinge" axis="0 1 0"/> <geom type="capsule" size="0.03" fromto="0 0 0 0 0 0.3" rgba=".3 .8 .7 1"/> </body> </body> </worldbody> <actuator> <motor joint="j1" ctrlrange="-2 2"/> <motor joint="j2" ctrlrange="-2 2"/> </actuator> </mujoco> """ model = mujoco.MjModel.from_xml_string(XML) data = mujoco.MjData(model) # Simulate 1000 steps for i in range(1000): data.ctrl[:] = [0.5, -0.3] # apply torques to joints mujoco.mj_step(model, data) if i % 100 == 0: print(f"t={data.time:.2f}s j1={data.qpos[0]:.3f}rad j2={data.qpos[1]:.3f}rad")
<joint damping="0.1"/> and actuator/position for more realistic behavior.--headless) to skip rendering entirely when training policies that only need proprioception. Switch from RTX rendering to simple rasterization if photorealism isn't needed.| Project | What's New | Status (2026) |
|---|---|---|
| MuJoCo 3.x | Composable scenes, better deformables, Python-first API | Stable, open-source |
| Isaac Lab | Successor to Isaac Gym, modular task framework | Production, NVIDIA-supported |
| Genesis | Generative sim — LLM-generated environments | Research preview |
| MuJoCo MJX | MuJoCo ported to JAX for GPU + autodiff | Stable, actively developed |
| SAPIEN | Articulated object manipulation, realistic rendering | Open-source |
Click each simulator card to see its strengths. Radar chart shows ratings across 5 dimensions.
A robot arm is a chain of links connected by joints. Each joint has an angle θ. Given all the joint angles, where is the end-effector (the tool at the tip)? That's forward kinematics (FK) — a straightforward chain of matrix multiplications. Given a desired end-effector position, what joint angles achieve it? That's inverse kinematics (IK) — a much harder problem because it's typically underdetermined (multiple solutions) or overconstrained (no solution).
For a 2-link planar arm (the simplest case), FK uses trigonometry:
The Jacobian J is the matrix that maps joint velocities to end-effector velocities: v = J · θ̇. It's computed by taking partial derivatives of the FK equations with respect to each joint angle. The Jacobian is the central object in robotics — it tells you how the end-effector moves when you wiggle each joint.
where s1 = sin(θ1), s12 = sin(θ1+θ2), etc.
IK via the Jacobian pseudoinverse: Δθ = J+ · Δx. This iteratively adjusts joint angles to move the end-effector toward the target. When the Jacobian becomes singular (loses rank), IK fails. This happens at singularities — configurations where the arm is fully extended or folded. The fix is damped least squares: Δθ = JT(JJT + λ2I)-1 · Δx. The damping λ prevents the solution from blowing up near singularities.
Robot control operates in three coordinate spaces, forming a hierarchy:
A 7-DOF arm (like the Franka Panda) solving a 6-DOF task (3D position + 3D orientation) has one extra degree of freedom. This redundancy creates a null space — a set of joint motions that don't change the end-effector pose. You can use this null space to optimize a secondary objective: avoid joint limits, keep the elbow away from obstacles, or minimize energy. The null space projector is N = I - J+J.
python import numpy as np def forward_kinematics(theta1, theta2, L1=1.0, L2=0.8): """FK: joint angles -> end-effector position.""" x = L1 * np.cos(theta1) + L2 * np.cos(theta1 + theta2) y = L1 * np.sin(theta1) + L2 * np.sin(theta1 + theta2) return np.array([x, y]) def jacobian(theta1, theta2, L1=1.0, L2=0.8): """Jacobian: d(x,y)/d(theta1, theta2).""" s1, c1 = np.sin(theta1), np.cos(theta1) s12, c12 = np.sin(theta1+theta2), np.cos(theta1+theta2) return np.array([ [-L1*s1 - L2*s12, -L2*s12], [ L1*c1 + L2*c12, L2*c12] ]) def ik_damped(target, theta_init, L1=1.0, L2=0.8, lam=0.1, steps=100): """Damped least-squares IK. Converges even near singularities.""" theta = np.array(theta_init, dtype='float64') for _ in range(steps): pos = forward_kinematics(theta[0], theta[1], L1, L2) error = target - pos if np.linalg.norm(error) < 1e-4: break J = jacobian(theta[0], theta[1], L1, L2) # Damped pseudoinverse: J^T (J J^T + lambda^2 I)^-1 JJT = J @ J.T + lam**2 * np.eye(2) dtheta = J.T @ np.linalg.solve(JJT, error) theta += dtheta * 0.5 # step size for stability return theta # Example: reach target at (1.2, 0.5) target = np.array([1.2, 0.5]) angles = ik_damped(target, [0.5, 0.5]) result = forward_kinematics(angles[0], angles[1]) print(f"Target: {target}, Reached: {result}, Error: {np.linalg.norm(target-result):.6f}")
IKFlow uses normalizing flows to learn the full IK solution manifold — given a target, it samples from ALL valid joint configurations, not just one. This is useful for motion planning where you need diverse IK solutions to find collision-free paths. Differentiable FK enables end-to-end learning: backpropagate through the kinematic chain to optimize grasp poses jointly with control parameters.
Click anywhere to set a target (yellow dot). The arm uses damped least-squares IK to reach it. Red glow near full extension = singularity. Watch the Jacobian condition number.
The robot arm lives in configuration space (C-space): the space of all possible joint angles. A 6-DOF arm has a 6-dimensional C-space. An obstacle in the 3D workspace becomes a complex, high-dimensional forbidden region in C-space. Motion planning finds a path through C-space that avoids these forbidden regions.
Why not just plan in workspace (3D)? Because the arm has width. Two workspace paths that look identical may differ dramatically in C-space — one requires the elbow to pass through a shelf, the other doesn't. C-space captures the full geometry of the arm, not just its tip.
RRT (Rapidly-exploring Random Trees) is the workhorse of robotics planning. It grows a tree from the start configuration by repeatedly: (1) sample a random configuration, (2) find the nearest node in the tree, (3) extend toward the sample by a step size, (4) check the extension for collisions, (5) if collision-free, add the new node. RRT is probabilistically complete — given enough samples, it will find a path if one exists.
RRT-Connect (bidirectional RRT) grows trees from BOTH the start and goal. When the trees meet, you have a path. This is dramatically faster for narrow-passage problems because both trees are exploring simultaneously.
Optimization-based planners (CHOMP, TrajOpt, STOMP) start with an initial trajectory (e.g., straight line in C-space) and iteratively deform it to avoid obstacles while minimizing smoothness/energy cost. These produce smoother paths than RRT but can get stuck in local minima.
Typical pipeline: plan a coarse path with RRT (~100 waypoints, 200ms), smooth it with trajectory optimization (~20 waypoints, 50ms), then track it with a PD controller at 1kHz. The planner doesn't need to produce a perfect trajectory — just a collision-free waypoint sequence. The optimizer handles smoothness.
python import numpy as np class RRT: def __init__(self, start, goal, bounds, obstacles, step=0.3): self.start = np.array(start) self.goal = np.array(goal) self.bounds = bounds # [(lo, hi), (lo, hi)] self.obstacles = obstacles # list of (center, radius) self.step = step self.nodes = [self.start.copy()] self.parents = [-1] def sample(self): if np.random.rand() < 0.1: # 10% goal bias return self.goal.copy() return np.array([np.random.uniform(lo, hi) for lo, hi in self.bounds]) def nearest(self, point): dists = [np.linalg.norm(n - point) for n in self.nodes] return np.argmin(dists) def steer(self, from_pt, to_pt): diff = to_pt - from_pt dist = np.linalg.norm(diff) if dist < self.step: return to_pt.copy() return from_pt + diff / dist * self.step def collision_free(self, p1, p2, n_checks=10): for t in np.linspace(0, 1, n_checks): pt = p1 + t * (p2 - p1) for center, radius in self.obstacles: if np.linalg.norm(pt - center) < radius: return False return True def plan(self, max_iter=2000): for _ in range(max_iter): rand = self.sample() near_idx = self.nearest(rand) new = self.steer(self.nodes[near_idx], rand) if self.collision_free(self.nodes[near_idx], new): self.nodes.append(new) self.parents.append(near_idx) if np.linalg.norm(new - self.goal) < self.step: return self._trace(len(self.nodes)-1) return None # failed def _trace(self, idx): path = [] while idx != -1: path.append(self.nodes[idx]) idx = self.parents[idx] return path[::-1]
Consider a 2D workspace with a narrow passage (10% of workspace width). How many samples does each algorithm need?
| Algorithm | Passage Width | Avg Samples to Find Path | Avg Path Length |
|---|---|---|---|
| RRT | Wide (50%) | ~80 | 1.4x optimal |
| RRT | Narrow (10%) | ~2,500 | 1.8x optimal |
| RRT | Very narrow (2%) | ~60,000 | 2.5x optimal |
| RRT-Connect | Wide (50%) | ~30 | 1.3x optimal |
| RRT-Connect | Narrow (10%) | ~400 | 1.5x optimal |
| RRT-Connect | Very narrow (2%) | ~8,000 | 2.0x optimal |
RRT-Connect is 3-8x faster in narrow passages because both trees grow toward the gap from opposite sides, effectively halving the passage length each tree must discover. This is why bidirectional search is the default in production systems (OMPL defaults to RRT-Connect).
After finding a path, shortcutting tries connecting non-adjacent waypoints. For each pair (i, j) where j > i+1, check if the direct connection is collision-free. If so, remove all intermediate waypoints. This typically reduces path length by 30-50% and removes the characteristic "zigzag" of RRT paths.
Diffusion Policy treats planning as denoising: start with a random trajectory, iteratively denoise it into a collision-free, task-completing path. It naturally handles multi-modality (multiple valid plans). MPNet learns a sampling distribution from successful plans, making RRT find paths faster in familiar environments. STORM uses parallel trajectory optimization on GPU to evaluate thousands of candidate trajectories simultaneously.
Click to place start (green), goal (yellow), and obstacles (red). Press "Plan" to watch RRT grow. Toggle RRT vs RRT-Connect.
PID control is the foundation. The controller computes a torque based on three error terms: Proportional (how far from target), Integral (accumulated past error), Derivative (how fast the error is changing):
P alone gives fast response but overshoots. D damps the oscillation. I eliminates steady-state error (e.g., gravity pulling the arm down). The art is tuning: too much P = oscillation, too much D = sluggish, too much I = windup and overshoot on step changes.
Impedance control makes the robot behave like a mass-spring-damper system in task space. Instead of commanding a position, you command a dynamic relationship between force and displacement:
where Md, Bd, Kd are the desired inertia, damping, and stiffness. Low stiffness = compliant (good for contact tasks like wiping a table). High stiffness = stiff (good for precise positioning). This is critical for safe human-robot interaction — a stiff controller will hurt people, a compliant one will give way.
Computed torque (inverse dynamics) uses the robot's dynamic model to cancel out nonlinear effects (gravity, Coriolis, centrifugal forces), then applies a simple linear controller on top:
This gives perfect tracking IF the model (M, C, g) is perfect. In practice, the model is always wrong — which is why robust and adaptive control exist.
| Regime | Controller | Rate | Use Case |
|---|---|---|---|
| Free space | Position control (PD + gravity comp) | 1kHz | Moving between waypoints |
| Pre-contact | Impedance (low stiffness) | 1kHz | Approaching a surface safely |
| Contact | Force/impedance control | 1kHz | Pushing, wiping, insertion |
| Hybrid | Position in some axes, force in others | 1kHz | Sliding along a surface (position normal, force tangent) |
python class PIDController: def __init__(self, kp, ki, kd, max_integral=10.0, max_output=50.0): self.kp, self.ki, self.kd = kp, ki, kd self.max_integral = max_integral # anti-windup clamp self.max_output = max_output self.integral = 0.0 self.prev_error = 0.0 def compute(self, error, dt): # Proportional P = self.kp * error # Integral with anti-windup self.integral += error * dt self.integral = np.clip(self.integral, -self.max_integral, self.max_integral) I = self.ki * self.integral # Derivative (on error, not on measurement to avoid setpoint kick) D = self.kd * (error - self.prev_error) / dt if dt > 0 else 0 self.prev_error = error output = P + I + D return np.clip(output, -self.max_output, self.max_output) # Simulate a 1-DOF joint with PID pid = PIDController(kp=20, ki=5, kd=8) theta, omega = 0.0, 0.0 # current angle, angular velocity target = 1.0 # desired angle (radians) inertia = 0.5 # kg*m^2 dt = 0.001 # 1kHz for i in range(3000): # 3 seconds error = target - theta torque = pid.compute(error, dt) alpha = torque / inertia - 0.5 * omega # + viscous friction omega += alpha * dt theta += omega * dt
A common tuning procedure (Ziegler-Nichols): (1) Set Ki = 0, Kd = 0. (2) Increase Kp until the system oscillates at a constant amplitude — this is the ultimate gain Ku. (3) Measure the oscillation period Tu. (4) Set gains according to the table:
| Controller | Kp | Ki | Kd |
|---|---|---|---|
| P only | 0.5 Ku | 0 | 0 |
| PI | 0.45 Ku | 0.54 Ku/Tu | 0 |
| PID | 0.6 Ku | 1.2 Ku/Tu | 0.075 Ku Tu |
For our simulated joint (inertia = 0.5 kg m2, friction = 0.5): Ku ≈ 45, Tu ≈ 0.45s. Ziegler-Nichols PID: Kp = 27, Ki = 120, Kd = 1.5. This gives aggressive response with ~15% overshoot. For robotics, we usually want less overshoot (safer), so use 70% of Kp and 150% of Kd: Kp = 19, Ki = 120, Kd = 2.3.
Residual policy learning combines a classical controller with a learned correction: τ = τPID + πθ(state). The classical controller handles the known dynamics, and the neural network learns to correct for model mismatch and unmodeled effects. This converges much faster than learning from scratch because the residual is small. Adaptive control estimating parameters online is being replaced by learned adaptation — a meta-learned policy that adapts to new dynamics in a few timesteps.
A 1-DOF joint tracking a step input. Adjust P, I, D gains and watch the step response. The target line is dashed. Observe overshoot, settling time, and steady-state error.
A robot needs to know what's in front of it and where things are in 3D. Depth cameras provide this, but each type has different tradeoffs:
| Sensor | How It Works | Range | Weakness |
|---|---|---|---|
| Stereo | Two cameras, triangulation | 0.5-10m | Fails on textureless surfaces |
| Structured light | Project pattern, measure distortion | 0.3-5m | Fails in sunlight |
| Time-of-flight | Measure light round-trip time | 0.5-10m | Multipath interference |
| LiDAR | Pulsed laser scanning | 1-200m | Expensive, sparse |
For tabletop manipulation (the Apple SPG use case), structured light (Intel RealSense D435) or stereo gives dense depth at 30fps with sub-millimeter accuracy at close range.
The perception pipeline for manipulation:
Latency budget: the entire pipeline must run in under 100ms to keep up with the control loop. Perception at 30Hz, planning at 10Hz, control at 1kHz.
python import open3d as o3d import numpy as np # 1. Load point cloud from depth image pcd = o3d.io.read_point_cloud("scene.ply") # 2. Downsample with voxel grid (0.005m = 5mm voxels) pcd = pcd.voxel_down_sample(0.005) # 3. Remove table plane with RANSAC plane, inliers = pcd.segment_plane(distance_threshold=0.01, ransac_n=3, num_iterations=1000) objects = pcd.select_by_index(inliers, invert=True) # 4. Cluster objects with DBSCAN labels = np.array(objects.cluster_dbscan(eps=0.02, min_points=50)) n_clusters = labels.max() + 1 print(f"Found {n_clusters} objects") # 5. Get bounding box for each cluster for i in range(n_clusters): cluster = objects.select_by_index(np.where(labels == i)[0]) bbox = cluster.get_axis_aligned_bounding_box() center = bbox.get_center() extent = bbox.get_extent() print(f"Object {i}: center={center}, size={extent}")
min_points isn't too high — a small object at 1m has very few depth pixels.SAM2 (Segment Anything Model 2) gives zero-shot segmentation — point at any object and get a perfect mask, no training needed. FoundationPose estimates 6-DOF pose from a single reference image, no CAD model required. SpatialVLM grounds language in 3D space: "pick up the red cup on the left" becomes a 3D coordinate. These foundation models are replacing the hand-crafted perception pipelines above with general-purpose perception that works out of the box.
A tabletop scene viewed from above. Points are colored by cluster after RANSAC plane removal and DBSCAN segmentation. Adjust voxel size to see resolution vs speed tradeoff.
This is THE critical skill. Most robot learning happens in simulation. Getting it to work on real hardware is where careers are made. The reality gap — the difference between simulated and real physics, perception, and dynamics — determines whether your policy transfers or falls flat.
| Gap Type | What's Different | Impact | Fix |
|---|---|---|---|
| Dynamics | Friction, inertia, joint damping, backlash | Wrong forces → wrong trajectories | System identification + randomization |
| Perception | Rendering vs real images, lighting, textures | Policy confused by visual differences | Visual domain randomization |
| Actuator | Communication delays (5-20ms), backlash, saturation | Commands arrive late → oscillation/instability | Add latency in sim, randomize delay |
| Sensor | Noise, bias, dropout, different FoV | Overconfident policy → brittle behavior | Add noise in sim |
python import numpy as np import mujoco class DomainRandomizer: """Randomize simulation parameters each episode for robust transfer.""" def __init__(self, model): self.model = model # Store default values self.default_friction = model.geom_friction.copy() self.default_mass = model.body_mass.copy() self.default_damping = model.dof_damping.copy() def randomize(self): # Friction: multiply by uniform [0.5, 1.5] scale = np.random.uniform(0.5, 1.5, self.default_friction.shape) self.model.geom_friction[:] = self.default_friction * scale # Mass: +/- 20% scale = np.random.uniform(0.8, 1.2, self.default_mass.shape) self.model.body_mass[:] = self.default_mass * scale # Joint damping: multiply by [0.5, 2.0] scale = np.random.uniform(0.5, 2.0, self.default_damping.shape) self.model.dof_damping[:] = self.default_damping * scale # Actuator delay: 0-20ms (added in step function) self.action_delay = np.random.randint(0, 21) # ms # Observation noise: Gaussian with std 0-0.02 rad self.obs_noise_std = np.random.uniform(0, 0.02) # Recompute derived quantities after changing parameters mujoco.mj_setConst(self.model, mujoco.MjData(self.model)) def add_obs_noise(self, obs): return obs + np.random.randn(*obs.shape) * self.obs_noise_std # Usage: call randomizer.randomize() at each episode reset # The policy trained over thousands of randomized envs # learns to be robust to parameter variation
You trained a reaching policy in MuJoCo with 95% success. On the real Franka Panda, it achieves 35%. Here's the systematic debugging trace:
| Step | Action | Finding | Fix |
|---|---|---|---|
| 1 | Record real joint trajectories at 1kHz | Joint 4 lags by 15ms compared to sim replay | Add 15ms delay to sim actuator model |
| 2 | Compare contact forces during grasp | Real friction ~40% lower than sim default | System ID: set μ=0.6 (was 1.0), add DR ±30% |
| 3 | Check joint tracking RMSE over trajectory | Joints 5-6 (wrist) have 2x error of joints 1-3 | Wrist joint damping was wrong. Recalibrate from free-motion data. |
| 4 | Re-train with improved sim + wider DR | Real success: 78% | Remaining gap: observation noise on joint velocities |
| 5 | Add velocity observation noise (std=0.01 rad/s) | Real success: 87% | Ship it. Remaining 13% is edge cases (unusual object poses). |
The key insight: each fix addressed a specific measured gap, not a generic "add more randomization." System identification (steps 1-3) moved the sim center closer to reality. Domain randomization (steps 4-5) covered the remaining uncertainty. This ordered approach converges in 3-5 iterations, while blind DR tuning can take 20+ iterations.
Auto-DR (Automatic Domain Randomization) uses a secondary RL agent to learn the randomization ranges that maximize transfer. Instead of hand-tuning "friction ∈ [0.5, 1.5]", the system discovers the right ranges automatically. Real-to-Sim-to-Real scans the real environment (with NeRF or 3D Gaussian Splatting), builds a digital twin, trains a policy in the twin, and deploys. Foundation models (pre-trained visual features like DINOv2) transfer better than raw pixels because they've already learned to be invariant to visual domain shifts.
Left: simulated arm reaching a target. Right: "real" arm (same task + noise/delay). Adjust domain randomization range and reality gap. Watch success rates change. Crank up the gap, then crank up DR to recover.
In simulation, we often add "shaping rewards" to help RL learn faster: reward for moving toward the goal, penalty for collisions, bonus for grasping. But naive reward shaping can change the optimal policy!
Your task: (1) Prove that potential-based reward shaping F(s, s') = γΦ(s') − Φ(s) preserves the optimal policy for any potential function Φ. (2) Give an example of non-potential-based shaping that creates a suboptimal fixed point. (3) For a reaching task, design a good shaping potential Φ(s).
Proof that PBRS preserves optimality:
Define Q'(s,a) = E[∑ γt(Rt + Ft) | s0=s, a0=a]. Expanding F = γΦ(s') - Φ(s):
Q'(s,a) = Q(s,a) + E[∑ γt(γΦ(st+1) - Φ(st))] = Q(s,a) - Φ(s) + γ∞Φ(s∞)
The term -Φ(s) is independent of action a. So argmaxa Q'(s,a) = argmaxa Q(s,a). ■
Good reaching potential: Φ(s) = -c · ||gripper_pos - target_pos||. This gives dense reward for approaching the target. It's potential-based (function of state only), so it provably doesn't change the optimal policy, but it provides strong learning signal.
The key insight: In sim, you have full access to state, so you can compute any Φ(s). Use potential-based shaping aggressively: distance-to-goal, joint-limit proximity, orientation alignment. These accelerate learning 10-100x without changing what the agent ultimately learns. Non-potential shaping (alive bonuses, step penalties) is dangerous because it DOES change the optimal policy.
OpenAI's approach (2019 Rubik's Cube):
Engine: MuJoCo (not GPU-parallelized at the time), but running 6144 parallel CPU workers. Total: ~13,000 CPU years of experience in months of wall-clock time. Today: MuJoCo MJX on GPU gives 10,000+ envs on one A100.
Observations: Proprioception (joint angles + fingertip forces) for the actor. The critic sees privileged info (true object pose). This "asymmetric actor-critic" lets the critic learn fast while the actor only uses transferable signals.
DR Strategy: Automatic Domain Randomization (ADR). Start narrow, widen each parameter until performance drops below threshold, then hold. 37 randomized parameters including: friction (×0.5-2.0), mass (×0.5-2.0), joint damping (×0.3-3.0), actuator PD gains (×0.7-1.3), observation noise (0-5%), action delay (0-2 steps), gravity (±0.5 m/s2).
Vision: Separate pose estimation network (not end-to-end). Real camera → object pose estimate → feed to policy. This isolates the vision sim-to-real gap from the control sim-to-real gap.
Curriculum: Face rotation curriculum. Start by randomizing only the initial face states needed (1 rotation from solved). Gradually increase to full scrambles. Without curriculum: the policy never discovers a solve from distant states.
Key insight: The MOST important decision was separating vision and control. Trying to learn end-to-end (pixels to joint torques) in sim would require photorealistic rendering + massive domain randomization on visuals. By using a separate pose estimator, they could train the manipulation policy on clean state (proprioception + estimated pose), dramatically reducing the sim-to-real gap.
python import numpy as np from multiprocessing import Process, Pipe class VectorizedEnv: def __init__(self, env_fn, num_envs): self.num_envs = num_envs self.envs = [env_fn() for _ in range(num_envs)] # Get obs/action dims from first env self.obs_dim = self.envs[0].observation_space.shape[0] def reset(self): obs = np.stack([env.reset()[0] for env in self.envs]) return obs # (num_envs, obs_dim) def step(self, actions): obs_list, rew_list, done_list, info_list = [], [], [], [] for i, (env, act) in enumerate(zip(self.envs, actions)): obs, rew, terminated, truncated, info = env.step(act) done = terminated or truncated if done: # Store terminal obs BEFORE reset info['terminal_obs'] = obs.copy() obs = self._auto_reset(i) obs_list.append(obs) rew_list.append(rew) done_list.append(done) info_list.append(info) return (np.stack(obs_list), np.array(rew_list), np.array(done_list), info_list) def _auto_reset(self, idx): obs, _ = self.envs[idx].reset() return obs
A physics simulator and a learned world model are the SAME abstraction: both predict s' = f(s, a). The difference is origin (hand-crafted vs learned) and accuracy (exact for sim physics vs approximate for real physics). MBRL's key insight: when you can't build an accurate simulator (because the real dynamics are unknown), learn one from data. The simulator lesson teaches you what a "good model" looks like; the MBRL lesson teaches you how to learn one.
If MuJoCo is already a world model, why bother learning a neural network model? Under what conditions does a learned model outperform a physics engine?
Sim-to-real transfer is a special case of domain adaptation. The "domain" is the physics parameters. Domain randomization is equivalent to training on multiple source domains simultaneously — the same principle behind data augmentation in vision (random crops, color jitter = visual domain randomization). The insight transfers: any method that makes models robust to distribution shift (adversarial training, meta-learning, domain-invariant features) applies to sim-to-real.
In computer vision, ImageNet pretraining transfers to medical images despite the huge domain gap. Could a "PhysicsNet" (pretrained on diverse simulation data) transfer to novel robots the same way?
System ID wins when: you have access to the real robot and can collect calibration data. If you know friction is wrong, measuring it and setting it correctly is strictly better than randomizing over a wide range. A centered, narrow randomization (sys-ID + small DR) produces better policies than a wide, uncentered one. Example: calibrating joint damping coefficients by recording free-motion trajectories.
DR wins when: you CAN'T know the true parameters (novel objects with unknown mass, varying environmental conditions, or you simply don't have real robot access yet). Also for parameters that genuinely vary in deployment (different objects, different table surfaces, lighting changes). Example: training a bin-picking policy that must handle objects from 0.1kg to 2.0kg.
The synthesis: Best practice is BOTH: system ID to center your randomization at reality, then DR to cover residual uncertainty. This gives the narrowest effective randomization (highest policy performance) while still ensuring transfer.
Reinforcement learning (RL) learns from trial and error. The robot tries random actions, receives a reward signal, and gradually learns which actions lead to high rewards. For robotics, RL happens in simulation (millions of episodes, each taking seconds) because real robots are too slow and expensive. The dominant algorithm is PPO (Proximal Policy Optimization) — stable, scalable, and well-understood.
Imitation learning learns from demonstrations. A human teleoperates the robot (or records hand motions), and the policy learns to map observations to actions that mimic the expert. Behavior cloning (BC) is the simplest form: supervised learning on (observation, action) pairs. The problem: compounding errors. A small deviation from the demonstration puts the robot in a state it never saw during training, leading to further deviation. DAgger fixes this by interactively querying the expert in states the policy visits.
Diffusion policies model the action distribution as a denoising diffusion process. Starting from Gaussian noise, the policy iteratively denoises to produce an action trajectory. The key advantage: multi-modality. If there are multiple valid ways to fold a shirt (left-first or right-first), BC averages them (producing garbage), while diffusion policies sample from one mode or the other.
Action chunking predicts N future actions at once (typically 8-32 timesteps). This reduces compounding errors in BC and provides temporal consistency in diffusion policies. ACT (Action Chunking with Transformers) combines a transformer encoder with a CVAE decoder to produce action chunks.
| Method | When to Use | Data Need | Strength | Weakness |
|---|---|---|---|---|
| RL (PPO) | Good sim + clear reward | None (self-play) | Discovers novel strategies | Reward engineering, sample inefficient |
| BC | Expert demos available, unimodal task | 50-200 demos | Simple, fast training | Compounding errors, mode averaging |
| DAgger | Expert available for interactive queries | Interactive | Fixes compounding errors | Requires online expert |
| Diffusion | Multi-modal tasks, high precision | 50-200 demos | Multi-modality, precision | Slow inference (10-50 denoising steps) |
| ACT | Bimanual, contact-rich tasks | 50-100 demos | Temporal consistency | Fixed chunk size limits reactivity |
Designing good reward functions is the hardest part of RL for robotics. A sparse reward (+1 when the task succeeds, 0 otherwise) is easy to define but almost impossible to learn from — the robot must randomly stumble into success before it can learn anything. A dense reward provides signal at every timestep but can lead to reward hacking (the robot finds unexpected shortcuts that maximize reward without actually completing the task).
The practical approach is reward shaping — adding intermediate rewards that guide the robot toward the goal without specifying exactly how to get there:
python def reaching_reward(obs, goal): dist = np.linalg.norm(obs["ee_pos"] - goal) # Dense: negative distance (closer = better) r = -dist # Bonus for getting close (but not for staying close) if dist < 0.05: r += 1.0 # Penalty for high velocity at contact (don't slam into things) if dist < 0.1: r -= 0.1 * np.linalg.norm(obs["ee_vel"]) # Penalty for joint limits (stay in safe range) r -= 0.01 * np.sum(np.maximum(0, np.abs(obs["q"]) - 0.9 * obs["q_lim"])) return r
Common reward engineering pitfalls: (1) Reward for being close to the goal instead of reaching it → the robot hovers near the goal forever. (2) No penalty for collisions → the robot takes the shortest path through obstacles. (3) Reward for grasping force → the robot crushes delicate objects. (4) No time penalty → the robot moves infinitely slowly to maximize precision.
python import torch import torch.nn as nn import numpy as np class Policy(nn.Module): def __init__(self, obs_dim, act_dim): super().__init__() self.net = nn.Sequential( nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh()) self.mean = nn.Linear(64, act_dim) self.log_std = nn.Parameter(torch.zeros(act_dim)) def forward(self, obs): h = self.net(obs) mean = self.mean(h) std = self.log_std.exp() return torch.distributions.Normal(mean, std) def ppo_update(policy, optimizer, states, actions, advantages, old_log_probs, clip_eps=0.2, epochs=4): for _ in range(epochs): dist = policy(states) log_probs = dist.log_prob(actions).sum(-1) ratio = (log_probs - old_log_probs).exp() # PPO clipped objective surr1 = ratio * advantages surr2 = ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages loss = -torch.min(surr1, surr2).mean() optimizer.zero_grad(); loss.backward(); optimizer.step() # Reward function for reaching: -distance to target def reward(end_effector_pos, target_pos): dist = np.linalg.norm(end_effector_pos - target_pos) if dist < 0.02: return 10.0 # bonus for reaching target return -dist # shaped reward: closer = better
RT-2 (Robotic Transformer 2) is a VLA (Vision-Language-Action) model: a 55B parameter model that takes images and language instructions and outputs robot actions. Trained on internet-scale data + robot demonstrations. pi-0 uses flow matching (continuous-time analog of diffusion) for action generation, achieving state-of-the-art dexterous manipulation. Octo is an open-source generalist policy pre-trained on the Open X-Embodiment dataset (1M+ robot episodes). LeRobot (Hugging Face) provides a unified framework for training and deploying robot learning algorithms.
A 2D point agent learns to reach a target. Watch the policy improve over episodes. Toggle reward shaping vs sparse reward to see the difference. The reward curve shows learning progress.
ROS2 (Robot Operating System 2) is the middleware that connects all robot software components. It's not an operating system — it's a communication framework. Each component runs as a node (a process) that communicates via topics (pub/sub), services (request/reply), or actions (long-running tasks with feedback).
Key concepts:
| Concept | Pattern | Use Case | Latency |
|---|---|---|---|
| Topic | Pub/Sub (N-to-N) | Sensor streams, state broadcasts | ~1ms (intra-process) |
| Service | Request/Reply (1-to-1) | "Plan a path", "Get parameter" | ~5ms |
| Action | Goal + Feedback + Result | "Move to pose" (takes seconds) | Varies |
| Parameter | Key-value store per node | Runtime configuration | Immediate |
QoS (Quality of Service) profiles determine message delivery guarantees. For sensor data: BEST_EFFORT (drop old messages if subscriber is slow). For control commands: RELIABLE (guarantee delivery). For safety signals: RELIABLE + TRANSIENT_LOCAL (new subscribers get last message). Getting QoS wrong is one of the most common causes of mysterious ROS2 bugs.
The 1kHz control loop runs on a real-time OS (PREEMPT_RT Linux or Xenomai) in a dedicated thread. It communicates with ROS2 via lock-free shared memory. The ROS2 nodes run in a separate executor. This separation is critical: ROS2 is NOT real-time by default. A garbage collection pause in a Python node must never affect the safety-critical control loop.
cpp // ROS2 lifecycle node: subscribes to joint states, publishes torques #include <rclcpp/rclcpp.hpp> #include <sensor_msgs/msg/joint_state.hpp> #include <std_msgs/msg/float64_multi_array.hpp> class ArmController : public rclcpp::Node { public: ArmController() : Node("arm_controller") { // Subscribe to joint states (RELIABLE for control) auto qos = rclcpp::QoS(10).reliable(); sub_ = create_subscription<sensor_msgs::msg::JointState>( "joint_states", qos, [this](const sensor_msgs::msg::JointState::SharedPtr msg) { // PD control at 1kHz std::vector<double> torques(msg->position.size()); for (size_t i = 0; i < torques.size(); i++) { double err = target_[i] - msg->position[i]; double derr = -msg->velocity[i]; torques[i] = kp_ * err + kd_ * derr; } auto cmd = std_msgs::msg::Float64MultiArray(); cmd.data = torques; pub_->publish(cmd); }); pub_ = create_publisher<std_msgs::msg::Float64MultiArray>( "torque_commands", qos); } private: double kp_ = 20.0, kd_ = 5.0; std::vector<double> target_ = {0, 0.5, -0.3, 0, 0, 0}; // subscribers and publishers rclcpp::Subscription<sensor_msgs::msg::JointState>::SharedPtr sub_; rclcpp::Publisher<std_msgs::msg::Float64MultiArray>::SharedPtr pub_; };
ros2 topic info -v /topic_name to see QoS settings of all publishers and subscribers. Also check DDS middleware — CycloneDDS vs FastRTPS have different default behaviors.ros2 run [pkg] [node] --ros-args -p use_sim_time:=false and measure callback execution time. If callbacks take >1ms, you're missing the 1kHz deadline. Solutions: move to a real-time executor, reduce callback complexity, or use a separate real-time thread outside ROS2.python # launch/manipulation.launch.py from launch import LaunchDescription from launch_ros.actions import Node def generate_launch_description(): return LaunchDescription([ # Camera driver (30Hz, BEST_EFFORT) Node(package="realsense2_camera", executable="realsense2_camera_node", parameters=[{"depth_module.profile": "640x480x30"}]), # Perception (subscribes to camera, publishes object_poses) Node(package="perception", executable="pose_estimator", parameters=[{"model": "foundation_pose", "gpu": 0}]), # Motion planner (action server, RELIABLE) Node(package="planning", executable="motion_planner", parameters=[{"algorithm": "rrt_connect", "timeout_ms": 500}]), # Controller (1kHz real-time, RELIABLE) Node(package="control", executable="impedance_controller", parameters=[{"rate": 1000, "stiffness": 1000.0}]), # Safety monitor (1kHz, RELIABLE + TRANSIENT_LOCAL) Node(package="safety", executable="force_monitor", parameters=[{"max_force_N": 30.0, "rate": 1000}]), ])
Key design decisions in the launch file: perception and planning are separate processes (crash isolation — if perception crashes, the controller still holds position). The safety monitor runs at 1kHz with TRANSIENT_LOCAL QoS so that new subscribers (like a UI) immediately get the last safety status. All parameters are exposed as ROS2 parameters for runtime tuning without recompilation.
VLA models (Vision-Language-Action) like RT-2 and pi-0 are collapsing the traditional perception→planning→control pipeline into a single neural network. Instead of separate nodes for each, one model takes images and language instructions and outputs motor commands directly. This changes the ROS2 architecture: fewer nodes, simpler data flow, but new challenges around real-time inference (running a 7B model at 10Hz requires an onboard GPU).
Click a node to highlight its data flow. Yellow = topics, blue = services. Thickness shows message rate. The controller node (red border) has a real-time constraint.
This is the capstone. An end-to-end robotics pipeline running in your browser: perception detects objects, the planner finds a path, the controller executes it, and the physics engine simulates the result. You control every stage and can break any part to see how failures cascade.
The simulation starts with reasonable defaults that produce successful reaching. Your job is to understand each failure mode:
| What to Break | How | What Happens |
|---|---|---|
| Perception | Crank noise to max | Target estimate jumps around, arm oscillates |
| Planning | Reduce step size to minimum | Planner takes too long, arm waits |
| Control | Set P very high, D to zero | Arm overshoots wildly and oscillates |
| Physics | Set timestep to maximum | Simulation becomes unstable, energy grows |
| Everything | All sliders to extremes | Complete system failure — now fix it |
A 2-link arm reaches for targets. Control every stage of the pipeline. Break it, then fix it. Metrics update in real time.
| Concept | 30-sec Explanation | Key Equation | Tool | Classic Paper | Modern Paper |
|---|---|---|---|---|---|
| Rigid body dynamics | F=ma + τ=Iα with quaternion orientation. Semi-implicit Euler for stability. | F=ma, τ=Iα+ω×(Iω) | MuJoCo | Featherstone 2008 | Brax (Freeman 2021) |
| Contact resolution | Detect collision (AABB+GJK), resolve with constraint-based LCP or penalty forces. | fn≥0, d≥0, fn·d=0 | MuJoCo, Drake | Stewart & Trinkle 1996 | Dojo (Howell 2022) |
| FK / IK | FK: joint angles→end-effector via DH chain. IK: inverse via Jacobian pseudoinverse. | x=FK(θ), Δθ=J+Δx | KDL, Pinocchio | Craig 1989 | IKFlow (Ames 2022) |
| Motion planning | Find collision-free path in C-space. RRT samples randomly, grows tree toward goal. | P(find|∃) → 1 as n→∞ | OMPL, Drake | LaValle 1998 (RRT) | Diffusion Policy (Chi 2023) |
| PID / Impedance | PID: proportional+integral+derivative error correction. Impedance: virtual spring-damper. | τ=Kpe+Ki∫e+Kdė | ros2_control | Hogan 1985 (impedance) | Residual RL (Johannink 2019) |
| Perception | Depth → point cloud → segmentation → 6-DOF pose → grasp candidates. | p = K-1[u,v,1]T·d | Open3D, SAM2 | PoseCNN (Xiang 2018) | FoundationPose (Wen 2024) |
| Sim-to-real | Train in sim, transfer to real. Bridge reality gap via domain randomization + sys-ID. | Randomize φ ~ U(lo, hi) | Isaac Lab | Tobin et al. 2017 (DR) | Auto-DR (Akkaya 2019) |
| RL / BC | RL: learn from reward in sim. BC: learn from demos. Diffusion: multi-modal action distributions. | L = -E[A·logπ] | LeRobot, Stable Baselines3 | PPO (Schulman 2017) | pi-0 (Black 2024) |
model.body_mass, call mj_setConst() to recompute derived inertia and gravity compensation terms. Forgetting this causes subtle bugs where the modified mass is only partially applied.
| Component | Classical Approach | Learned Approach | When to Switch |
|---|---|---|---|
| Planning | RRT, PRM, A* — complete, reliable, well-understood | Diffusion Policy, MPNet — fast, handles multi-modality | When planning time >100ms or task is multi-modal |
| Control | PID, impedance, computed torque — stable, analyzable | RL policy, residual learning — handles unknown dynamics | When model is uncertain or task is too complex to hand-code |
| Perception | RANSAC, ICP, feature matching — interpretable, no training data | FoundationPose, SAM2 — zero-shot, handles novel objects | When objects are novel or scenes are cluttered |
| Contact | LCP, penalty, hydroelastic — physics-based, differentiable | Learned contact models — fast, match real data | When real contact is too complex to model analytically |
| Sim-to-real | System ID + manual DR — interpretable, controllable | Auto-DR, real-to-sim — automatic, scalable | When manual tuning doesn't close the gap |
Book: Robotics: Modelling, Planning and Control by Siciliano, Sciavicco, Villani, Oriolo. The bible for kinematics, dynamics, and control. Chapters 3-7 cover everything in this lesson's Ch 1, 4, 5, 6.
Papers:
| Paper | Why Read It |
|---|---|
| Todorov et al. "MuJoCo" (2012) | The physics engine paper — explains the convex contact formulation |
| Tobin et al. "Domain Randomization" (2017) | The original DR paper for sim-to-real visual transfer |
| Chi et al. "Diffusion Policy" (2023) | State-of-the-art imitation learning for multi-modal tasks |
| Black et al. "pi-0" (2024) | Flow matching for dexterous manipulation — the frontier |
| Wen et al. "FoundationPose" (2024) | Zero-shot 6-DOF pose estimation without CAD models |
Repos:
| Repo | What It Is |
|---|---|
| google-deepmind/mujoco | MuJoCo physics engine — the standard for robotics sim |
| RobotLocomotion/drake | Drake — mathematically rigorous robotics toolbox |
| NVIDIA-Omniverse/IsaacLab | Isaac Lab — GPU-accelerated robot learning |
| huggingface/lerobot | LeRobot — end-to-end robot learning framework |
| ARISE-Initiative/robosuite | robosuite — benchmarks for robot manipulation in MuJoCo |
Continue your preparation with these related Gleam lessons:
| Lesson | Connection |
|---|---|
| Classical SLAM | Localization and mapping for mobile robots — the perception backbone for navigation |
| Modern SLAM | Neural SLAM, NeRF-based mapping — frontier perception for robotics |
| Multiview Geometry | Camera models, stereo vision, 3D reconstruction — foundations of robot perception |
| ML Performance | GPU optimization, profiling, inference — making robot learning models run at real-time speed |
Practice whiteboard-style. Work through the problem before revealing the answer.