Staff-Engineer Interview Prep — Apple SPG Robotics

Robotics Simulation
& Software

You build the simulation platform AND the robot software that runs on it. From rigid body dynamics to sim-to-real transfer, every chapter covers concept, design, code, debugging, and the frontier.

Prerequisites: Basic linear algebra + Some Python. That's it.
13
Chapters
13+
Simulations
5
Dimensions / Ch

Chapter 0: The Role — What Apple SPG Robotics Engineers Build

You're a robotics simulation and software engineer at Apple's Special Projects Group. Not a pure researcher who publishes papers. Not a mechanical engineer who designs hardware. You own the full software stack from physics simulation to real-world deployment.

Here is what your day looks like. Morning: a contact model in MuJoCo produces unrealistic friction forces during a grasp sequence — you debug the solver parameters and validate against real force-torque sensor data. Afternoon: the motion planner for a new manipulation task takes 800ms per query — you need it under 50ms. You switch from RRT to a cached PRM with lazy collision checking. Evening: the sim-to-real transfer results come in. The policy that achieved 94% success in simulation drops to 35% on the real robot. You systematically identify that actuator delay is the dominant gap and add a 20ms latency randomization to the sim.

The full system you own:

Simulation Engine
Physics, contact, rendering — MuJoCo, Isaac Sim, Drake
Robot Middleware
ROS2 nodes, real-time transport, hardware abstraction
Perception
Depth sensing, object detection, pose estimation, point clouds
Planning
Motion planning (RRT, TrajOpt), task planning, grasp synthesis
Controls
PID, impedance, computed torque, operational space
Sim-to-Real & Learning
Domain randomization, RL, imitation, diffusion policies
Interview tip: Frame every answer as: "I built system X that enabled capability Y, tested via simulation Z, deployed on hardware W." Example: "I built a domain-randomized grasping pipeline in MuJoCo that enabled 6-DOF pick-and-place of novel objects, validated with 10,000 randomized episodes, and deployed on a Franka Panda achieving 87% first-attempt grasp success on unseen objects."

Real numbers for a manipulation system development cycle:

MetricWithout SimWith Full Sim Pipeline
Time to first working policy6 months (real-only)3 weeks (sim) + 2 days (transfer)
Cost per training run$50K (hardware wear + engineer time)$200 (GPU compute)
Iterations per day2-3 (limited by real robot)1000+ (parallelized sim)
Safety incidents during dev~12 (broken grippers, dropped objects)0 in sim, 1 during transfer
Sim-to-real success rateN/A60-90% (depends on domain gap)

The simulation platform is the 10x multiplier. Every hour you invest in sim fidelity saves ten hours on the real robot. But simulation is never perfect — and knowing exactly where it's wrong is the skill that separates senior engineers from juniors.

System Architecture: Click Components to Explore Data Flow

Click any box to see what data flows in and out. This is the full robotics software stack you own.

The five dimensions of every chapter: Each chapter covers (1) CONCEPT — whiteboard-ready from first principles, (2) DESIGN — where it fits in the system, data flow, tradeoffs, (3) CODE — from-scratch implementation, (4) DEBUG — failure modes and fixes, (5) FRONTIER — latest papers and repos. This mirrors the five angles a staff interview probes.
Interview question: You're building a robot that picks objects from a bin. Design the software stack from simulation to deployment. What are the key components and how do they connect?

Chapter 1: Rigid Body Dynamics — The Physics Engine Core

CONCEPT: Newton-Euler Equations

Every physics simulator solves the same fundamental problem: given forces acting on objects, compute how they move. The state of a rigid body is fully described by six quantities: position (x, y, z), orientation (how it's rotated), linear velocity (vx, vy, vz), and angular velocity (ωx, ωy, ωz). That's 13 numbers if you use quaternions for orientation (4 numbers instead of 3 Euler angles).

Why quaternions? Euler angles suffer from gimbal lock — a configuration where two rotation axes align and you lose a degree of freedom. Point your arm straight up and try to distinguish "turn left" from "roll" — they become the same motion. Quaternions avoid this entirely at the cost of one extra number and a normalization constraint (qw2 + qx2 + qy2 + qz2 = 1).

The equations of motion are Newton's second law applied to both translation and rotation:

F = m · a     (translation: force = mass × acceleration)
τ = I · α + ω × (I · ω)     (rotation: torque = inertia × angular accel + gyroscopic term)

The gyroscopic term ω × (I · ω) is what makes rotation non-intuitive. A spinning top stays upright because of this cross product. A tumbling satellite exhibits bizarre wobbling. This term is what separates "physics for games" (which often ignores it) from "physics for robotics" (which cannot).

DESIGN: The Simulation Loop

The simulation runs in a tight loop at a fixed timestep, typically 1kHz (dt = 0.001s) for robotics. Each step:

1. Sense
Read simulated sensors (joint encoders, force/torque, cameras)
2. Plan/Control
Robot software computes desired torques/velocities
3. Apply Forces
Accumulate gravity + control + contact + friction forces
4. Integrate
Update velocities and positions using Newton-Euler
5. Detect Collisions
Check for interpenetration, generate contact constraints
6. Render
Draw the scene (at 30-60Hz, not every physics step)
↻ repeat at 1kHz

Timestep selection is the first design decision. Too large (dt=0.01s) and the simulation becomes unstable — objects fly through each other or energy grows without bound. Too small (dt=0.0001s) and the simulation is 10x slower than needed. The sweet spot for rigid body robotics: 1-4ms (250Hz-1kHz). For soft body or deformable objects, you may need 0.1ms.

Integration method matters enormously. Explicit Euler (v += a*dt, x += v*dt) is simple but adds energy over time — your simulation slowly "explodes." Semi-implicit Euler (update velocity first, then use new velocity for position) is nearly as simple but much more stable. Runge-Kutta 4 (RK4) is accurate but expensive — 4 force evaluations per step. Most physics engines use semi-implicit Euler because the stability-per-cost ratio is unbeatable.

CODE: 2D Rigid Body Simulator from Scratch

python
import numpy as np

class RigidBody2D:
    def __init__(self, mass, inertia, x, y, angle=0):
        self.mass = mass
        self.inertia = inertia  # scalar for 2D
        self.pos = np.array([x, y], dtype='float64')
        self.angle = angle
        self.vel = np.zeros(2)
        self.omega = 0.0       # angular velocity (scalar in 2D)
        self.force = np.zeros(2)
        self.torque = 0.0

    def apply_force(self, f, point=None):
        self.force += f
        if point is not None:
            r = point - self.pos
            self.torque += r[0]*f[1] - r[1]*f[0]  # 2D cross product

    def step(self, dt, gravity=np.array([0, -9.81])):
        # Semi-implicit Euler: update velocity first, then position
        self.force += self.mass * gravity
        a = self.force / self.mass
        alpha = self.torque / self.inertia

        self.vel += a * dt          # velocity update
        self.omega += alpha * dt    # angular velocity update
        self.pos += self.vel * dt   # position uses NEW velocity
        self.angle += self.omega * dt

        # Reset accumulators
        self.force = np.zeros(2)
        self.torque = 0.0

# Ground collision with restitution
def ground_collision(body, ground_y=0, restitution=0.6):
    if body.pos[1] < ground_y:
        body.pos[1] = ground_y          # project out of ground
        body.vel[1] *= -restitution     # bounce with energy loss
        body.omega *= 0.95              # friction damps rotation

# Run simulation
box = RigidBody2D(mass=1.0, inertia=0.1, x=0, y=5.0, angle=0.3)
dt = 0.001  # 1kHz
for _ in range(5000):  # 5 seconds
    ground_collision(box)
    box.step(dt)
    # Energy = 0.5*m*v^2 + 0.5*I*w^2 + m*g*h
    KE = 0.5*box.mass*np.dot(box.vel, box.vel) + 0.5*box.inertia*box.omega**2
    PE = box.mass * 9.81 * box.pos[1]
    # With semi-implicit Euler, total energy decays (stable)
    # With explicit Euler, it would GROW (unstable)

DEBUG: Common Physics Simulation Failures

Debugging tip: "Simulation explodes" (positions go to infinity) → timestep too large for the stiffness of your system. Try halving dt. If that fixes it, you were using explicit Euler with stiff contacts. Switch to semi-implicit Euler or an implicit integrator. MuJoCo's Euler integrator is actually semi-implicit — that's why it's so stable.
Debugging tip: "Objects pass through each other" → tunneling. The object moves farther than its thickness in one timestep. Fix: enable CCD (continuous collision detection) which checks the swept volume, or reduce dt. Most engines offer a "max penetration" parameter to catch this.
Debugging tip: "Energy grows over time even with no applied forces" → explicit Euler integration adds energy at every step. The growth is proportional to dt. Switch to semi-implicit Euler (swap the order: update velocity, THEN update position with new velocity). This one-line change turns an energy-adding integrator into an energy-dissipating one.
Interview tip: When asked about numerical integration, don't just name methods. Explain the tradeoff: explicit Euler is O(dt) accurate but energy-unstable. Semi-implicit Euler is also O(dt) but symplectic (preserves phase-space volume, so energy oscillates around the true value instead of growing). RK4 is O(dt4) accurate but costs 4x and is NOT symplectic. For robotics sim, semi-implicit wins because stability matters more than accuracy at 1kHz.

Worked Example: Energy Conservation Across Integrators

Let's trace a single bouncing ball through three integration methods to see why the choice matters. Initial state: height = 5m, velocity = 0, mass = 1kg, g = 9.81 m/s2, dt = 0.01s, restitution = 1.0 (perfectly elastic).

Explicit Euler (update position with OLD velocity, then update velocity):

Step 0: v = 0, y = 5.0, E = 49.05 J
Step 1: y = 5.0 + 0*0.01 = 5.0, v = 0 + (-9.81)*0.01 = -0.0981 → E = 49.05 J
Step 100: y = 4.51, v = -0.981 → E = 49.70 J (+1.3%)
Step 500: y = 0.0, bounce, v reverses → E = 52.3 J (+6.6%)
After 10 bounces: E = 78 J (+59%) — UNSTABLE, energy grows

Semi-implicit Euler (update velocity FIRST, then position with NEW velocity):

Step 1: v = -0.0981, y = 5.0 + (-0.0981)*0.01 = 4.999 → E = 49.05 J
After 10 bounces: E = 48.5 J (-1.1%) — STABLE, energy slowly decays

RK4: E stays within 0.001% of initial after 10 bounces — accurate but costs 4x per step. For robotics at 1kHz, semi-implicit Euler gives the best stability-per-compute ratio.

FRONTIER: Differentiable Physics

Differentiable physics engines compute gradients through the simulation. Instead of just running forward simulation, you can backpropagate through the physics to optimize control parameters, robot designs, or sim parameters to match real data. Key projects:

EngineFrameworkKey Feature
BraxJAXMassively parallel on GPU/TPU, used for RL
DiffTaichiTaichiCustom DSL for differentiable simulation
WarpNVIDIA/PythonCUDA kernels with autodiff
MuJoCo MJXJAXMuJoCo's contact model, differentiable on GPU
DojoJuliaSmooth contact gradients via randomized smoothing

The practical impact: differentiable physics lets you do gradient-based system identification. Instead of grid-searching over friction values, compute d(sim_trajectory)/d(friction) and use gradient descent to match real trajectories. This converts a combinatorial search (exponential in number of parameters) into a smooth optimization (linear in iterations).

The frontier is learning simulation parameters from real data. Record a real robot dropping an object, then backpropagate through the differentiable sim to find the friction and restitution coefficients that make the sim match reality. This is system identification via gradient descent — dramatically faster than manual tuning.

Rigid Body Simulator: Bouncing Box

A box falls under gravity and bounces. Adjust gravity, restitution (bounciness), and timestep. Watch the energy bar — semi-implicit Euler dissipates energy (stable), explicit Euler would add it (unstable).

Gravity9.8
Restitution0.70
Timestep (ms)2
Interview question: Your simulation of a robot arm shows joints drifting slowly over time even with no applied torque. What's the most likely cause?

Chapter 2: Contact & Collision — Where Simulation Gets Hard

CONCEPT: Collision Detection Pipeline

Contact is the hardest problem in physics simulation. Two objects touching creates a discontinuity — velocity can change instantaneously. The simulation must detect when objects collide, compute the contact forces, and resolve the interpenetration, all within one timestep.

The collision detection pipeline has two phases. Broad phase quickly eliminates pairs that definitely don't collide using cheap bounding volume tests. Narrow phase computes exact contact points for the remaining candidates.

Broad phase uses AABB (axis-aligned bounding boxes). Each object gets a box aligned to the world axes. Two AABBs overlap only if they overlap on ALL three axes — a trivial comparison of six numbers. For N objects, a naive check is O(N2), but a sweep-and-prune algorithm (sort AABBs along each axis) reduces this to O(N log N). A bin of 100 objects has 4,950 potential pairs, but broad phase culls this to ~200.

Narrow phase uses GJK (Gilbert-Johnson-Keerthi) for convex shapes. GJK tests whether the Minkowski difference of two shapes contains the origin. If it does, they intersect. EPA (Expanding Polytope Algorithm) then finds the penetration depth and contact normal. For non-convex shapes (meshes), the engine decomposes them into convex hulls first.

DESIGN: Contact Resolution Strategies

Once we detect a contact, we need to resolve it. There are two fundamentally different approaches:

Penalty methods are the simplest. When two objects overlap by depth d along normal n, apply a spring force F = k · d · n. This pushes them apart. The problem: to prevent visible penetration, you need a very stiff spring (large k), which requires a very small timestep to remain stable. It's a tradeoff between visual quality and simulation speed.

Fpenalty = k · d · n + b · vrel     (spring + damper)
where d = penetration depth, n = contact normal, vrel = relative velocity

Constraint-based methods (used by MuJoCo, Drake, Bullet) formulate contacts as velocity-level constraints. Instead of "apply a force proportional to penetration," they say "find the minimum force that prevents further penetration." This is a Linear Complementarity Problem (LCP): either the contact force is zero (objects separating) or the gap is zero (objects touching). Never both positive. MuJoCo goes further with a convex optimization formulation that always has a unique solution.

Friction follows the Coulomb model: tangential friction force |ft| ≤ μ · fn, where μ is the friction coefficient and fn is the normal force. The friction force lives inside a cone. Below the cone limit, the contact is sticking (static friction). At the limit, it's sliding (kinetic friction). Simulating this accurately is critical for grasping — without proper friction, a robot can't hold anything.

CODE: AABB Collision + Penalty Response

python
import numpy as np

def aabb_overlap(a_min, a_max, b_min, b_max):
    """Check if two AABBs overlap. O(1) per pair."""
    return (a_min[0] <= b_max[0] and a_max[0] >= b_min[0] and
            a_min[1] <= b_max[1] and a_max[1] >= b_min[1])

def circle_collision(p1, r1, p2, r2):
    """Narrow phase for circles: penetration depth + normal."""
    diff = p2 - p1
    dist = np.linalg.norm(diff)
    if dist < r1 + r2 and dist > 1e-8:
        normal = diff / dist
        depth = (r1 + r2) - dist
        return depth, normal
    return 0, np.zeros(2)

def penalty_force(depth, normal, v_rel, k=5000, b=50):
    """Spring-damper penalty force."""
    if depth <= 0: return np.zeros(2)
    f_spring = k * depth * normal          # pushes objects apart
    f_damp = -b * np.dot(v_rel, normal) * normal  # damps oscillation
    return f_spring + f_damp

# Why penalty is bad for robotics:
# k=5000 requires dt < 2*sqrt(m/k) ~ 0.03s for stability
# k=50000 (for stiff contact) requires dt < 0.009s
# k=500000 (realistic steel) requires dt < 0.003s
# Real engines use constraint-based methods instead

DEBUG: Contact Simulation Failures

Debugging tip: "Objects stick together after collision" → damping coefficient too high in the penalty method, or the constraint solver isn't converging (not enough iterations). In MuJoCo, increase nconmax (max contacts) and njmax (max constraints). Check that solver iterations is at least 50-100 for complex contact scenes.
Debugging tip: "Grasped object slips out of the hand" → friction coefficient too low. Real rubber-on-plastic friction is μ ≈ 0.8-1.2. Simulation often defaults to μ = 0.5. Also check that the friction cone is not being approximated with too few facets (pyramid vs true cone). In MuJoCo, condim=4 gives full friction cone; condim=1 is frictionless.
Debugging tip: "Sim is slow with many objects" → too many contacts. Profile with MuJoCo's built-in timer. Simplify collision geometry: use convex hulls instead of triangle meshes. A coffee mug mesh has 5,000 triangles; its convex hull has ~20 faces. For a bin of parts, use convex decomposition (VHACD algorithm) to get 3-8 convex pieces per object.
Interview tip: When asked about contact in simulation, distinguish between penalty and constraint-based methods. Penalty: simple, differentiable, but requires small timestep for stiff contact. Constraint-based: physically accurate, allows large timesteps, but harder to implement and not natively differentiable. MuJoCo uses a unique convex optimization approach that's fast, accurate, and now (via MJX) differentiable.

FRONTIER: Differentiable Contact and Learned Models

Differentiable contact is the holy grail. Contact creates discontinuities (velocity jumps), which break gradients. Approaches: (1) Randomized smoothing (Dojo) — average over random perturbations to smooth out the discontinuity. (2) Soft contact models (Drake's hydroelastic) — replace hard contacts with smooth force fields. (3) Learned contact models — train a neural network to predict contact forces from geometry and velocities. The frontier is combining physics-based priors with learned residuals.

Contact Resolution: Drag Shapes Into Each Other

Drag the orange circle. When it overlaps the teal circle, see penetration depth, contact normal, and penalty force. Adjust stiffness and friction.

Stiffness (k)3000
Friction (μ)0.70
Interview question: You're simulating a robot hand grasping a wine glass. The glass keeps slipping. What contact model parameters do you investigate?
🔨 Derivation Derive the Complementarity Constraint for Contact ✓ ATTEMPTED

Contact dynamics are formulated as a Linear Complementarity Problem (LCP). The key constraint is: fn ≥ 0, φ ≥ 0, fn · φ = 0, where fn is the normal contact force and φ is the gap distance between objects.

Your task: (1) Explain physically why these three conditions must hold simultaneously (why can't fn > 0 and φ > 0 at the same time?). (2) Derive the Signorini condition from energy principles. (3) Show why this makes the contact problem non-smooth (and thus harder than standard ODE integration).

fn ≥ 0: the ground can only push, never pull (no glue). φ ≥ 0: objects can't interpenetrate. fn · φ = 0: force is nonzero ONLY when gap is zero (touching). If separated (φ > 0), no force acts. If touching (φ = 0), force can be positive. These three together define a "complementarity condition" — exactly one of {fn, φ} is zero at any time.
Consider the potential energy of the contact: U = fn · φ. At equilibrium, the system minimizes energy subject to physical constraints. The constraint fn ≥ 0, φ ≥ 0 defines a feasible region. The KKT conditions for minimizing U = fn·φ subject to both ≥ 0 give exactly the complementarity condition. This is why contact is a constrained optimization, not just an ODE.
The function fn(φ) has a kink at φ=0: for φ > 0, fn = 0 (free space). For φ = 0, fn can be any positive value (active contact). This is not differentiable at the transition. Standard ODE integrators (Runge-Kutta) assume smooth right-hand sides. Contact creates a DAE (differential-algebraic equation) with inequality constraints — fundamentally harder. This is why physics engines use specialized solvers (projected Gauss-Seidel, ADMM) instead of standard integrators for the contact phase.

Signorini conditions from first principles:

1. No interpenetration: φ(q) ≥ 0 (geometric constraint on configuration q)

2. No adhesion: fn ≥ 0 (ground reaction is compressive only)

3. Complementarity: fn · φ(q) = 0 (force only when in contact)

From energy: At time t, the system has velocity v and we seek the force fn that satisfies Newton's law M·a = fext + fn·n while maintaining φ ≥ 0. The velocity-level constraint is: φ̇ = J·v ≥ 0 (gap can't decrease below zero). This gives the LCP: find fn ≥ 0 such that φ̇ = J·M-1(fext + JTfn)·dt + J·v ≥ 0 and fn ⊥ φ̇.

Non-smoothness: The solution map fn(φ) is a step function: fn = max(0, k·(-φ)) for penalty, or fn ∈ NR+(φ) (normal cone) for constraint-based. Either way, it's not differentiable at φ=0. This means standard backpropagation through contact produces zero or undefined gradients. Differentiable simulators handle this via randomized smoothing, soft contacts, or subgradient methods.

The key insight: Contact is not just "hard" — it's mathematically discontinuous. The force jumps from 0 to a large value at the instant of contact. This is why explicit Euler + penalty methods need tiny timesteps (the force is a stiff spring), and why implicit/constraint-based methods can take larger steps (they solve for the consistent force directly).

🔨 Derivation Why Timestep Matters: Stability vs Accuracy Tradeoff ✓ ATTEMPTED

Consider a simple spring-mass system: m·a = -k·x. With explicit Euler integration (xn+1 = xn + dt·vn, vn+1 = vn + dt·(-k/m)·xn):

Your task: (1) Show that this system is stable only when dt < 2√(m/k). (2) For a robot contact with k = 100,000 N/m and m = 1 kg, what's the max stable timestep? (3) Derive why semi-implicit Euler (symplectic) is unconditionally stable for this system.

[xn+1, vn+1]T = A · [xn, vn]T where A = [[1, dt], [-dt·k/m, 1]]. Stability requires all eigenvalues of A have magnitude ≤ 1. Compute det(A - λI) = 0 to find eigenvalues.
λ = 1 - dt2k/(2m) ± i·dt√(k/m - dt2k2/(4m2)). The magnitude |λ|2 = 1 + dt2(dt2k/m - 4)k/(4m). For |λ| ≤ 1: dt2k/m ≤ 4, so dt ≤ 2/ω where ω = √(k/m).
Semi-implicit: vn+1 = vn + dt·(-k/m)·xn, then xn+1 = xn + dt·vn+1 (use NEW velocity for position). The matrix becomes [[1-dt2k/m, dt], [-dt·k/m, 1]]. Check: det(A) = 1 exactly! This means the map is area-preserving (symplectic), so eigenvalues lie on the unit circle — energy neither grows nor decays.

Part 1: Explicit Euler matrix: det(A) = 1 + dt2k/m. Since det > 1, the system is volume-expanding in phase space — energy grows each step. For stability, need |λ| ≤ 1: dt2 ≤ 4m/k, so dt ≤ 2√(m/k). ■

Part 2: With k=100,000 N/m, m=1 kg: dt ≤ 2√(1/100000) = 2/316 = 0.0063s = 6.3ms. At 1kHz control (dt=1ms), you're stable. At 100Hz (dt=10ms), the simulation explodes. This is why contact simulation typically runs at 1-10kHz.

Part 3: Semi-implicit: vn+1 = vn - dt(k/m)xn, xn+1 = xn + dt·vn+1. Matrix A has det(A) = 1 always (regardless of dt). The eigenvalues have |λ| = 1, so the system is marginally stable. Energy oscillates slightly but never grows unboundedly. This is why MuJoCo uses semi-implicit Euler as its default integrator — it's stable for arbitrarily stiff contacts with reasonable timesteps.

The key insight: The stability limit scales as √(m/k). Stiff contacts (large k) require tiny timesteps with explicit methods. Semi-implicit methods remove this restriction by preserving the symplectic structure. This is THE reason professional physics engines use symplectic integrators, not Runge-Kutta.

Chapter 3: Simulation Platforms — MuJoCo, Isaac Sim, Drake, PyBullet

CONCEPT: What Each Simulator Does Well

Choosing the right simulator is a high-leverage decision. Each has a philosophy that shapes what it's good at:

MuJoCo (Multi-Joint dynamics with Contact) was created by Emo Todorov at the University of Washington. Its core insight: formulate contact as a convex optimization problem, which always has a unique solution and is fast to solve. MuJoCo runs at 10-50 million timesteps per second on a single CPU thread. It's the default for reinforcement learning research because you need millions of episodes, and speed is everything. Since DeepMind acquired it (now open-source), it has become the standard.

Isaac Sim (NVIDIA) is built on PhysX 5 and Omniverse. Its killer feature: GPU-accelerated parallel simulation. Instead of running one environment fast, run 4,096 environments simultaneously on a single GPU. Combined with photorealistic rendering via ray tracing, it's unbeatable for (1) massive RL training and (2) visual sim-to-real transfer where you need realistic images.

Drake (MIT Robot Locomotion Group) emphasizes mathematical rigor. Every algorithm has correctness guarantees. Its contact model (hydroelastic) produces smooth, physically meaningful contact patches. Drake includes optimization-based planners (SNOPT, IPOPT) and controllers (LQR, MPC) out of the box. Choose Drake when you need provably correct behavior — like designing a controller for a legged robot that must never fall.

PyBullet wraps the Bullet physics engine in Python. Simple API, good documentation, fast to prototype. But its contact model is less accurate than MuJoCo's, and it lacks GPU acceleration. Use PyBullet for quick experiments and educational prototypes, then graduate to MuJoCo or Isaac for serious work.

DESIGN: Choosing a Simulator

Use CaseBest ChoiceWhy
RL training (speed)MuJoCo / Isaac LabNeed millions of episodes fast
RL training (visual)Isaac SimNeed photorealistic rendering + parallel envs
Manipulation researchDrake / MuJoCoNeed accurate contact for grasping
Visual perceptionIsaac SimNeed realistic cameras, lighting, materials
Provably safe controlDrakeHas formal verification tools
Quick prototypePyBullet / MuJoCoSimplest API, fastest to start
Deformable / soft objectsIsaac Sim (FEM) / SOFAGPU-accelerated finite elements

CODE: MuJoCo Hello World

python
import mujoco
import numpy as np

# Define a simple robot arm in MJCF (MuJoCo's XML format)
XML = """
<mujoco>
  <option timestep="0.002" gravity="0 0 -9.81"/>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="2 2 0.1"/>
    <body name="link1" pos="0 0 0.5">
      <joint name="j1" type="hinge" axis="0 1 0"/>
      <geom type="capsule" size="0.04" fromto="0 0 0 0 0 0.4" rgba="1 .5 .3 1"/>
      <body name="link2" pos="0 0 0.4">
        <joint name="j2" type="hinge" axis="0 1 0"/>
        <geom type="capsule" size="0.03" fromto="0 0 0 0 0 0.3" rgba=".3 .8 .7 1"/>
      </body>
    </body>
  </worldbody>
  <actuator>
    <motor joint="j1" ctrlrange="-2 2"/>
    <motor joint="j2" ctrlrange="-2 2"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

# Simulate 1000 steps
for i in range(1000):
    data.ctrl[:] = [0.5, -0.3]  # apply torques to joints
    mujoco.mj_step(model, data)
    if i % 100 == 0:
        print(f"t={data.time:.2f}s  j1={data.qpos[0]:.3f}rad  j2={data.qpos[1]:.3f}rad")

DEBUG: Simulator-Specific Issues

Debugging tip: "MuJoCo model doesn't match real robot" → check (1) inertia parameters (use CAD model or measure experimentally), (2) joint damping (real joints have friction), (3) actuator model (position servo vs direct torque — MuJoCo defaults to direct torque, but real robots often have PD controllers in the driver). Add <joint damping="0.1"/> and actuator/position for more realistic behavior.
Debugging tip: "Isaac Sim is slow" → reduce the number of parallel environments until GPU memory pressure drops. Lower rendering resolution (256x256 is fine for RL). Use headless mode (--headless) to skip rendering entirely when training policies that only need proprioception. Switch from RTX rendering to simple rasterization if photorealism isn't needed.
Interview tip: When asked "which simulator would you use?", never give a single answer. Say: "It depends on the use case. For this specific problem of [X], I'd choose [Y] because [Z]. But if the requirements changed to include [A], I'd switch to [B]." This shows systems thinking, not tool loyalty.

FRONTIER: Next-Generation Simulators

ProjectWhat's NewStatus (2026)
MuJoCo 3.xComposable scenes, better deformables, Python-first APIStable, open-source
Isaac LabSuccessor to Isaac Gym, modular task frameworkProduction, NVIDIA-supported
GenesisGenerative sim — LLM-generated environmentsResearch preview
MuJoCo MJXMuJoCo ported to JAX for GPU + autodiffStable, actively developed
SAPIENArticulated object manipulation, realistic renderingOpen-source
Simulator Comparison Dashboard

Click each simulator card to see its strengths. Radar chart shows ratings across 5 dimensions.

Interview question: You need to train a reinforcement learning policy for dexterous manipulation and then transfer it to a real robot hand. Which simulator do you choose and why?

Chapter 4: Robot Kinematics & Dynamics

CONCEPT: Forward and Inverse Kinematics

A robot arm is a chain of links connected by joints. Each joint has an angle θ. Given all the joint angles, where is the end-effector (the tool at the tip)? That's forward kinematics (FK) — a straightforward chain of matrix multiplications. Given a desired end-effector position, what joint angles achieve it? That's inverse kinematics (IK) — a much harder problem because it's typically underdetermined (multiple solutions) or overconstrained (no solution).

For a 2-link planar arm (the simplest case), FK uses trigonometry:

x = L1 cos(θ1) + L2 cos(θ1 + θ2)
y = L1 sin(θ1) + L2 sin(θ1 + θ2)

The Jacobian J is the matrix that maps joint velocities to end-effector velocities: v = J · θ̇. It's computed by taking partial derivatives of the FK equations with respect to each joint angle. The Jacobian is the central object in robotics — it tells you how the end-effector moves when you wiggle each joint.

J = ∂(x, y) / ∂(θ1, θ2) = [[-L1s1 - L2s12, -L2s12], [L1c1 + L2c12, L2c12]]

where s1 = sin(θ1), s12 = sin(θ12), etc.

IK via the Jacobian pseudoinverse: Δθ = J+ · Δx. This iteratively adjusts joint angles to move the end-effector toward the target. When the Jacobian becomes singular (loses rank), IK fails. This happens at singularities — configurations where the arm is fully extended or folded. The fix is damped least squares: Δθ = JT(JJT + λ2I)-1 · Δx. The damping λ prevents the solution from blowing up near singularities.

DESIGN: The Control Hierarchy

Robot control operates in three coordinate spaces, forming a hierarchy:

Task Space (Cartesian)
"Move end-effector to (0.5, 0.3, 0.2)" — what the planner outputs
↓ IK
Joint Space (Angles)
"Set joints to (0.4, -0.7, 1.2, ...)" — what the controller tracks
↓ Dynamics
Actuator Space (Torques)
"Apply 2.3 Nm to motor 1" — what the hardware executes

A 7-DOF arm (like the Franka Panda) solving a 6-DOF task (3D position + 3D orientation) has one extra degree of freedom. This redundancy creates a null space — a set of joint motions that don't change the end-effector pose. You can use this null space to optimize a secondary objective: avoid joint limits, keep the elbow away from obstacles, or minimize energy. The null space projector is N = I - J+J.

CODE: Forward Kinematics and IK for a 2-Link Arm

python
import numpy as np

def forward_kinematics(theta1, theta2, L1=1.0, L2=0.8):
    """FK: joint angles -> end-effector position."""
    x = L1 * np.cos(theta1) + L2 * np.cos(theta1 + theta2)
    y = L1 * np.sin(theta1) + L2 * np.sin(theta1 + theta2)
    return np.array([x, y])

def jacobian(theta1, theta2, L1=1.0, L2=0.8):
    """Jacobian: d(x,y)/d(theta1, theta2)."""
    s1, c1 = np.sin(theta1), np.cos(theta1)
    s12, c12 = np.sin(theta1+theta2), np.cos(theta1+theta2)
    return np.array([
        [-L1*s1 - L2*s12, -L2*s12],
        [ L1*c1 + L2*c12,  L2*c12]
    ])

def ik_damped(target, theta_init, L1=1.0, L2=0.8, lam=0.1, steps=100):
    """Damped least-squares IK. Converges even near singularities."""
    theta = np.array(theta_init, dtype='float64')
    for _ in range(steps):
        pos = forward_kinematics(theta[0], theta[1], L1, L2)
        error = target - pos
        if np.linalg.norm(error) < 1e-4: break
        J = jacobian(theta[0], theta[1], L1, L2)
        # Damped pseudoinverse: J^T (J J^T + lambda^2 I)^-1
        JJT = J @ J.T + lam**2 * np.eye(2)
        dtheta = J.T @ np.linalg.solve(JJT, error)
        theta += dtheta * 0.5  # step size for stability
    return theta

# Example: reach target at (1.2, 0.5)
target = np.array([1.2, 0.5])
angles = ik_damped(target, [0.5, 0.5])
result = forward_kinematics(angles[0], angles[1])
print(f"Target: {target}, Reached: {result}, Error: {np.linalg.norm(target-result):.6f}")

DEBUG: Kinematics Failures

Debugging tip: "IK doesn't converge" → check two things. (1) Is the target reachable? For a 2-link arm, the reachable workspace is an annulus: |L1 - L2| ≤ r ≤ L1 + L2. (2) Is the arm near a singularity? Compute the Jacobian condition number: cond(J) = σmaxmin. If cond(J) > 100, you're near singular — increase the damping λ in damped least squares.
Debugging tip: "7-DOF arm reaches target but elbow is in an awkward position" → use null-space optimization. The null-space projector N = I - J+J lets you add a secondary task (like "keep elbow high") without affecting the end-effector: Δθ = J+Δx + N · Δθsecondary.
Interview tip: When asked about IK, don't just explain the algorithm. Explain the failure modes: singularities (arm fully extended), unreachable targets, and multiple solutions (elbow-up vs elbow-down). For 7-DOF arms, explain the null space and how you'd use it for collision avoidance or joint limit avoidance. This shows you've deployed IK on real robots, not just studied it.

FRONTIER: Learned Kinematics

IKFlow uses normalizing flows to learn the full IK solution manifold — given a target, it samples from ALL valid joint configurations, not just one. This is useful for motion planning where you need diverse IK solutions to find collision-free paths. Differentiable FK enables end-to-end learning: backpropagate through the kinematic chain to optimize grasp poses jointly with control parameters.

Interactive 2-Link Arm: Click to Set Target

Click anywhere to set a target (yellow dot). The arm uses damped least-squares IK to reach it. Red glow near full extension = singularity. Watch the Jacobian condition number.

Damping (λ)0.10
Interview question: Your 7-DOF robot arm reaches the target but the elbow is in an awkward configuration near an obstacle. How do you use the null space to fix this?

Chapter 5: Motion Planning — Getting from A to B

CONCEPT: Configuration Space and Sampling-Based Planners

The robot arm lives in configuration space (C-space): the space of all possible joint angles. A 6-DOF arm has a 6-dimensional C-space. An obstacle in the 3D workspace becomes a complex, high-dimensional forbidden region in C-space. Motion planning finds a path through C-space that avoids these forbidden regions.

Why not just plan in workspace (3D)? Because the arm has width. Two workspace paths that look identical may differ dramatically in C-space — one requires the elbow to pass through a shelf, the other doesn't. C-space captures the full geometry of the arm, not just its tip.

RRT (Rapidly-exploring Random Trees) is the workhorse of robotics planning. It grows a tree from the start configuration by repeatedly: (1) sample a random configuration, (2) find the nearest node in the tree, (3) extend toward the sample by a step size, (4) check the extension for collisions, (5) if collision-free, add the new node. RRT is probabilistically complete — given enough samples, it will find a path if one exists.

RRT-Connect (bidirectional RRT) grows trees from BOTH the start and goal. When the trees meet, you have a path. This is dramatically faster for narrow-passage problems because both trees are exploring simultaneously.

Optimization-based planners (CHOMP, TrajOpt, STOMP) start with an initial trajectory (e.g., straight line in C-space) and iteratively deform it to avoid obstacles while minimizing smoothness/energy cost. These produce smoother paths than RRT but can get stuck in local minima.

DESIGN: The Planning Hierarchy

Task Planning
"Pick up the cup, move it to the shelf, place it" — WHAT to do
Motion Planning
"Find collision-free path from current config to grasp pose" — HOW to move
Trajectory Optimization
"Smooth the path, minimize jerk, respect velocity limits"
Control
"Track the trajectory with PID/impedance controller at 1kHz"

Typical pipeline: plan a coarse path with RRT (~100 waypoints, 200ms), smooth it with trajectory optimization (~20 waypoints, 50ms), then track it with a PD controller at 1kHz. The planner doesn't need to produce a perfect trajectory — just a collision-free waypoint sequence. The optimizer handles smoothness.

CODE: RRT from Scratch

python
import numpy as np

class RRT:
    def __init__(self, start, goal, bounds, obstacles, step=0.3):
        self.start = np.array(start)
        self.goal = np.array(goal)
        self.bounds = bounds       # [(lo, hi), (lo, hi)]
        self.obstacles = obstacles # list of (center, radius)
        self.step = step
        self.nodes = [self.start.copy()]
        self.parents = [-1]

    def sample(self):
        if np.random.rand() < 0.1:  # 10% goal bias
            return self.goal.copy()
        return np.array([np.random.uniform(lo, hi) for lo, hi in self.bounds])

    def nearest(self, point):
        dists = [np.linalg.norm(n - point) for n in self.nodes]
        return np.argmin(dists)

    def steer(self, from_pt, to_pt):
        diff = to_pt - from_pt
        dist = np.linalg.norm(diff)
        if dist < self.step: return to_pt.copy()
        return from_pt + diff / dist * self.step

    def collision_free(self, p1, p2, n_checks=10):
        for t in np.linspace(0, 1, n_checks):
            pt = p1 + t * (p2 - p1)
            for center, radius in self.obstacles:
                if np.linalg.norm(pt - center) < radius:
                    return False
        return True

    def plan(self, max_iter=2000):
        for _ in range(max_iter):
            rand = self.sample()
            near_idx = self.nearest(rand)
            new = self.steer(self.nodes[near_idx], rand)
            if self.collision_free(self.nodes[near_idx], new):
                self.nodes.append(new)
                self.parents.append(near_idx)
                if np.linalg.norm(new - self.goal) < self.step:
                    return self._trace(len(self.nodes)-1)
        return None  # failed

    def _trace(self, idx):
        path = []
        while idx != -1:
            path.append(self.nodes[idx])
            idx = self.parents[idx]
        return path[::-1]

DEBUG: Planning Failures

Debugging tip: "Planner can't find a path" → narrow passages in C-space. Solutions: (1) Use RRT-Connect (bidirectional). (2) Increase goal bias from 5% to 20%. (3) Use bridge-test sampling (sample near obstacle boundaries). (4) Reduce C-space dimensionality: plan the end-effector path in 3D, then use IK at each waypoint.
Debugging tip: "Path is jerky" → RRT produces piecewise-linear paths with sharp corners. Post-process: (1) shortcutting (try connecting non-adjacent waypoints, keep if collision-free), (2) B-spline smoothing, (3) trajectory optimization (TrajOpt minimizes jerk while respecting collision constraints).
Interview tip: When asked "how do you speed up planning?", give a menu: (1) better sampling (informed RRT*, goal bias, bridge test), (2) lazy collision checking (check collisions only on the final path, not every extension), (3) plan in lower dimensions (task space instead of joint space), (4) cache a PRM (build a roadmap offline, query online in <1ms). For real-time replanning, use dynamic RRT or trajectory optimization warm-started from the previous solution.

Worked Example: RRT vs RRT-Connect Performance

Consider a 2D workspace with a narrow passage (10% of workspace width). How many samples does each algorithm need?

AlgorithmPassage WidthAvg Samples to Find PathAvg Path Length
RRTWide (50%)~801.4x optimal
RRTNarrow (10%)~2,5001.8x optimal
RRTVery narrow (2%)~60,0002.5x optimal
RRT-ConnectWide (50%)~301.3x optimal
RRT-ConnectNarrow (10%)~4001.5x optimal
RRT-ConnectVery narrow (2%)~8,0002.0x optimal

RRT-Connect is 3-8x faster in narrow passages because both trees grow toward the gap from opposite sides, effectively halving the passage length each tree must discover. This is why bidirectional search is the default in production systems (OMPL defaults to RRT-Connect).

After finding a path, shortcutting tries connecting non-adjacent waypoints. For each pair (i, j) where j > i+1, check if the direct connection is collision-free. If so, remove all intermediate waypoints. This typically reduces path length by 30-50% and removes the characteristic "zigzag" of RRT paths.

FRONTIER: Neural and Diffusion-Based Planning

Diffusion Policy treats planning as denoising: start with a random trajectory, iteratively denoise it into a collision-free, task-completing path. It naturally handles multi-modality (multiple valid plans). MPNet learns a sampling distribution from successful plans, making RRT find paths faster in familiar environments. STORM uses parallel trajectory optimization on GPU to evaluate thousands of candidate trajectories simultaneously.

Interactive RRT Planner

Click to place start (green), goal (yellow), and obstacles (red). Press "Plan" to watch RRT grow. Toggle RRT vs RRT-Connect.

Step size0.40
Interview question: Your robot needs to plan a path through a cluttered shelf to grasp an object at the back. RRT takes 5 seconds. How do you reduce planning time to under 100ms?

Chapter 6: Control — Making the Robot Move Precisely

CONCEPT: PID and Beyond

PID control is the foundation. The controller computes a torque based on three error terms: Proportional (how far from target), Integral (accumulated past error), Derivative (how fast the error is changing):

τ = Kp · e + Ki · ∫e dt + Kd · de/dt
where e = θdesired - θactual

P alone gives fast response but overshoots. D damps the oscillation. I eliminates steady-state error (e.g., gravity pulling the arm down). The art is tuning: too much P = oscillation, too much D = sluggish, too much I = windup and overshoot on step changes.

Impedance control makes the robot behave like a mass-spring-damper system in task space. Instead of commanding a position, you command a dynamic relationship between force and displacement:

F = Md · ẍ + Bd · ẋ + Kd · (x - x0)

where Md, Bd, Kd are the desired inertia, damping, and stiffness. Low stiffness = compliant (good for contact tasks like wiping a table). High stiffness = stiff (good for precise positioning). This is critical for safe human-robot interaction — a stiff controller will hurt people, a compliant one will give way.

Computed torque (inverse dynamics) uses the robot's dynamic model to cancel out nonlinear effects (gravity, Coriolis, centrifugal forces), then applies a simple linear controller on top:

τ = M(q) · (q̈d + Kd(q̇d - q̇) + Kp(qd - q)) + C(q, q̇) + g(q)

This gives perfect tracking IF the model (M, C, g) is perfect. In practice, the model is always wrong — which is why robust and adaptive control exist.

DESIGN: Control Hierarchy for Contact Tasks

RegimeControllerRateUse Case
Free spacePosition control (PD + gravity comp)1kHzMoving between waypoints
Pre-contactImpedance (low stiffness)1kHzApproaching a surface safely
ContactForce/impedance control1kHzPushing, wiping, insertion
HybridPosition in some axes, force in others1kHzSliding along a surface (position normal, force tangent)

CODE: PID Controller with Anti-Windup

python
class PIDController:
    def __init__(self, kp, ki, kd, max_integral=10.0, max_output=50.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.max_integral = max_integral  # anti-windup clamp
        self.max_output = max_output
        self.integral = 0.0
        self.prev_error = 0.0

    def compute(self, error, dt):
        # Proportional
        P = self.kp * error

        # Integral with anti-windup
        self.integral += error * dt
        self.integral = np.clip(self.integral, -self.max_integral, self.max_integral)
        I = self.ki * self.integral

        # Derivative (on error, not on measurement to avoid setpoint kick)
        D = self.kd * (error - self.prev_error) / dt if dt > 0 else 0
        self.prev_error = error

        output = P + I + D
        return np.clip(output, -self.max_output, self.max_output)

# Simulate a 1-DOF joint with PID
pid = PIDController(kp=20, ki=5, kd=8)
theta, omega = 0.0, 0.0  # current angle, angular velocity
target = 1.0  # desired angle (radians)
inertia = 0.5  # kg*m^2
dt = 0.001     # 1kHz

for i in range(3000):  # 3 seconds
    error = target - theta
    torque = pid.compute(error, dt)
    alpha = torque / inertia - 0.5 * omega  # + viscous friction
    omega += alpha * dt
    theta += omega * dt

DEBUG: Control Failures

Debugging tip: "Robot overshoots target" → P gain too high or D gain too low. Start with D = 0.1 * P as a rule of thumb. If the system is underdamped (oscillates), increase D. If critically damped (no oscillation, fastest settling), you're optimal. Overdamped (sluggish) means reduce D or increase P.
Debugging tip: "Robot vibrates at contact" → control is too stiff for the contact situation. Switch from position control to impedance control with reduced stiffness (Kd = 100 N/m instead of 2000 N/m). Add damping (Bd = 2√(KdMd) for critical damping). The vibration is the controller fighting the contact constraint.
Interview tip: When asked about control, explain the hierarchy: PID is the inner loop (1kHz), impedance/force control is the middle loop (1kHz), and the planner is the outer loop (10-100Hz). Show you understand when to use position vs force control: "In free space I use stiff position control for accuracy. During approach I reduce stiffness for safety. At contact I switch to force control to regulate interaction forces."

Worked Example: Tuning PID for a Robot Joint

A common tuning procedure (Ziegler-Nichols): (1) Set Ki = 0, Kd = 0. (2) Increase Kp until the system oscillates at a constant amplitude — this is the ultimate gain Ku. (3) Measure the oscillation period Tu. (4) Set gains according to the table:

ControllerKpKiKd
P only0.5 Ku00
PI0.45 Ku0.54 Ku/Tu0
PID0.6 Ku1.2 Ku/Tu0.075 Ku Tu

For our simulated joint (inertia = 0.5 kg m2, friction = 0.5): Ku ≈ 45, Tu ≈ 0.45s. Ziegler-Nichols PID: Kp = 27, Ki = 120, Kd = 1.5. This gives aggressive response with ~15% overshoot. For robotics, we usually want less overshoot (safer), so use 70% of Kp and 150% of Kd: Kp = 19, Ki = 120, Kd = 2.3.

In practice, Ziegler-Nichols gives a starting point. Fine-tuning is done by watching the step response: reduce overshoot by increasing D, reduce settling time by increasing P, eliminate steady-state error by adding I. The interactive widget below lets you feel this directly.

FRONTIER: Learned Controllers

Residual policy learning combines a classical controller with a learned correction: τ = τPID + πθ(state). The classical controller handles the known dynamics, and the neural network learns to correct for model mismatch and unmodeled effects. This converges much faster than learning from scratch because the residual is small. Adaptive control estimating parameters online is being replaced by learned adaptation — a meta-learned policy that adapts to new dynamics in a few timesteps.

PID Tuning Simulator

A 1-DOF joint tracking a step input. Adjust P, I, D gains and watch the step response. The target line is dashed. Observe overshoot, settling time, and steady-state error.

Kp20
Ki2.0
Kd8.0
Interview question: Your robot arm uses computed torque control but consistently undershoots during fast motions. The error is proportional to acceleration. What's wrong?

Chapter 7: Perception for Robotics — Seeing the World

CONCEPT: Depth Sensing and Object Understanding

A robot needs to know what's in front of it and where things are in 3D. Depth cameras provide this, but each type has different tradeoffs:

SensorHow It WorksRangeWeakness
StereoTwo cameras, triangulation0.5-10mFails on textureless surfaces
Structured lightProject pattern, measure distortion0.3-5mFails in sunlight
Time-of-flightMeasure light round-trip time0.5-10mMultipath interference
LiDARPulsed laser scanning1-200mExpensive, sparse

For tabletop manipulation (the Apple SPG use case), structured light (Intel RealSense D435) or stereo gives dense depth at 30fps with sub-millimeter accuracy at close range.

The perception pipeline for manipulation:

RGB-D Image
640x480 color + depth at 30fps
Point Cloud
Project depth to 3D using camera intrinsics
Plane Removal
RANSAC removes table plane — what's left is objects
Segmentation
DBSCAN clusters remaining points into objects
Pose Estimation
FoundationPose / PoseCNN gives 6-DOF pose per object
Grasp Planning
AnyGrasp / Contact-GraspNet produces ranked grasp candidates

Latency budget: the entire pipeline must run in under 100ms to keep up with the control loop. Perception at 30Hz, planning at 10Hz, control at 1kHz.

CODE: Point Cloud Processing Pipeline

python
import open3d as o3d
import numpy as np

# 1. Load point cloud from depth image
pcd = o3d.io.read_point_cloud("scene.ply")

# 2. Downsample with voxel grid (0.005m = 5mm voxels)
pcd = pcd.voxel_down_sample(0.005)

# 3. Remove table plane with RANSAC
plane, inliers = pcd.segment_plane(distance_threshold=0.01,
                                    ransac_n=3, num_iterations=1000)
objects = pcd.select_by_index(inliers, invert=True)

# 4. Cluster objects with DBSCAN
labels = np.array(objects.cluster_dbscan(eps=0.02, min_points=50))
n_clusters = labels.max() + 1
print(f"Found {n_clusters} objects")

# 5. Get bounding box for each cluster
for i in range(n_clusters):
    cluster = objects.select_by_index(np.where(labels == i)[0])
    bbox = cluster.get_axis_aligned_bounding_box()
    center = bbox.get_center()
    extent = bbox.get_extent()
    print(f"Object {i}: center={center}, size={extent}")

DEBUG: Perception Pipeline Failures

Debugging tip: "Object detection misses small objects" → voxel downsample is too aggressive (0.01m removes objects smaller than 1cm). Reduce voxel size to 0.003m for small parts. Also check that DBSCAN's min_points isn't too high — a small object at 1m has very few depth pixels.
Debugging tip: "Pose estimation is noisy frame-to-frame" → depth sensor noise (1-3mm for RealSense). Apply temporal filtering: Kalman filter on the estimated pose, or average over 3-5 frames before committing. For grasp planning, a 2mm pose error is usually fine.
Interview tip: When asked about perception, show you understand the full pipeline from sensor to action. "For a bin-picking task: I use a top-down RealSense D435 at 640x480, convert to point cloud, RANSAC for plane removal, DBSCAN for segmentation, then FoundationPose for 6-DOF pose. The full pipeline runs in 60ms on an RTX 3070. For novel objects without a model, I use Contact-GraspNet which predicts grasps directly from the point cloud."

FRONTIER: Foundation Models for Robot Perception

SAM2 (Segment Anything Model 2) gives zero-shot segmentation — point at any object and get a perfect mask, no training needed. FoundationPose estimates 6-DOF pose from a single reference image, no CAD model required. SpatialVLM grounds language in 3D space: "pick up the red cup on the left" becomes a 3D coordinate. These foundation models are replacing the hand-crafted perception pipelines above with general-purpose perception that works out of the box.

Point Cloud Processing Visualization

A tabletop scene viewed from above. Points are colored by cluster after RANSAC plane removal and DBSCAN segmentation. Adjust voxel size to see resolution vs speed tradeoff.

Voxel size (mm)5
Interview question: Your robot has a depth camera looking down at a bin of parts. Small parts are missed by the object detector. Large parts have inaccurate pose estimates. Diagnose both issues.

Chapter 8: Sim-to-Real Transfer — SHOWCASE

This is THE critical skill. Most robot learning happens in simulation. Getting it to work on real hardware is where careers are made. The reality gap — the difference between simulated and real physics, perception, and dynamics — determines whether your policy transfers or falls flat.

CONCEPT: Sources of the Reality Gap

Gap TypeWhat's DifferentImpactFix
DynamicsFriction, inertia, joint damping, backlashWrong forces → wrong trajectoriesSystem identification + randomization
PerceptionRendering vs real images, lighting, texturesPolicy confused by visual differencesVisual domain randomization
ActuatorCommunication delays (5-20ms), backlash, saturationCommands arrive late → oscillation/instabilityAdd latency in sim, randomize delay
SensorNoise, bias, dropout, different FoVOverconfident policy → brittle behaviorAdd noise in sim

DESIGN: The Sim-to-Real Pipeline

1. Build Sim
Create environment, robot model, task objects in MuJoCo/Isaac
2. System ID
Record real robot trajectories, optimize sim params to match
3. Domain Randomization
Randomize physics (μ, mass, delay) + visuals (textures, lighting, camera)
4. Train in Sim
RL/BC policy over millions of randomized episodes
5. Zero-Shot Transfer
Deploy directly on real robot — does it work?
6. Fine-tune (if needed)
Collect real data, fine-tune policy, close remaining gap
↻ iterate

CODE: Domain Randomization Implementation

python
import numpy as np
import mujoco

class DomainRandomizer:
    """Randomize simulation parameters each episode for robust transfer."""
    def __init__(self, model):
        self.model = model
        # Store default values
        self.default_friction = model.geom_friction.copy()
        self.default_mass = model.body_mass.copy()
        self.default_damping = model.dof_damping.copy()

    def randomize(self):
        # Friction: multiply by uniform [0.5, 1.5]
        scale = np.random.uniform(0.5, 1.5, self.default_friction.shape)
        self.model.geom_friction[:] = self.default_friction * scale

        # Mass: +/- 20%
        scale = np.random.uniform(0.8, 1.2, self.default_mass.shape)
        self.model.body_mass[:] = self.default_mass * scale

        # Joint damping: multiply by [0.5, 2.0]
        scale = np.random.uniform(0.5, 2.0, self.default_damping.shape)
        self.model.dof_damping[:] = self.default_damping * scale

        # Actuator delay: 0-20ms (added in step function)
        self.action_delay = np.random.randint(0, 21)  # ms

        # Observation noise: Gaussian with std 0-0.02 rad
        self.obs_noise_std = np.random.uniform(0, 0.02)

        # Recompute derived quantities after changing parameters
        mujoco.mj_setConst(self.model, mujoco.MjData(self.model))

    def add_obs_noise(self, obs):
        return obs + np.random.randn(*obs.shape) * self.obs_noise_std

# Usage: call randomizer.randomize() at each episode reset
# The policy trained over thousands of randomized envs
# learns to be robust to parameter variation

DEBUG: Sim-to-Real Failures (The Systematic Approach)

Debugging tip: "Policy works in sim, fails on real" → systematic diagnosis: (1) Record real trajectories of joint positions, velocities, torques. (2) Replay the same commands in sim. (3) Compare trajectories. If joints track differently → dynamics gap (recalibrate inertia, friction, damping). If same commands produce different contact behavior → contact model gap. If visual observations differ → perception gap. Fix the dominant gap first.
Debugging tip: "Policy is robust in sim but brittle on real" → domain randomization range is too narrow. The real world is outside your randomization bounds. Widen the ranges. If that degrades sim performance, you need better sim fidelity (system identification) so the randomization center is closer to reality.
Interview tip: When asked about sim-to-real, describe your systematic debugging process: "I compare sim vs real trajectories along three axes: dynamics (joint tracking error), perception (feature similarity), and timing (latency distribution). I plot each over time to identify the dominant gap. Then I fix that gap either by improving sim fidelity (system ID) or increasing robustness (domain randomization). I never just 'crank up randomization' — I diagnose first."

Worked Example: Diagnosing the Reality Gap

You trained a reaching policy in MuJoCo with 95% success. On the real Franka Panda, it achieves 35%. Here's the systematic debugging trace:

StepActionFindingFix
1Record real joint trajectories at 1kHzJoint 4 lags by 15ms compared to sim replayAdd 15ms delay to sim actuator model
2Compare contact forces during graspReal friction ~40% lower than sim defaultSystem ID: set μ=0.6 (was 1.0), add DR ±30%
3Check joint tracking RMSE over trajectoryJoints 5-6 (wrist) have 2x error of joints 1-3Wrist joint damping was wrong. Recalibrate from free-motion data.
4Re-train with improved sim + wider DRReal success: 78%Remaining gap: observation noise on joint velocities
5Add velocity observation noise (std=0.01 rad/s)Real success: 87%Ship it. Remaining 13% is edge cases (unusual object poses).

The key insight: each fix addressed a specific measured gap, not a generic "add more randomization." System identification (steps 1-3) moved the sim center closer to reality. Domain randomization (steps 4-5) covered the remaining uncertainty. This ordered approach converges in 3-5 iterations, while blind DR tuning can take 20+ iterations.

FRONTIER: Closing the Gap Automatically

Auto-DR (Automatic Domain Randomization) uses a secondary RL agent to learn the randomization ranges that maximize transfer. Instead of hand-tuning "friction ∈ [0.5, 1.5]", the system discovers the right ranges automatically. Real-to-Sim-to-Real scans the real environment (with NeRF or 3D Gaussian Splatting), builds a digital twin, trains a policy in the twin, and deploys. Foundation models (pre-trained visual features like DINOv2) transfer better than raw pixels because they've already learned to be invariant to visual domain shifts.

Sim-to-Real Transfer Dashboard (SHOWCASE)

Left: simulated arm reaching a target. Right: "real" arm (same task + noise/delay). Adjust domain randomization range and reality gap. Watch success rates change. Crank up the gap, then crank up DR to recover.

Reality Gap0.30
DR Range0.20
Actuator Delay (ms)10
Interview question: Your RL policy achieves 95% success in simulation but only 20% on the real robot. Describe your systematic debugging process.
💥 Break-It Lab What Dies When Simulation Parameters Are Wrong? ✓ ATTEMPTED
A robot reaching policy is trained in MuJoCo. The sim is configured with default parameters. Toggle each failure mode to see how the policy breaks on the real robot.
Too Large Timestep (dt = 20ms) ACTIVE
Failure mode: With dt=20ms and stiff contact (k=50,000), the CFL-like stability condition dt < 2√(m/k) = 9ms is violated. The simulation becomes unstable at contact: objects penetrate each other, forces blow up exponentially, the robot arm "explodes" after touching any surface. Energy grows each step: En+1 = En · (1 + dt2k/m). After 50 steps: energy is 104× initial — catastrophic.
Wrong Friction Model (Sliding Instead of Sticking) ACTIVE
Failure mode: Sim uses μ=0.3 (low friction). Real robot fingers have μ=0.9 (rubber). Policy learns to grip HARD (compensating for low sim friction). On real robot: excessive grip force crushes fragile objects. Conversely, if sim has μ=1.2 but real has μ=0.4: policy barely grips → objects slip. The friction model also matters: point contact vs full friction cone changes grasping stability entirely.
No Domain Randomization ACTIVE
Failure mode: Policy memorizes exact sim dynamics. Any deviation causes failure. Real robot has +10% heavier object: policy undershoots the lift height by 10%. Real camera is 2cm lower: policy misses objects by 2cm. Without DR, the policy has zero generalization margin. Adding DR with ±30% on all parameters forces the policy to be robust, typically recovering 50-70% of the lost performance.
🔨 Derivation Reward Shaping: When Does It Preserve the Optimal Policy? ✓ ATTEMPTED

In simulation, we often add "shaping rewards" to help RL learn faster: reward for moving toward the goal, penalty for collisions, bonus for grasping. But naive reward shaping can change the optimal policy!

Your task: (1) Prove that potential-based reward shaping F(s, s') = γΦ(s') − Φ(s) preserves the optimal policy for any potential function Φ. (2) Give an example of non-potential-based shaping that creates a suboptimal fixed point. (3) For a reaching task, design a good shaping potential Φ(s).

With shaped reward R' = R + F, the shaped Q-function Q'(s,a) satisfies Q'(s,a) = Q(s,a) + (something). If that "something" is the same for all actions at state s, then argmaxa Q'(s,a) = argmaxa Q(s,a) — the optimal policy is unchanged. Potential-based shaping adds γΦ(s') − Φ(s), which telescopes across a trajectory.
For a trajectory τ = (s0, ..., sT): ∑ γtF(st, st+1) = ∑ γt[γΦ(st+1) − Φ(st)] = γT+1Φ(sT+1) − Φ(s0). This only depends on the start and end states, not the path. Since all policies start at s0 and Φ(s0) is constant, the shaped return differs from the original by a constant (γT+1Φ(sT+1)). The optimal policy is the same as under the original reward.
Consider F(s,a) = -0.1 (constant penalty per step). This is NOT potential-based (it depends on neither s nor s'). It makes the agent prefer shorter trajectories. If the optimal policy takes a long detour to avoid an obstacle, this shaping could make a shorter (suboptimal, collision-prone) path look better. Any shaping that can't be written as γΦ(s') - Φ(s) risks changing the optimal policy.

Proof that PBRS preserves optimality:

Define Q'(s,a) = E[∑ γt(Rt + Ft) | s0=s, a0=a]. Expanding F = γΦ(s') - Φ(s):

Q'(s,a) = Q(s,a) + E[∑ γt(γΦ(st+1) - Φ(st))] = Q(s,a) - Φ(s) + γΦ(s)

The term -Φ(s) is independent of action a. So argmaxa Q'(s,a) = argmaxa Q(s,a). ■

Good reaching potential: Φ(s) = -c · ||gripper_pos - target_pos||. This gives dense reward for approaching the target. It's potential-based (function of state only), so it provably doesn't change the optimal policy, but it provides strong learning signal.

The key insight: In sim, you have full access to state, so you can compute any Φ(s). Use potential-based shaping aggressively: distance-to-goal, joint-limit proximity, orientation alignment. These accelerate learning 10-100x without changing what the agent ultimately learns. Non-potential shaping (alive bonuses, step penalties) is dangerous because it DOES change the optimal policy.

🏗 Design Challenge You're the Architect: Simulation Pipeline for Dexterous Hand ✓ ATTEMPTED
Your team needs to train a 24-DOF dexterous hand (Shadow Hand) to reorient objects using only fingertip contact. The policy must transfer to real hardware. Design the complete simulation pipeline.
DOFs
24 actuated joints + 6D object pose
GPU Budget
8x A100 GPUs for 1 week
Contact Richness
5 fingertips x object = 5+ simultaneous contacts
Transfer Target
≥70% success zero-shot on real hardware
Objects
Novel objects not seen in training
Vision
Wrist-mounted camera for object pose
1. Physics engine: MuJoCo (CPU, accurate contact) vs Isaac Sim (GPU, 10,000+ parallel envs). How does the parallelism vs accuracy tradeoff affect your choice?
2. Observation space: proprioception only? Add fingertip force sensing? Add wrist camera? What's the sim-to-real gap for each modality?
3. Domain randomization strategy: what parameters? What ranges? How do you set ranges without real-robot data?
4. Rendering pipeline: do you need photorealistic rendering for the wrist camera? Or can you use learned visual features that transfer?
5. Curriculum: start with easier objects (cube) and progress to harder (sphere, cylinder)? Or train on all simultaneously?

OpenAI's approach (2019 Rubik's Cube):

Engine: MuJoCo (not GPU-parallelized at the time), but running 6144 parallel CPU workers. Total: ~13,000 CPU years of experience in months of wall-clock time. Today: MuJoCo MJX on GPU gives 10,000+ envs on one A100.

Observations: Proprioception (joint angles + fingertip forces) for the actor. The critic sees privileged info (true object pose). This "asymmetric actor-critic" lets the critic learn fast while the actor only uses transferable signals.

DR Strategy: Automatic Domain Randomization (ADR). Start narrow, widen each parameter until performance drops below threshold, then hold. 37 randomized parameters including: friction (×0.5-2.0), mass (×0.5-2.0), joint damping (×0.3-3.0), actuator PD gains (×0.7-1.3), observation noise (0-5%), action delay (0-2 steps), gravity (±0.5 m/s2).

Vision: Separate pose estimation network (not end-to-end). Real camera → object pose estimate → feed to policy. This isolates the vision sim-to-real gap from the control sim-to-real gap.

Curriculum: Face rotation curriculum. Start by randomizing only the initial face states needed (1 rotation from solved). Gradually increase to full scrambles. Without curriculum: the policy never discovers a solve from distant states.

Key insight: The MOST important decision was separating vision and control. Trying to learn end-to-end (pixels to joint torques) in sim would require photorealistic rendering + massive domain randomization on visuals. By using a separate pose estimator, they could train the manipulation policy on clean state (proprioception + estimated pose), dramatically reducing the sim-to-real gap.

💻 Build It Implement Vectorized Environment Step ✓ ATTEMPTED
Implement a vectorized environment that steps N parallel MuJoCo environments simultaneously. This is the core infrastructure for training RL at scale. The key challenge: environments that terminate (done=True) must auto-reset without breaking the batch.
python class VectorizedEnv: def __init__(self, env_fn, num_envs: int): """Create num_envs parallel environments.""" ... def reset(self) -> np.ndarray: """Reset all envs. Returns obs: (num_envs, obs_dim).""" ... def step(self, actions: np.ndarray) -> tuple: """ Step all envs with batched actions. Auto-reset done envs (return first obs of new episode). Returns: obs, rewards, dones, infos All shapes: (num_envs, ...) """ ... def _auto_reset(self, idx: int) -> np.ndarray: """Reset env at index idx, return new obs.""" ...
Test case
venv = VectorizedEnv(lambda: make_reach_env(), num_envs=4)
obs = venv.reset() # shape: (4, obs_dim)
actions = np.random.randn(4, 7) # 7-DOF actions
obs, rew, done, info = venv.step(actions)
assert obs.shape == (4, obs_dim) # even if some envs reset
assert 'terminal_obs' in info[0] if done[0] else True
When an env finishes (done=True), you auto-reset it and return the first observation of the new episode. But the RL algorithm needs the LAST observation of the finished episode (for value bootstrapping: V(s_terminal) in GAE computation). Solution: store the terminal observation in the info dict before resetting. This is what Stable Baselines3's VecEnv does: info[i]['terminal_observation'] = final_obs.
python
import numpy as np
from multiprocessing import Process, Pipe

class VectorizedEnv:
    def __init__(self, env_fn, num_envs):
        self.num_envs = num_envs
        self.envs = [env_fn() for _ in range(num_envs)]
        # Get obs/action dims from first env
        self.obs_dim = self.envs[0].observation_space.shape[0]

    def reset(self):
        obs = np.stack([env.reset()[0] for env in self.envs])
        return obs  # (num_envs, obs_dim)

    def step(self, actions):
        obs_list, rew_list, done_list, info_list = [], [], [], []
        for i, (env, act) in enumerate(zip(self.envs, actions)):
            obs, rew, terminated, truncated, info = env.step(act)
            done = terminated or truncated
            if done:
                # Store terminal obs BEFORE reset
                info['terminal_obs'] = obs.copy()
                obs = self._auto_reset(i)
            obs_list.append(obs)
            rew_list.append(rew)
            done_list.append(done)
            info_list.append(info)
        return (np.stack(obs_list), np.array(rew_list),
                np.array(done_list), info_list)

    def _auto_reset(self, idx):
        obs, _ = self.envs[idx].reset()
        return obs
Bonus challenge: Make this truly parallel using multiprocessing (each env in a subprocess with Pipe communication). The sequential version above is fine for fast envs (MuJoCo at 10,000+ steps/sec) but too slow for rendering-heavy envs (Isaac Sim).
🔗 Pattern Recognition
The Simulator IS a World Model
Robotics Simulation (This Lesson)
MuJoCo: f(s, a) → s' via physics equations.
Known dynamics, exact gradients (MJX).
Sim = perfect model (for the sim world).
Model-Based RL
Learned model: fθ(s, a) → s' via neural net.
Approximate dynamics, learned from data.
Model = imperfect sim trained on real data.
Robot Learning Lesson

A physics simulator and a learned world model are the SAME abstraction: both predict s' = f(s, a). The difference is origin (hand-crafted vs learned) and accuracy (exact for sim physics vs approximate for real physics). MBRL's key insight: when you can't build an accurate simulator (because the real dynamics are unknown), learn one from data. The simulator lesson teaches you what a "good model" looks like; the MBRL lesson teaches you how to learn one.

If MuJoCo is already a world model, why bother learning a neural network model? Under what conditions does a learned model outperform a physics engine?

🔗 Pattern Recognition
Sim-to-Real Gap = Domain Adaptation
Simulation (This Lesson)
Source domain: sim distribution dsim.
Target domain: real distribution dreal.
Domain randomization: train on many source domains.
Robot Learning
Same pattern: train in sim, deploy on real.
The transfer bound: |Jreal - Jsim| ≤ f(εmodel, γ).
Closing the gap = reducing εmodel.
Robot Learning Lesson

Sim-to-real transfer is a special case of domain adaptation. The "domain" is the physics parameters. Domain randomization is equivalent to training on multiple source domains simultaneously — the same principle behind data augmentation in vision (random crops, color jitter = visual domain randomization). The insight transfers: any method that makes models robust to distribution shift (adversarial training, meta-learning, domain-invariant features) applies to sim-to-real.

In computer vision, ImageNet pretraining transfers to medical images despite the huge domain gap. Could a "PhysicsNet" (pretrained on diverse simulation data) transfer to novel robots the same way?

Checkpoint — Before you move on
You've just finished the sim-to-real chapter. Explain in your own words: why is system identification (making sim accurate) better than just cranking up domain randomization? Give a specific scenario where each approach wins.
✓ Gate cleared
Model Answer

System ID wins when: you have access to the real robot and can collect calibration data. If you know friction is wrong, measuring it and setting it correctly is strictly better than randomizing over a wide range. A centered, narrow randomization (sys-ID + small DR) produces better policies than a wide, uncentered one. Example: calibrating joint damping coefficients by recording free-motion trajectories.

DR wins when: you CAN'T know the true parameters (novel objects with unknown mass, varying environmental conditions, or you simply don't have real robot access yet). Also for parameters that genuinely vary in deployment (different objects, different table surfaces, lighting changes). Example: training a bin-picking policy that must handle objects from 0.1kg to 2.0kg.

The synthesis: Best practice is BOTH: system ID to center your randomization at reality, then DR to cover residual uncertainty. This gives the narrowest effective randomization (highest policy performance) while still ensuring transfer.

Chapter 9: Robot Learning — RL, Imitation, Diffusion Policies

CONCEPT: Three Paradigms for Teaching Robots

Reinforcement learning (RL) learns from trial and error. The robot tries random actions, receives a reward signal, and gradually learns which actions lead to high rewards. For robotics, RL happens in simulation (millions of episodes, each taking seconds) because real robots are too slow and expensive. The dominant algorithm is PPO (Proximal Policy Optimization) — stable, scalable, and well-understood.

Imitation learning learns from demonstrations. A human teleoperates the robot (or records hand motions), and the policy learns to map observations to actions that mimic the expert. Behavior cloning (BC) is the simplest form: supervised learning on (observation, action) pairs. The problem: compounding errors. A small deviation from the demonstration puts the robot in a state it never saw during training, leading to further deviation. DAgger fixes this by interactively querying the expert in states the policy visits.

Diffusion policies model the action distribution as a denoising diffusion process. Starting from Gaussian noise, the policy iteratively denoises to produce an action trajectory. The key advantage: multi-modality. If there are multiple valid ways to fold a shirt (left-first or right-first), BC averages them (producing garbage), while diffusion policies sample from one mode or the other.

Action chunking predicts N future actions at once (typically 8-32 timesteps). This reduces compounding errors in BC and provides temporal consistency in diffusion policies. ACT (Action Chunking with Transformers) combines a transformer encoder with a CVAE decoder to produce action chunks.

DESIGN: When to Use What

MethodWhen to UseData NeedStrengthWeakness
RL (PPO)Good sim + clear rewardNone (self-play)Discovers novel strategiesReward engineering, sample inefficient
BCExpert demos available, unimodal task50-200 demosSimple, fast trainingCompounding errors, mode averaging
DAggerExpert available for interactive queriesInteractiveFixes compounding errorsRequires online expert
DiffusionMulti-modal tasks, high precision50-200 demosMulti-modality, precisionSlow inference (10-50 denoising steps)
ACTBimanual, contact-rich tasks50-100 demosTemporal consistencyFixed chunk size limits reactivity

DESIGN: The Reward Engineering Problem

Designing good reward functions is the hardest part of RL for robotics. A sparse reward (+1 when the task succeeds, 0 otherwise) is easy to define but almost impossible to learn from — the robot must randomly stumble into success before it can learn anything. A dense reward provides signal at every timestep but can lead to reward hacking (the robot finds unexpected shortcuts that maximize reward without actually completing the task).

The practical approach is reward shaping — adding intermediate rewards that guide the robot toward the goal without specifying exactly how to get there:

python
def reaching_reward(obs, goal):
    dist = np.linalg.norm(obs["ee_pos"] - goal)
    # Dense: negative distance (closer = better)
    r = -dist
    # Bonus for getting close (but not for staying close)
    if dist < 0.05: r += 1.0
    # Penalty for high velocity at contact (don't slam into things)
    if dist < 0.1: r -= 0.1 * np.linalg.norm(obs["ee_vel"])
    # Penalty for joint limits (stay in safe range)
    r -= 0.01 * np.sum(np.maximum(0, np.abs(obs["q"]) - 0.9 * obs["q_lim"]))
    return r

Common reward engineering pitfalls: (1) Reward for being close to the goal instead of reaching it → the robot hovers near the goal forever. (2) No penalty for collisions → the robot takes the shortest path through obstacles. (3) Reward for grasping force → the robot crushes delicate objects. (4) No time penalty → the robot moves infinitely slowly to maximize precision.

CODE: PPO Training Loop for a Reaching Task

python
import torch
import torch.nn as nn
import numpy as np

class Policy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh())
        self.mean = nn.Linear(64, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        h = self.net(obs)
        mean = self.mean(h)
        std = self.log_std.exp()
        return torch.distributions.Normal(mean, std)

def ppo_update(policy, optimizer, states, actions, advantages, old_log_probs,
               clip_eps=0.2, epochs=4):
    for _ in range(epochs):
        dist = policy(states)
        log_probs = dist.log_prob(actions).sum(-1)
        ratio = (log_probs - old_log_probs).exp()
        # PPO clipped objective
        surr1 = ratio * advantages
        surr2 = ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages
        loss = -torch.min(surr1, surr2).mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()

# Reward function for reaching: -distance to target
def reward(end_effector_pos, target_pos):
    dist = np.linalg.norm(end_effector_pos - target_pos)
    if dist < 0.02: return 10.0   # bonus for reaching target
    return -dist  # shaped reward: closer = better

DEBUG: Learning Failures

Debugging tip: "RL policy doesn't converge" → check reward function. Sparse rewards (only +1 at goal) require enormous exploration. Add reward shaping: -distance to target as a dense signal. Also check that the observation space is informative — does the policy see the target position?
Debugging tip: "BC policy drifts after a few seconds" → compounding errors. The policy sees slightly different states than the demonstrations, makes slightly wrong actions, which leads to even more unfamiliar states. Fix: DAgger (interactively query expert), action chunking (predict multiple steps, less drift per query), or diffusion policy (better mode coverage).
Interview tip: When asked about robot learning, show you understand the tradeoffs: "For a structured task with a clear reward, I use RL in simulation with PPO — it discovers strategies a human wouldn't demonstrate. For contact-rich tasks like assembly where reward engineering is hard, I use imitation learning from teleoperation demos. If the task has multiple valid strategies (like folding), I use a diffusion policy to capture the multi-modal distribution."

FRONTIER: Foundation Policies

RT-2 (Robotic Transformer 2) is a VLA (Vision-Language-Action) model: a 55B parameter model that takes images and language instructions and outputs robot actions. Trained on internet-scale data + robot demonstrations. pi-0 uses flow matching (continuous-time analog of diffusion) for action generation, achieving state-of-the-art dexterous manipulation. Octo is an open-source generalist policy pre-trained on the Open X-Embodiment dataset (1M+ robot episodes). LeRobot (Hugging Face) provides a unified framework for training and deploying robot learning algorithms.

RL Training Visualization: Reaching Task

A 2D point agent learns to reach a target. Watch the policy improve over episodes. Toggle reward shaping vs sparse reward to see the difference. The reward curve shows learning progress.

Interview question: You have 100 teleoperation demonstrations of a folding task. Some demonstrate left-first folding, others right-first. Which learning algorithm handles this multi-modality best and why?

Chapter 10: Systems Integration — ROS2, Middleware, Testing

CONCEPT: ROS2 Architecture

ROS2 (Robot Operating System 2) is the middleware that connects all robot software components. It's not an operating system — it's a communication framework. Each component runs as a node (a process) that communicates via topics (pub/sub), services (request/reply), or actions (long-running tasks with feedback).

Key concepts:

ConceptPatternUse CaseLatency
TopicPub/Sub (N-to-N)Sensor streams, state broadcasts~1ms (intra-process)
ServiceRequest/Reply (1-to-1)"Plan a path", "Get parameter"~5ms
ActionGoal + Feedback + Result"Move to pose" (takes seconds)Varies
ParameterKey-value store per nodeRuntime configurationImmediate

QoS (Quality of Service) profiles determine message delivery guarantees. For sensor data: BEST_EFFORT (drop old messages if subscriber is slow). For control commands: RELIABLE (guarantee delivery). For safety signals: RELIABLE + TRANSIENT_LOCAL (new subscribers get last message). Getting QoS wrong is one of the most common causes of mysterious ROS2 bugs.

DESIGN: The Real-Time Control Stack

RTOS Layer (1kHz)
Real-time control loop: read encoders, compute torque, write to actuators. Must never miss a deadline.
↓ joint_states topic
ROS2 Layer (10-100Hz)
Perception, planning, high-level control. Soft real-time — occasional latency spikes OK.
↓ goal/command action
Behavior Layer (1-10Hz)
State machines, behavior trees, task sequencing. "Pick up cup, then pour."

The 1kHz control loop runs on a real-time OS (PREEMPT_RT Linux or Xenomai) in a dedicated thread. It communicates with ROS2 via lock-free shared memory. The ROS2 nodes run in a separate executor. This separation is critical: ROS2 is NOT real-time by default. A garbage collection pause in a Python node must never affect the safety-critical control loop.

CODE: ROS2 Control Node (C++)

cpp
// ROS2 lifecycle node: subscribes to joint states, publishes torques
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/joint_state.hpp>
#include <std_msgs/msg/float64_multi_array.hpp>

class ArmController : public rclcpp::Node {
public:
  ArmController() : Node("arm_controller") {
    // Subscribe to joint states (RELIABLE for control)
    auto qos = rclcpp::QoS(10).reliable();
    sub_ = create_subscription<sensor_msgs::msg::JointState>(
      "joint_states", qos,
      [this](const sensor_msgs::msg::JointState::SharedPtr msg) {
        // PD control at 1kHz
        std::vector<double> torques(msg->position.size());
        for (size_t i = 0; i < torques.size(); i++) {
          double err = target_[i] - msg->position[i];
          double derr = -msg->velocity[i];
          torques[i] = kp_ * err + kd_ * derr;
        }
        auto cmd = std_msgs::msg::Float64MultiArray();
        cmd.data = torques;
        pub_->publish(cmd);
      });
    pub_ = create_publisher<std_msgs::msg::Float64MultiArray>(
      "torque_commands", qos);
  }
private:
  double kp_ = 20.0, kd_ = 5.0;
  std::vector<double> target_ = {0, 0.5, -0.3, 0, 0, 0};
  // subscribers and publishers
  rclcpp::Subscription<sensor_msgs::msg::JointState>::SharedPtr sub_;
  rclcpp::Publisher<std_msgs::msg::Float64MultiArray>::SharedPtr pub_;
};

DEBUG: ROS2 Integration Issues

Debugging tip: "Messages arrive late" → QoS mismatch. If publisher uses RELIABLE and subscriber uses BEST_EFFORT (or vice versa), they can't communicate. Check with ros2 topic info -v /topic_name to see QoS settings of all publishers and subscribers. Also check DDS middleware — CycloneDDS vs FastRTPS have different default behaviors.
Debugging tip: "Controller is jittery" → not meeting real-time deadline. Use ros2 run [pkg] [node] --ros-args -p use_sim_time:=false and measure callback execution time. If callbacks take >1ms, you're missing the 1kHz deadline. Solutions: move to a real-time executor, reduce callback complexity, or use a separate real-time thread outside ROS2.
Interview tip: When discussing systems integration, show you understand the real-time constraint hierarchy: "The torque controller MUST run at 1kHz with <1ms jitter — I put it in a PREEMPT_RT thread with memory-locked, pre-allocated buffers. The planner runs at 10Hz as a ROS2 action server — latency spikes are OK because the controller interpolates. Perception runs at 30Hz with BEST_EFFORT QoS so slow frames are dropped rather than queued."

Worked Example: ROS2 Launch File for a Manipulation System

python
# launch/manipulation.launch.py
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        # Camera driver (30Hz, BEST_EFFORT)
        Node(package="realsense2_camera", executable="realsense2_camera_node",
             parameters=[{"depth_module.profile": "640x480x30"}]),
        # Perception (subscribes to camera, publishes object_poses)
        Node(package="perception", executable="pose_estimator",
             parameters=[{"model": "foundation_pose", "gpu": 0}]),
        # Motion planner (action server, RELIABLE)
        Node(package="planning", executable="motion_planner",
             parameters=[{"algorithm": "rrt_connect", "timeout_ms": 500}]),
        # Controller (1kHz real-time, RELIABLE)
        Node(package="control", executable="impedance_controller",
             parameters=[{"rate": 1000, "stiffness": 1000.0}]),
        # Safety monitor (1kHz, RELIABLE + TRANSIENT_LOCAL)
        Node(package="safety", executable="force_monitor",
             parameters=[{"max_force_N": 30.0, "rate": 1000}]),
    ])

Key design decisions in the launch file: perception and planning are separate processes (crash isolation — if perception crashes, the controller still holds position). The safety monitor runs at 1kHz with TRANSIENT_LOCAL QoS so that new subscribers (like a UI) immediately get the last safety status. All parameters are exposed as ROS2 parameters for runtime tuning without recompilation.

FRONTIER: Foundation Models Changing the Stack

VLA models (Vision-Language-Action) like RT-2 and pi-0 are collapsing the traditional perception→planning→control pipeline into a single neural network. Instead of separate nodes for each, one model takes images and language instructions and outputs motor commands directly. This changes the ROS2 architecture: fewer nodes, simpler data flow, but new challenges around real-time inference (running a 7B model at 10Hz requires an onboard GPU).

ROS2 Node Graph Visualization

Click a node to highlight its data flow. Yellow = topics, blue = services. Thickness shows message rate. The controller node (red border) has a real-time constraint.

Interview question: Your robot system has a perception node (30Hz), a planner (10Hz), and a controller (1kHz). Design the ROS2 node graph with appropriate QoS settings.

Chapter 11: Full Pipeline Simulation — SHOWCASE #2

This is the capstone. An end-to-end robotics pipeline running in your browser: perception detects objects, the planner finds a path, the controller executes it, and the physics engine simulates the result. You control every stage and can break any part to see how failures cascade.

The Pipeline

Perception
Detect target position (with adjustable noise)
↓ target pose
Planning
RRT path to target (adjustable speed/quality)
↓ waypoints
Control
PD controller tracks waypoints (adjustable gains)
↓ torques
Physics
Semi-implicit Euler integration (adjustable timestep)
↻ repeat at each control step

Try Breaking It

The simulation starts with reasonable defaults that produce successful reaching. Your job is to understand each failure mode:

What to BreakHowWhat Happens
PerceptionCrank noise to maxTarget estimate jumps around, arm oscillates
PlanningReduce step size to minimumPlanner takes too long, arm waits
ControlSet P very high, D to zeroArm overshoots wildly and oscillates
PhysicsSet timestep to maximumSimulation becomes unstable, energy grows
EverythingAll sliders to extremesComplete system failure — now fix it
Interview tip: This simulation demonstrates the key insight: robotics systems fail at the interfaces between components. A perfect planner with a bad controller fails. A perfect controller with noisy perception fails. Staff engineers debug at the interface level, not within individual components. When asked "the robot isn't working," your first question is "which interface is broken?"
End-to-End Robotics Pipeline

A 2-link arm reaches for targets. Control every stage of the pipeline. Break it, then fix it. Metrics update in real time.

Perception Noise0.05
Control Kp25
Control Kd8.0
Physics dt (ms)2
Interview question: Your full robot pipeline (perception → planning → control → physics) works 90% of the time but fails unpredictably. How do you systematically diagnose the remaining 10% failures?
⚔ Adversarial: The Timestep-Reward Coupling
You're training an RL policy with reward = -1 per step until goal reached (negative step penalty). You decrease your sim timestep from 10ms to 1ms for better physics accuracy. Without changing anything else, what happens to the learned policy?
⚔ Adversarial: The GPU Parallelism Bottleneck
You're using Isaac Lab with 4096 parallel environments on an A100 GPU. Training a reaching policy works great (1M steps/second). You switch to a dexterous manipulation task with complex mesh objects. Steps/second drops to 50K. What's the bottleneck?

Chapter 12: Interview Arsenal

1. Cheat Sheet Table

Concept30-sec ExplanationKey EquationToolClassic PaperModern Paper
Rigid body dynamicsF=ma + τ=Iα with quaternion orientation. Semi-implicit Euler for stability.F=ma, τ=Iα+ω×(Iω)MuJoCoFeatherstone 2008Brax (Freeman 2021)
Contact resolutionDetect collision (AABB+GJK), resolve with constraint-based LCP or penalty forces.fn≥0, d≥0, fn·d=0MuJoCo, DrakeStewart & Trinkle 1996Dojo (Howell 2022)
FK / IKFK: joint angles→end-effector via DH chain. IK: inverse via Jacobian pseudoinverse.x=FK(θ), Δθ=J+ΔxKDL, PinocchioCraig 1989IKFlow (Ames 2022)
Motion planningFind collision-free path in C-space. RRT samples randomly, grows tree toward goal.P(find|∃) → 1 as n→∞OMPL, DrakeLaValle 1998 (RRT)Diffusion Policy (Chi 2023)
PID / ImpedancePID: proportional+integral+derivative error correction. Impedance: virtual spring-damper.τ=Kpe+Ki∫e+Kdros2_controlHogan 1985 (impedance)Residual RL (Johannink 2019)
PerceptionDepth → point cloud → segmentation → 6-DOF pose → grasp candidates.p = K-1[u,v,1]T·dOpen3D, SAM2PoseCNN (Xiang 2018)FoundationPose (Wen 2024)
Sim-to-realTrain in sim, transfer to real. Bridge reality gap via domain randomization + sys-ID.Randomize φ ~ U(lo, hi)Isaac LabTobin et al. 2017 (DR)Auto-DR (Akkaya 2019)
RL / BCRL: learn from reward in sim. BC: learn from demos. Diffusion: multi-modal action distributions.L = -E[A·logπ]LeRobot, Stable Baselines3PPO (Schulman 2017)pi-0 (Black 2024)

2. System Design Questions

Design 1: "Design a simulation platform for testing autonomous mobile robots."

Architecture: MuJoCo for physics (fast, accurate ground contact for wheeled/legged robots). Separate rendering pipeline: low-fidelity rasterization for RL training (fast), Isaac Sim for photorealistic testing (accurate). Environment generator: procedural floor plans with randomized obstacles, textures, lighting. Sensor simulation: simulated LiDAR (ray casting), RGB cameras (rendered), IMU (integrated acceleration + noise). API: Gym-compatible interface, each environment runs as independent process for parallelism. Testing hierarchy: unit tests (each component), integration (full loop at 10x real-time), regression (1000 random scenarios nightly). Key metric: sim-to-real gap measured as trajectory RMSE between sim and real robot on 50 benchmark paths.
Design 2: "Design a pick-and-place system from perception to execution."

Hardware: 7-DOF arm (Franka) + parallel gripper + Intel RealSense D435 (top-down). Perception: RGB-D → point cloud → RANSAC plane removal → DBSCAN clustering → FoundationPose for 6-DOF pose (60ms). Grasp planning: Contact-GraspNet produces ranked grasp candidates → filter by reachability (IK check) → collision check (10ms). Motion planning: bidirectional RRT in joint space → shortcut smoothing → time parameterization (100ms). Execution: impedance control during approach (low stiffness), position control during transit (high stiffness), force control during grasp (close until 10N threshold). Monitoring: force-torque sensor detects slip (re-grasp if force drops), depth camera verifies object in gripper. Cycle time: 4-6 seconds per pick. Error recovery: if grasp fails (no object in gripper after close), re-scan and replan.
Design 3: "Design a sim-to-real pipeline for dexterous manipulation."

Sim: MuJoCo with accurate hand model (24-DOF). System ID: record 50 real trajectories of simple grasps, optimize sim parameters (friction, damping, tendon stiffness) to minimize trajectory RMSE. Domain randomization: physics (friction ±40%, mass ±30%, actuator delay 0-20ms), perception (camera pose ±2cm, lighting color temp, texture randomization). Training: PPO with 4096 parallel envs on 8 GPUs, asymmetric actor-critic (critic sees privileged info like true object pose, actor sees only proprioception + wrist camera). Curriculum: start with large objects, gradually introduce smaller objects. Transfer: zero-shot deploy, measure success rate on 100 real trials. If <70%, identify dominant gap (replay real commands in sim, compare trajectories), fix via targeted sys-ID or wider DR. Target: 85%+ success on novel objects within 3 iterations.
Design 4: "Design the software stack for a robot that operates in human environments."

Safety layer (highest priority): force/torque monitoring on all joints, collision detection with human tracking (skeleton from depth camera), emergency stop if contact force > 10N or human enters workspace. Control: impedance control with low stiffness (200 N/m) when humans nearby, position control (2000 N/m) when workspace is clear. Perception: continuous 3D scene understanding (RGB-D + LiDAR), object tracking (6-DOF pose at 30Hz), human pose estimation (MediaPipe at 30Hz). Navigation: costmap with human-aware planning (social force model), keep 1m distance from humans. Task execution: behavior trees for task sequencing, action primitives (pick, place, pour, wipe) as reusable modules. ROS2 architecture: safety node (1kHz, RELIABLE), control node (1kHz, real-time thread), perception (30Hz, BEST_EFFORT), planner (10Hz), behavior (1Hz).

3. Coding Drills

Drill 1: Implement forward kinematics for a 3-link arm.
Given link lengths L1, L2, L3 and joint angles θ1, θ2, θ3, compute the end-effector (x, y) position. Return intermediate joint positions too (for drawing).

Key insight: Each joint position is a cumulative rotation + translation: pi = pi-1 + Li[cos(Σθ), sin(Σθ)]T.
Drill 2: Write an RRT planner in 2D.
Input: start, goal, list of circular obstacles. Output: collision-free path as list of waypoints. Must include: goal bias (10%), collision checking along edges, path extraction by tracing parents.

Key insight: Check collisions along the EDGE (interpolate points between nodes), not just at the new node. Use 10+ interpolation points per edge.
Drill 3: Implement a PID controller with anti-windup.
Input: error signal over time. Output: control signal. Must include: integral clamping, derivative filtering (low-pass on derivative term), output saturation.

Key insight: Anti-windup prevents the integral term from accumulating when the output is saturated. Without it, the integral "winds up" during saturation and causes massive overshoot when the error reverses.
Drill 4: Write domain randomization for a MuJoCo environment.
Input: MuJoCo model. Output: randomized model with varied friction, mass, damping, and observation noise. Must include: save/restore default values, recompute derived quantities after modification.

Key insight: After modifying model.body_mass, call mj_setConst() to recompute derived inertia and gravity compensation terms. Forgetting this causes subtle bugs where the modified mass is only partially applied.

4. Debugging Scenarios

Scenario 1: "Simulation explodes after 1000 steps."

Diagnosis: (1) Check energy over time — if it grows monotonically, it's an integration instability. (2) Check if it correlates with contact events — stiff penalty forces + large timestep = instability. (3) Fix: reduce dt (first), switch to semi-implicit Euler (second), switch to implicit integrator (third). If only contacts cause it, increase solver iterations or switch to MuJoCo's convex contact model. (4) Prevention: always log total system energy; set threshold for automatic early termination.
Scenario 2: "RL policy works in sim, fails on real."

Diagnosis: (1) Record real trajectories (joint pos/vel/torque at 1kHz). (2) Replay same commands in sim. (3) Compute per-joint RMSE over time. (4) If RMSE is high early: actuator model mismatch (latency, PD gains). (5) If RMSE grows over time: friction/damping mismatch (sim is too ideal). (6) If RMSE spikes at contact: contact model mismatch (friction, stiffness). (7) Fix dominant gap via system ID. (8) Widen domain randomization around the improved sim parameters. (9) Re-train and re-evaluate. Iterate until ≥80% real success.
Scenario 3: "Robot arm oscillates when making contact."

Diagnosis: (1) The position controller is too stiff for the contact situation. When the arm hits the surface, the controller tries to push through it (position error), then the contact pushes back, creating a limit cycle. (2) Fix: switch from position control to impedance control at contact. Reduce stiffness from 2000 N/m to 200 N/m. Add damping: B = 2√(K·M) for critical damping. (3) Better fix: use a hybrid position/force controller that switches to force control in the contact normal direction. (4) Prevention: always use impedance control for any task involving contact.
Scenario 4: "Motion planner can't find path through narrow gap."

Diagnosis: (1) Narrow passages in C-space are exponentially unlikely to be sampled by basic RRT. The probability of sampling inside a passage of width w in a d-dimensional space is proportional to wd — vanishingly small for 7-DOF. (2) Fix: use RRT-Connect (bidirectional reduces the effective passage length by half). (3) Use bridge-test sampling: sample pairs of points, and if both are in collision but the midpoint is free, add the midpoint as a roadmap node. (4) Use guided sampling: bias samples toward the workspace region near the gap. (5) For production: pre-build a PRM roadmap, query it in <1ms for any start-goal pair.

5. Classical vs Learned Table

ComponentClassical ApproachLearned ApproachWhen to Switch
PlanningRRT, PRM, A* — complete, reliable, well-understoodDiffusion Policy, MPNet — fast, handles multi-modalityWhen planning time >100ms or task is multi-modal
ControlPID, impedance, computed torque — stable, analyzableRL policy, residual learning — handles unknown dynamicsWhen model is uncertain or task is too complex to hand-code
PerceptionRANSAC, ICP, feature matching — interpretable, no training dataFoundationPose, SAM2 — zero-shot, handles novel objectsWhen objects are novel or scenes are cluttered
ContactLCP, penalty, hydroelastic — physics-based, differentiableLearned contact models — fast, match real dataWhen real contact is too complex to model analytically
Sim-to-realSystem ID + manual DR — interpretable, controllableAuto-DR, real-to-sim — automatic, scalableWhen manual tuning doesn't close the gap

6. Recommended Reading

Book: Robotics: Modelling, Planning and Control by Siciliano, Sciavicco, Villani, Oriolo. The bible for kinematics, dynamics, and control. Chapters 3-7 cover everything in this lesson's Ch 1, 4, 5, 6.

Papers:

PaperWhy Read It
Todorov et al. "MuJoCo" (2012)The physics engine paper — explains the convex contact formulation
Tobin et al. "Domain Randomization" (2017)The original DR paper for sim-to-real visual transfer
Chi et al. "Diffusion Policy" (2023)State-of-the-art imitation learning for multi-modal tasks
Black et al. "pi-0" (2024)Flow matching for dexterous manipulation — the frontier
Wen et al. "FoundationPose" (2024)Zero-shot 6-DOF pose estimation without CAD models

Repos:

RepoWhat It Is
google-deepmind/mujocoMuJoCo physics engine — the standard for robotics sim
RobotLocomotion/drakeDrake — mathematically rigorous robotics toolbox
NVIDIA-Omniverse/IsaacLabIsaac Lab — GPU-accelerated robot learning
huggingface/lerobotLeRobot — end-to-end robot learning framework
ARISE-Initiative/robosuiterobosuite — benchmarks for robot manipulation in MuJoCo

7. Related Lessons

Continue your preparation with these related Gleam lessons:

LessonConnection
Classical SLAMLocalization and mapping for mobile robots — the perception backbone for navigation
Modern SLAMNeural SLAM, NeRF-based mapping — frontier perception for robotics
Multiview GeometryCamera models, stereo vision, 3D reconstruction — foundations of robot perception
ML PerformanceGPU optimization, profiling, inference — making robot learning models run at real-time speed
Coding Drill Flashcards

Practice whiteboard-style. Work through the problem before revealing the answer.

Final interview question: You're designing a robot manipulation system for Apple SPG. You need to go from concept to deployed hardware in 6 months. What's your technical plan?