Extending RRT and PRM to robots with real physics — systems where you can’t steer between two states with a straight line, you must integrate dynamics forward in time.
In Chapter 5, the RRT extended toward a random sample by drawing a straight line in C-space and stepping along it. Clean. Elegant. But what does “a straight line in C-space” mean for a car?
A car at position (0, 0) heading north can’t move sideways to (1, 0). It has to arc there — turn right, drive forward, possibly reverse. The straight line between the two configurations passes through states the car cannot physically occupy during the transition. The car’s differential constraint — the no-slip condition that rules out sideways motion — makes straight-line interpolation in C-space physically meaningless.
This is the core problem of Chapter 14: RRT and PRM assumed we could connect any two nearby states with a feasible local path. Once differential constraints enter the picture, that assumption breaks. We need a fundamentally different way to extend our tree — one that respects the physics.
The teal point is a car at the origin facing up (θ = 90°). The orange point is a goal configuration. The red dashed line is the straight-line C-space interpolation — it passes through orientations the car can’t reach continuously. The green arc is a true feasible trajectory obtained by integrating &ẋ; = f(x,u). Click “New Goal” to see different examples.
Everything in this chapter follows from one insight: instead of interpolating between states, simulate forward. Pick a random state, find the nearest tree node, apply a control input for a short duration Δt, integrate &ẋ; = f(x, u) to get a new state, add it to the tree. That’s kinodynamic planning in two sentences.
Chapter 4 gave us C-space: every configuration q of the robot is a point, and Cfree is the navigable region. But for a dynamical system like a drone or a car with nonzero mass, knowing where the robot is isn’t enough. You also need to know how fast it’s going. Without velocity, you can’t predict the future.
The state space X is C-space extended with its derivatives. For a unicycle with configuration q = (px, py, θ), the state is still just x = q — configuration is enough because the controls directly set velocity (first-order system). But for a car with mass, the state is x = (px, py, θ, v) where v is the current speed — because the control (throttle) changes acceleration, not velocity directly. And for a robot arm, the state is x = (q, q̇) — joint angles and joint velocities — because torque changes acceleration.
When we extend to state space, the obstacle region Xobs grows. In C-space, we had Cobs — configurations where the robot collides. In state space, Xobs includes not only colliding configurations but also any state from which collision is unavoidable regardless of control input. A car heading at a wall at 100 m/s with only 10 m to stop is in Xobs even though it isn’t touching the wall yet.
| Concept | C-space version | State-space version |
|---|---|---|
| Free space | Cfree: no collision | Xfree: no collision AND stopping possible |
| Obstacle | Cobs: robot intersects obstacle | Xobs: collision unavoidable from this state |
| State | Configuration q only | x = (q, q̇) for second-order systems |
| Dimension | n (degrees of freedom) | 2n for second-order, n for first-order |
This larger dimension is both the challenge and the reason we need sampling. Combinatorial planners that work in 3D C-space become intractable in 6D state space. Sampling-based methods scale gracefully — the fundamental RRT loop works the same whether the space is 3D or 12D.
From a state x, applying control u for duration Δt produces a new state. The set of all states reachable from x in exactly Δt is the reachable set R(x, Δt). It is the analog of “the disk of radius vΔt around x” from Chapter 5, but shaped by dynamics rather than geometry.
For a unicycle (first-order, unconstrained in v and ω), R(x, Δt) looks like a disk — similar to the unconstrained particle. But for a car with a maximum steering angle δmax and bounded speed [−vmax, vmax], the reachable set is a curved banana shape: it extends forward and backward, can arc left or right, but cannot point sideways.
Both start at the center heading right. Blue dots are the reachable set of a free particle (any direction). Orange dots are the reachable set of a car with max steering angle 35° and forward/backward speed ≤ 1. Use the slider to change Δt and watch the shapes evolve.
The forward reachable set from x at time t is the set of all states you can reach from x in time t. Planning asks: does the goal state lie in the forward reachable set of the start? Equivalently, we can ask about the backward reachable set of the goal: all states from which the goal can be reached. A feasible plan exists if and only if these two sets intersect.
Computing these sets exactly is intractable for most systems. Sampling-based planning avoids this: instead of computing R(x, Δt) analytically, the RRT samples from it by picking a random control u and integrating forward. The samples approximate the reachable set without needing its closed form.
Kinodynamic planning is the marriage of kinematics (where the robot is) and dynamics (how fast it’s moving) in a single planning problem. The term was coined by Donald, Xavier, Canny, and Reif in 1993. The goal: find a trajectory x(t) and control sequence u(t) that steers the robot from xinit to a neighborhood of xgoal while satisfying &ẋ; = f(x,u), staying in Xfree, and respecting control bounds U.
The critical algorithmic change from Chapter 5 RRT to kinodynamic RRT is the extend step. In Ch 5:
# Ch 5 RRT extend (C-space, no dynamics) def extend(tree, x_rand): x_near = nearest(tree, x_rand) direction = x_rand - x_near x_new = x_near + step_size * direction / norm(direction) # straight line if collision_free(x_near, x_new): tree.add(x_new, parent=x_near)
In kinodynamic RRT:
# Ch 14 Kinodynamic RRT extend (state space, dynamics) def extend(tree, x_rand): x_near = nearest(tree, x_rand) # find nearest in state space u = sample_control(U) # pick a random control x_new = simulate(x_near, u, dt) # integrate ODE forward dt seconds if collision_free(x_near, x_new): # check state-space path segment tree.add(x_new, parent=x_near, control=u, duration=dt)
The simulate(x, u, dt) call integrates &ẋ; = f(x, u) for duration Δt using Euler integration (or Runge-Kutta for accuracy). The result is a state that is physically reachable from xnear by applying control u. No straight-line extrapolation, no approximate interpolation — actual physics.
Finding the nearest node in state space requires a metric. This is subtler than it sounds. State space mixes position (meters) with velocity (m/s) with angle (radians). Naively computing Euclidean distance in the combined space gives nonsensical results because the units don’t match.
The weights wi normalize dimensions: typically divide each component by its maximum range. For a car with position in [0, 10]m and speed in [−2, 2]m/s: wx = 1/10, wv = 1/4. Angle components need special handling for wraparound (θ and θ + 2π are the same configuration).
This is the payoff. Watch a kinodynamic RRT build a tree for a car-like robot with bounded steering angle. The car’s state is x = (px, py, θ) and its differential equation is the standard bicycle model:
where v is speed (fixed at 1 m/s here), L is wheelbase (1 m), and δ ∈ [−35°, 35°] is the steering angle — the only control. The car can’t steer more sharply than 35°; this is the physical constraint that makes planning nontrivial.
Teal circle = start (heading right). Orange circle = goal region. Blue arcs = tree edges (each a short simulated trajectory). Gray rectangles are obstacles. Press Step to add one node, Run to run continuously, Reset to start over. Notice how the tree grows in smooth curves — never straight sideways jumps.
The tree here is small for clarity. In real kinodynamic RRT, the planner runs 10,000–100,000 iterations. Even so, it typically solves car parking problems in under a second on modern hardware because each iteration is just an ODE integration step — no matrix inversions, no solvers, no optimization.
Open-loop RRT outputs a sequence of controls (u1, u2, ..., uN). Execute them in order and, in simulation, you reach the goal. But real robots have noise, model errors, and disturbances. By the time you execute control u50, the accumulated error means you’re no longer on the planned trajectory — and the open-loop plan can’t recover.
Feedback control closes the loop: instead of pre-computing a fixed control sequence, the robot continuously measures its state x(t) and chooses the control u based on x(t). A feedback policy π: X → U maps current state to control. The trajectory is whatever x(t) the ODE &ẋ; = f(x, π(x)) produces under this policy.
For a feedback policy to be useful, the system must converge to the goal. A Lyapunov function V(x) proves convergence without solving the ODE explicitly. Think of V(x) as the “potential energy” of the system — if we can show V(x) decreases along every trajectory (V̇(x) < 0), then the system must eventually reach the minimum of V, which we place at the goal.
For a car, finding such a V is hard because the car can’t move sideways: a simple quadratic V = |x − xgoal|2 won’t work (the gradient often points sideways). Designing stabilizing feedback for nonholonomic systems is an active research area — Brockett’s theorem shows that smooth time-invariant feedback controllers cannot stabilize some nonholonomic systems.
One practical workaround: run kinodynamic RRT as a feedback policy. Measure the current state x(t), run RRT from x(t) to xgoal, execute the first few controls of the resulting plan, measure again, replan. This is receding-horizon planning or model-predictive control (MPC) applied to RRT. It provides implicit closed-loop behavior: each replan compensates for drift since the last plan.
A unicycle robot must reach the orange goal. Blue trajectory: open-loop (follows pre-computed controls, no corrections). Green trajectory: closed-loop (replan every 0.5s). Use the noise slider to add Gaussian noise to the dynamics and watch the difference.
Kinodynamic planning in the full state space is powerful but expensive. A simpler strategy: decouple the problem into two stages. First, find a geometric path in C-space using a fast planner (Ch 5 RRT, ignoring dynamics). Second, find a time-scaling — a function s(t) that converts the geometric path into a time-parameterized trajectory that satisfies dynamics and control bounds.
Formally, if the geometric path is q(σ) for σ ∈ [0, 1], the time-scaled trajectory is q(s(t)). The velocity along the path at time t is q̇(t) = (dq/dσ) · ṡ(t). The acceleration is q̈(t) = (dq/dσ) · s̈(t) + (d2q/dσ2) · ṡ(t)2. Both must stay within joint velocity and torque limits.
Given a fixed geometric path, the time-optimal time scaling finds s(t) that traverses the path as fast as possible subject to dynamics limits. Bobrow (1985) and Shin & McKay (1985) solved this with a phase-plane approach: plot (s, ṡ) and find the limiting velocity curve, then bang-bang control the acceleration to stay below it.
Horizontal axis: path progress s ∈ [0,1]. Vertical axis: path speed ṡ. The orange curve is the maximum speed at each path location (velocity limit from joint constraints). The green curve is the time-optimal trajectory — bang-bang between max acceleration and max deceleration. Drag the “accel limit” slider to see how tighter limits slow the traverse.
Decoupling works well for robot arms where any path in C-space is geometrically feasible and only dynamics limits constrain the time-scaling. It fails for:
| Failure mode | Example | Why it fails |
|---|---|---|
| Nonholonomic systems | Car, unicycle | Geometric path in C-space may require sideways motion that violates differential constraints — no valid time-scaling exists |
| Underactuated systems | Pendulum, brachiation robot | System has fewer controls than DOF; can’t independently command each direction along the path |
| Dynamic obstacles | Passing pedestrians | Time-scaling changes when along the path, which can create new collisions with moving obstacles |
Sampling-based planning finds a feasible trajectory, not necessarily a good one. The kinodynamic RRT tree grows randomly — the solution path is whatever sequence of arcs happened to reach the goal. It may be jerky, unnecessarily long, and far from time-optimal. Trajectory optimization takes such an initial solution and improves it by minimizing a cost functional.
The canonical trajectory optimization problem: find a trajectory x(t) and controls u(t) over t ∈ [0, T] that minimizes:
subject to &ẋ; = f(x, u), x(0) = xinit, x(T) ∈ Xgoal, x(t) ∈ Xfree, u(t) ∈ U. Here ℓ is the running cost (energy, time, jerk), φ is the terminal cost (distance to goal), and the constraints couple everything together.
CHOMP (Covariant Hamiltonian Optimization for Motion Planning, Ratliff et al. 2009) treats the entire trajectory as a single high-dimensional parameter vector and does gradient descent. Each trajectory segment is a vertex in a discrete graph; CHOMP simultaneously pushes all vertices away from obstacles and toward smoothness.
The CHOMP objective has two terms:
Uobs penalizes proximity to obstacles using a pre-computed signed distance field. Usmooth penalizes acceleration along the trajectory (∫ |q̈|2 dt). The gradient is taken in a Riemannian metric on trajectory space (the “covariant” in CHOMP), which ensures updates are in physically meaningful directions rather than arbitrary Euclidean gradient steps.
TrajOpt (Schulman et al. 2013) reformulates trajectory optimization as a sequence of convex programs. At each step, linearize the nonlinear constraints (collision, dynamics) around the current trajectory, solve the resulting quadratic program (QP) to get an update, repeat until convergence. TrajOpt handles hard collision avoidance constraints rather than soft penalties.
| Method | Approach | Handles hard constraints? | Local minima? |
|---|---|---|---|
| CHOMP | Gradient descent on cost functional | Only as soft penalties | Yes, can get stuck |
| TrajOpt | Sequential convex programming | Yes (collision as hard constraint) | Yes, but less often |
| iLQR | Iterative LQR (second-order) | Via barrier functions | Yes, needs warm start |
| Kinodynamic RRT* | Optimal sampling (asymptotic) | Yes (tree only grows in Xfree) | No (global optimum as n→∞) |
Textbook kinodynamic RRT works. Deployed kinodynamic planners that solve real problems require a handful of engineering decisions that the algorithm doesn’t specify. Here are the most critical ones.
The simplest integrator is Euler integration: x(t + Δt) ≈ x(t) + Δt · f(x(t), u). This accumulates error proportional to Δt. For a car at speed 1 m/s with Δt = 0.1 s, Euler error is O(0.01) per step — acceptable. For a quadrotor at high speed with fast attitude dynamics, Euler blows up unless Δt < 0.001 s, which makes each edge extremely short. Runge-Kutta 4 (RK4) gives fourth-order accuracy for only 4x more function evaluations:
def rk4_step(f, x, u, dt): k1 = f(x, u) k2 = f(x + dt/2 * k1, u) k3 = f(x + dt/2 * k2, u) k4 = f(x + dt * k3, u) return x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
RK4 allows 10x larger Δt than Euler for the same accuracy — far fewer, longer tree edges, which means faster coverage of state space.
Uniform random sampling of u ∈ U is unbiased but inefficient. In practice, try multiple controls from xnear (say, 5 or 10 random samples) and keep the one that moves closest to xrand in state space. This is a greedy multi-sample extend — it doesn’t change the algorithm’s probabilistic completeness but dramatically improves growth speed.
Pure uniform sampling grows a tree that fills all of X — which is thorough but slow. Standard RRT uses goal biasing: with probability pgoal (typically 5–10%), set xrand = xgoal directly. This biases tree growth toward the goal without abandoning exploration. Too high pgoal makes the planner greedy and poor at navigating around obstacles; too low makes it slow.
Grow two trees simultaneously: one from xinit and one from xgoal. On each iteration, grow one tree toward a random sample, then try to grow the other tree toward the newly added node. When the two trees get close enough to connect, the path is found. Bidirectional RRT typically solves problems 5–10x faster than single-tree RRT because both trees grow toward each other.
| Scenario | Best approach |
|---|---|
| Holonomic robot arm, static environment | RRT or PRM in C-space + time-scaling |
| Car parking / steering with limits | Kinodynamic RRT (bicycle model) |
| Drone with inertia, 12D state | Kinodynamic RRT* or SSST in state space |
| Need smooth, high-quality trajectory | Sample first → CHOMP/TrajOpt refinement |
| Online replanning (MPC) | Lightweight kinodynamic RRT in receding horizon |
| Known dynamics, small state space | Dynamic programming / HJB equation |
Chapter 14 sits at the intersection of classical robotics and modern control theory. It takes the geometric planning tools from Chapters 4–7 and merges them with the differential systems of Chapter 13. Understanding this chapter is the bridge to everything in robot motion planning that involves real physics.
| Earlier chapter | What Ch 14 inherits | How Ch 14 extends it |
|---|---|---|
| Ch 5 (RRT/PRM) | Tree growth loop, nearest-neighbor, sampling strategy | Replaces straight-line extend with ODE simulation; planning in state space not C-space |
| Ch 13 (Differential Models) | &ẋ; = f(x,u), unicycle model, phase space | Uses these models directly as the ODE to integrate in the extend step |
| Ch 8 (Feedback Planning) | Stability, Lyapunov functions, feedback policies | Identifies why smooth feedback fails for nonholonomic systems (Brockett) |
| Ch 10 (Sequential Decisions) | Cost-to-go, Bellman principle | Trajectory optimization (CHOMP, TrajOpt) minimizes similar cost functionals |
Motion planning for autonomous vehicles
Car manufacturers and research labs (Waymo, Apexx, MIT CSAIL) use kinodynamic planners for the low-level maneuver planning that respects steering limits, vehicle inertia, and jerk bounds. The bicycle model from Chapter 13 + kinodynamic RRT from Chapter 14 is literally in production code.
Drone and UAV planning
Quadrotors in cluttered environments (warehouses, disaster response) use kinodynamic planning in 12D state space. Research like RAPPIDS (Tordesillas et al.) uses sampling-based methods with receding-horizon replanning at 20 Hz, exactly the feedback-under-constraints architecture from Chapter 5 of this lesson.
Robot arm motion planning (MoveIt, OMPL)
ROS’s MoveIt planning framework uses OMPL, which implements kinodynamic RRT, RRT*, and KPIECE for robot arms. The decoupled approach (Chapter 6 of this lesson) is also available: plan path geometrically, then time-scale for joint velocity and torque limits.
Learned planners and neural RRT
Recent work replaces the random control sampling step with a learned policy that predicts which control to apply based on xnear and xrand. This is the connection between classical kinodynamic planning and deep reinforcement learning: the RL policy learns to be a better “steering function” than random sampling.
This chapter connects directly to:
“The difficulty is not in finding the right answer; it is in formulating the right question.”
— Steven M. LaValle, Planning Algorithms, 2006