How to describe where a robot IS — rotation matrices, homogeneous coordinates, DH parameters, and the chain rule of rigid bodies.
Suppose you're building a motion planner for a robot arm. The arm has three joints. You want it to pick up a cup. First question: where is the end-effector (the gripper) right now? This sounds trivial — the robot knows its own joint angles, after all — but the computation that translates "joint angles" into "gripper position in the room" is anything but trivial.
The root difficulty: geometry is not additive. If joint 1 rotates 30° and joint 2 rotates 45°, the gripper does not end up at 75°-equivalent position. Joint 2's rotation is applied in a coordinate frame that was already rotated by joint 1. The rotations compose multiplicatively, not additively — and the mathematics of how they compose is the subject of this entire chapter.
Before we can plan a path through configuration space (Chapter 4), we need to understand what "configuration space" is made of. For a rigid body moving in 2D, the configuration is a triple (x, y, θ) — two translations and one rotation. For a body in 3D, it's a 6-tuple. For a kinematic chain of n joints, it's an n-vector of joint angles. Chapter 3 is about the mathematics that connects these angle vectors to actual positions in Cartesian space.
A rigid body has a local coordinate frame attached to it. As it moves, that frame moves with it. The same point has different coordinates in local vs world frames. Drag to rotate and translate the body.
Before we can ask "does the robot collide with an obstacle?", we need a mathematical description of the robot's shape and the obstacle's shape. LaValle's approach uses semi-algebraic models built from convex primitives — and the foundation is the half-plane (2D) or half-space (3D).
A half-plane is the set of all points (x, y) satisfying a single linear inequality: a·x + b·y ≤ c. Geometrically, this is one side of a line. A convex polygon is the intersection of finitely many half-planes. A square is the intersection of four half-planes. A regular hexagon is six. Any convex polygon with n sides is defined by exactly n half-plane inequalities.
This representation is powerful because checking whether a point is inside a convex polygon is fast — just evaluate n inequalities. And intersecting two convex polygons is algorithmically clean. The price: most real shapes are not convex. A car body, a robot hand, a coffee mug — all are non-convex.
The standard trick: decompose a non-convex body into a finite union of convex primitives. A C-shaped robot might be represented as the union of two rectangles. The robot arm's link is a rectangle; its wrist is a small disk. When checking collision, you check each primitive separately — a point is in the non-convex body if it's in any one of its convex parts.
The 3D analog: a half-space is the set satisfying a·x + b·y + c·z ≤ d. A convex polyhedron is a finite intersection of half-spaces. A cube is 6 half-spaces. A tetrahedron is 4. The faces of the polyhedron correspond one-to-one with the half-space inequalities. Modern collision detection libraries (FCL, Bullet) represent robot links as convex polyhedra or triangulated meshes.
Add half-planes one at a time. The intersection (filled region) builds a convex polygon. The orange dot is a test point — watch the feasibility indicator.
Half-planes give us linear boundaries — straight lines and flat faces. But what about a circle? A cylinder? A robot link with rounded ends? For these we need algebraic primitives, not just linear ones.
An algebraic set is the zero set of a polynomial: { x : f(x) = 0 }. A circle is the algebraic set of x² + y² − r² = 0. An ellipse is ax² + by² − 1 = 0. These are the boundaries of more complex shapes.
A semi-algebraic set replaces the equality with an inequality: { x : f(x) ≤ 0 }. The disk x² + y² ≤ r² is a semi-algebraic set — the filled circle. A half-plane ax + by ≤ c is a special case (degree-1 polynomial). LaValle uses semi-algebraic sets as the general framework for describing robot bodies and obstacles, because they handle both linear boundaries (flat walls) and curved boundaries (cylinders, spheres) in a unified way.
This formula says: an obstacle O is a union (∪) of intersections (∩) of semi-algebraic primitives. The outer union handles non-convexity. The inner intersection handles each convex piece. Every robot-obstacle description in chapters 4–8 ultimately reduces to this form.
When the configuration space is built in Chapter 4, the obstacles in C-space (called C-obstacles) inherit this semi-algebraic structure from the workspace obstacles and the robot body. The planner must navigate a space whose "walls" are defined by polynomial inequalities. This is why exact planners (Chapter 6) use algebraic geometry, while sampling-based planners (Chapter 5) sidestep the algebra entirely by testing random configurations.
Toggle between linear (half-plane: ax+by≤c), quadratic (disk: x²+y²≤r²), and combined (intersection) primitives. The blue region is the semi-algebraic set.
A rigid body in 2D has three degrees of freedom: two translation (xt, yt) and one rotation (θ). Given a point p = (a, b) expressed in the body's local frame, what are its coordinates in the world frame?
Rotating a point (a, b) by angle θ around the origin gives:
| x' | = | a cosθ − b sinθ |
| y' | = | a sinθ + b cosθ |
In matrix form, with R(θ) the rotation matrix:
This is a 2×2 orthogonal matrix (RTR = I, det R = 1). The columns of R(θ) are the body's x-axis and y-axis expressed in world coordinates. If θ = 0, R = I (identity: body and world aligned). If θ = 90°, R rotates everything 90° counterclockwise.
The set of all 2×2 rotation matrices — all matrices satisfying RTR = I and det R = +1 — is called SO(2) (Special Orthogonal group in 2D). It's topologically a circle: each θ ∈ [0, 2π) corresponds to exactly one rotation. The + sign on the determinant excludes reflections.
Rotation alone isn't enough — the body also translates by (xt, yt). Combining rotation and translation:
This is affine, not linear — you can't represent it as a single matrix multiplication on (x, y). The fix: homogeneous coordinates. Augment every 2D point with a 1 in the third slot: [x, y, 1]T. Then the combined rotation + translation becomes a single 3×3 matrix multiplication:
This 3×3 matrix is the homogeneous transformation matrix for 2D rigid motion. The top-left 2×2 block is R(θ). The top-right column is [xt, yt]T. The bottom row is always [0, 0, 1]. Its power: chaining two transformations (first rotate, then translate, then rotate again) is just matrix multiplication.
A rigid triangle body (orange) starts at origin. Sliders control rotation θ and translation. Watch the coordinate axes rotate with the body — teal arrow = body x-axis in world frame.
Moving from 2D to 3D rotations is a significant step — not just because the matrices grow from 2×2 to 3×3, but because 3D rotation has a fundamentally different topology and several nasty surprises.
In 3D, we can rotate around any of the three coordinate axes. Rotation by angle α around the z-axis:
Similarly, Rx(α) rotates around the x-axis, Ry(β) around the y-axis. Each is a 3×3 orthogonal matrix with det = +1. The set of all 3×3 rotation matrices is SO(3) (Special Orthogonal group in 3D).
Any 3D rotation can be decomposed into three successive rotations around coordinate axes. The most common convention (aerospace): yaw (rotation around z-axis, like a car turning), pitch (rotation around y-axis, like nose-up/down), roll (rotation around x-axis, like a barrel roll). The combined rotation is:
These three angles together are called Euler angles (more precisely, Tait-Bryan angles). They are widely used in robotics and aerospace because they are human-interpretable. The catch: they suffer from a phenomenon called gimbal lock.
Gimbal lock occurs when one rotation aligns two rotation axes, causing the loss of one degree of freedom. For the yaw-pitch-roll convention, it happens when pitch = ±90°: at that angle, yaw and roll become equivalent — you can no longer independently control all three axes. Two of your three "knobs" do the same thing.
This is not a software bug — it's a fundamental topological fact about representing SO(3) with three angles. There is no three-parameter representation of SO(3) that avoids singularities everywhere. The solutions: use four-parameter quaternions (no singularities, compact, great for interpolation) or use the full 3×3 rotation matrix directly (9 parameters, but easy to compose).
Rotate a 3D wireframe cube using yaw, pitch, and roll. Set pitch to ±90° to see gimbal lock — yaw and roll become indistinguishable.
The 2D homogeneous trick (3×3 matrix = rotation + translation) extends naturally to 3D: a 4×4 matrix encodes a full rigid-body transformation — 3D rotation and 3D translation in one object.
A 3D point (x, y, z) is represented in homogeneous coordinates as [x, y, z, 1]T. A rigid transformation (rotation R, translation t) acts as:
The upper-left 3×3 block is the rotation matrix R ∈ SO(3). The upper-right 3×1 column is the translation vector t. The bottom row [0, 0, 0, 1] is always fixed. The resulting 4×4 matrix is a member of the SE(3) group (Special Euclidean group in 3D) — the group of all rigid-body transformations.
The magic: composing two rigid transformations T₁ (first) then T₂ (second) is just matrix multiplication T = T₂ · T₁. If you rotate by θ, then translate by t, then rotate by φ, the combined transformation is R(φ) · T(t) · R(θ) — read right to left. This works because matrix multiplication is associative.
python import numpy as np def rot_z(theta): """4x4 homogeneous matrix for rotation around z-axis.""" c, s = np.cos(theta), np.sin(theta) return np.array([ [c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1] ]) def translate(tx, ty, tz): """4x4 homogeneous matrix for pure translation.""" return np.array([ [1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1] ]) # Compose: first rotate 45°, then translate (2, 1, 0) T = translate(2, 1, 0) @ rot_z(np.pi/4) # Transform a point at (1, 0, 0) in the body frame p_body = np.array([1, 0, 0, 1]) # homogeneous p_world = T @ p_body # p_world[:3] = world-frame position of the point
The inverse of a homogeneous transformation T is NOT just the matrix inverse (expensive). Because R is orthogonal, R-1 = RT. Therefore:
This is the "undo" transformation: if T maps body-frame coordinates to world-frame, then T-1 maps world-frame back to body-frame. Computing it is just a transpose + multiply — far cheaper than general matrix inversion.
Adjust the 3D transformation parameters. Watch how the 4×4 matrix updates in real time. The green axes show where the body-frame origin and axes land in the world frame.
A robot arm is a chain of rigid links, connected by joints. Each joint adds one degree of freedom (for a revolute joint, one rotation angle θ). To compute the end-effector pose, you need to chain together the transformations of all n links. How do you parameterize this chain systematically?
The answer used by most robotics textbooks: the Denavit-Hartenberg (DH) convention, published in 1955. DH provides a systematic way to assign coordinate frames to robot links and express the transformation between consecutive frames using exactly four parameters per joint.
For each link i, four parameters describe the geometry:
| Parameter | Symbol | Meaning | Type |
|---|---|---|---|
| Link length | ai | Distance between z-axes of frames i-1 and i along xi-1 | Fixed (geometry) |
| Link twist | αi | Angle between zi-1 and zi around xi-1 | Fixed (geometry) |
| Link offset | di | Distance between frames along zi | Variable for prismatic joints |
| Joint angle | θi | Rotation from xi-1 to xi around zi | Variable for revolute joints |
For a standard revolute robot arm (like most industrial arms), ai and αi are fixed constants (determined by the arm's physical design), di are fixed offsets, and θi are the joint angles — the variables you control.
Each link contributes one 4×4 transformation matrix. The DH matrix for link i is the composition of four elementary transformations:
In explicit form:
Where c and s abbreviate cos and sin. This single matrix captures all four DH parameters in one expression.
python import numpy as np def dh_matrix(theta, d, a, alpha): """Standard DH transformation matrix for one link.""" ct, st = np.cos(theta), np.sin(theta) ca, sa = np.cos(alpha), np.sin(alpha) return np.array([ [ct, -st*ca, st*sa, a*ct], [st, ct*ca, -ct*sa, a*st], [0, sa, ca, d ], [0, 0, 0, 1 ] ]) # 2-link planar arm: both links length 1, no twist or offset # DH table: theta=joint angle, d=0, a=1, alpha=0 for each link def fk_2link(theta1, theta2): T1 = dh_matrix(theta1, 0, 1.0, 0) # frame 0→1 T2 = dh_matrix(theta2, 0, 1.0, 0) # frame 1→2 T = T1 @ T2 # frame 0→2 return T[:3, 3] # end-effector xyz position
We now have all the ingredients: link geometry (DH parameters), transformation matrices, and the chain rule T = T₁T₂...Tₙ. Forward kinematics is the computation that takes a vector of joint angles and returns the world-frame pose of the end-effector. Let's build it and watch it move.
For an n-joint chain, the transformation from world frame to end-effector frame is:
Each Ti(θi) is the DH matrix for link i, which depends only on θi (for a revolute joint). The position of the end-effector in world coordinates is the last column of T0→n (minus the last row). The orientation of the end-effector is the upper-left 3×3 submatrix.
Key insight: the elbow position is T1(θ1)[last column]. The wrist position is T1T2(θ₁,θ₂)[last column]. Each intermediate frame position is the last column of the partial product up to that link. This makes computing the full arm pose (all joint positions) efficient: just accumulate the product left to right.
Adjust the three joint angles to move the arm. The end-effector ● traces its path. Link lengths are fixed (L₁=L₂=L₃=1). Notice how changing θ₁ sweeps the entire arm; changing θ₃ only rotates the last link.
Let's trace through the math for a 2-link planar arm (both links of length 1, all in the xy-plane, so αi = 0, di = 0):
python import numpy as np def forward_kinematics_planar(thetas, link_lengths): """ Compute positions of all joints for a planar n-link arm. Returns list of (x,y) positions: [base, joint1, joint2, ..., end-effector] """ positions = [(0.0, 0.0)] # base always at origin T = np.eye(4) # start with identity for theta, L in zip(thetas, link_lengths): # DH matrix for planar link: alpha=0, d=0 c, s = np.cos(theta), np.sin(theta) Ti = np.array([ [c, -s, 0, L*c], [s, c, 0, L*s], [0, 0, 1, 0], [0, 0, 0, 1] ]) T = T @ Ti # accumulate: frame 0→current x, y = T[0,3], T[1,3] # joint position = last column positions.append((x, y)) return positions # Example: 3-link arm, links of length [1, 1, 0.5] thetas = [np.radians(30), np.radians(-60), np.radians(20)] positions = forward_kinematics_planar(thetas, [1, 1, 0.5]) print("Joint positions:", positions) # [(0, 0), (0.866, 0.5), (1.116, 0.067), (1.295, 0.238)] # Each entry is the world-frame position of that joint.
Notice the data flow: joint angles (θ₁, θ₂, θ₃) go in, Cartesian positions (x, y) come out. The intermediate state at each step is a 4×4 matrix — the accumulated transformation from frame 0 to the current frame. The end-effector position is just T[0:3, 3] of the final accumulated matrix.
A robot arm is a chain: each link has exactly one parent and at most one child. But a human hand is not a chain — it has five fingers, each a chain of its own, all branching from the palm. A quadruped robot has four legs. A humanoid has two arms, two legs, and a head. These are kinematic trees.
In a kinematic chain: T0→n = T₁ · T₂ · ... · Tₙ. Each frame has exactly one parent. Forward kinematics is a single matrix product along the chain.
In a kinematic tree, each node has one parent but can have multiple children. To compute the pose of any end-effector, you trace the path from the root (base) to that end-effector — which is still a chain, just a sub-chain of the tree. Forward kinematics for a tree is just a depth-first traversal: accumulate T matrices down each branch.
Some mechanisms form loops — not trees. A parallel robot (like a Stewart platform — the flight simulator base) has multiple chains connecting base to end-effector simultaneously. Closed kinematic chains are harder: the loops introduce kinematic constraints — algebraic equations that the joint angles must satisfy. Treating a closed chain as a tree and ignoring the loop constraints gives wrong kinematics.
LaValle's treatment focuses on open chains (no loops). Closed chains require specialized methods (constraint solving, reduced configuration spaces) that are beyond Chapter 3's scope.
A simple kinematic tree: one base, two branches (left arm, right arm). Each branch is an independent chain. Sliders control one joint from each arm.
Chapter 3 is the geometric foundation for everything in Part II. Let's trace where each concept goes and what it unlocks.
Forward kinematics gives x = f(θ) — end-effector position as a function of joint angles. The Jacobian J = ∂f/∂θ is the derivative of this function — it maps joint angular velocities θ̇ to end-effector velocities ẋ = J(θ)θ̇. The Jacobian is central to velocity control, singularity analysis, and trajectory optimization (Chapter 14). It's derived directly from the DH chain of Chapter 3.
The algorithms of Chapter 3 are implemented in virtually every robotics library:
| Library | Language | Key Chapter 3 features |
|---|---|---|
| ROS tf/tf2 | C++/Python | Transform trees, coordinate frame management |
| PyBullet / MuJoCo | Python | Rigid-body simulation, FK/IK built-in |
| robotics-toolbox (Python) | Python | DH parameter robots, FK, Jacobian, visualization |
| Pinocchio | C++/Python | Fast rigid-body FK/dynamics, tree topologies |
| Drake | C++/Python | Full-stack: kinematics, dynamics, optimization |
| Representation | Parameters | Singularities | Composition | Best for |
|---|---|---|---|---|
| Euler angles (ZYX) | 3 | Gimbal lock at ±90° pitch | Non-trivial | Human-readable output |
| Rotation matrix | 9 (3 redundant) | None | Matrix multiply | FK chains, all computation |
| Quaternions | 4 (1 constraint) | None | Quaternion multiply | Interpolation, animation, IMUs |
| Axis-angle | 3 (axis) + 1 (angle) | At θ=0 | Rodrigues' formula | Small rotations, lie algebra |
| Topic | Lesson |
|---|---|
| Configuration space and C-obstacles | Ch 4: Configuration Space → |
| Sampling-based planning using FK | Ch 5: Sampling-Based Planning → |
| Discrete planning foundations | ← Ch 2: Discrete Planning |
"The configuration space is probably the most important concept in all of motion planning. Everything else depends on understanding what this space is and how to work with it."
— Steven LaValle, Planning Algorithms (2006), Ch 4 introduction