Kochenderfer et al., Chapter 8

Linear Reachability

Proving mathematically that a linear system can never enter an unsafe region — no matter what disturbances occur.

Prerequisites: Linear algebra (matrix multiply, basic set theory). That's it.
10
Chapters
5+
Simulations
10
Quizzes

Chapter 0: From Sampling to Guarantees

Chapter 7 estimated failure probability by running millions of simulations. But no matter how many simulations you run, there's always a chance you missed a rare failure case lurking in an unexplored corner of the disturbance space. Sampling gives statistical confidence. Can we do better?

Reachability analysis answers this with a definitive yes. Instead of asking "how often does the system fail?", it asks "what is the set of ALL possible states the system can ever reach?" If that set doesn't intersect the unsafe region, the system is provably safe. Not statistically likely safe. Mathematically guaranteed safe.

The fundamental shift: Sampling explores individual trajectories. Reachability propagates entire sets of states forward through time. One reachability computation covers every possible trajectory simultaneously — including the adversarial worst case that no Monte Carlo run would ever find.

The trade-off is scope. Reachability is computationally expensive, and exact reachability is only tractable for restricted system classes. This chapter covers linear systems — where the dynamics are matrix multiplications and the disturbances are bounded sets. Chapters 9 and 10 extend to nonlinear and discrete systems.

Sampling (Ch 4-7)
Individual trajectories → statistical estimate of P(fail)
↓ want something stronger
Reachability (Ch 8-10)
All possible states → mathematical proof of safety
PropertySamplingReachability
Result typeProbability estimateYes/no proof
CoverageRandom subset of trajectoriesALL trajectories simultaneously
Missed failuresAlways possible (rare events)Impossible (if exact)
Computational costScales with desired confidenceScales with system dimension
System restrictionsAny simulable systemLinear (this chapter), restricted nonlinear

The plan for this chapter: we'll define the reachable set, show how linear dynamics make set propagation tractable, introduce the two fundamental set operations (linear transformation and Minkowski sum), explore the polytope and zonotope set representations, and see how overapproximation lets us trade tightness for speed.

Check: What does reachability analysis provide that Monte Carlo sampling cannot?

Chapter 1: Forward Reachability

A system starts in some initial state x0 drawn from an initial set X0. At each time step, the state evolves according to the dynamics, subject to disturbances from a disturbance set W. The reachable set at time t is the set of all states the system could possibly be in, over all possible initial conditions and all possible disturbance sequences:

Rt = {xt : x0 ∈ X0, wk ∈ W for k = 0, …, t−1}

This is a set, not a distribution. It contains every state that could result from any combination of allowed initial conditions and disturbances. If Rt ∩ Xunsafe = ∅ for all t, the system is safe.

Think of it as a flood: Drop a puddle of water (the initial set) on a landscape (the dynamics). At each time step, the water flows and spreads (disturbances push it in all directions). The reachable set is the total wetted area at time t. If the water never reaches the forbidden zone, the system is safe.

The key recursion for computing the reachable set one step at a time is:

Rt+1 = {f(x, w) : x ∈ Rt, w ∈ W}

Start with R0 = X0. At each step, apply the dynamics f to every state in Rt and every disturbance in W. The result is Rt+1. This is conceptually simple but computationally brutal for general f — we'd need to track an infinite set of points. The power of linear dynamics is that this infinite operation reduces to just two set operations.

Forward vs. backward: We could also compute reachability backward: start from the unsafe set and ask "which initial conditions can reach it?" Forward reachability is more intuitive and more common in practice for time-bounded safety verification. Both give equivalent safety verdicts.
Check: What does the reachable set Rt represent?

Chapter 2: Linear Systems

For a general nonlinear system xt+1 = f(xt, wt), computing the reachable set is intractable — it requires tracking the image of a set under an arbitrary function. But for linear time-invariant (LTI) systems, the dynamics have the form:

xt+1 = A xt + B wt

where A is the state transition matrix and B maps disturbances into state space. Both A and B are constant matrices. The initial set X0 and disturbance set W are compact convex sets (think: ellipsoids, boxes, or polytopes).

Substituting into the reachability recursion:

Rt+1 = {Ax + Bw : x ∈ Rt, w ∈ W}

This decomposes into two operations:

Linear map
A · Rt = {Ax : x ∈ Rt}
+
Minkowski sum
B · W = {Bw : w ∈ W}
=
Rt+1
A · Rt ⊕ B · W

The symbol ⊕ denotes the Minkowski sum: P ⊕ Q = {p + q : p ∈ P, q ∈ Q}. We will define both operations precisely in the next chapter. The crucial point is that these two operations — linear transformation and Minkowski sum — are the only tools we need. Everything else in this chapter is about computing them efficiently.

Why linearity matters: For nonlinear f, the image of a convex set under f may not be convex, making exact representation impossible with simple data structures. For linear A, the image of a convex set is always convex. And the Minkowski sum of two convex sets is convex. So convexity is preserved at every step — the reachable set stays convex, and we can represent it compactly.

Let's trace one step in 2D. Suppose R0 is a unit square centered at the origin, A = [[0.9, 0.1], [−0.1, 0.9]] (a slight rotation + contraction), and BW is a small box [−0.1, 0.1]2.

StepOperationResult shape
1A · R0Slightly rotated, slightly smaller square
2(A · R0) ⊕ BWRotated square, "fattened" by the disturbance box

Each step rotates and shrinks (because A has eigenvalues with magnitude 0.9), then fattens by the disturbance. Over many steps, the reachable set converges to a limit set — the disturbance injection balances the contraction. If that limit set avoids the unsafe region, the system is safe forever.

Check: For the linear system xt+1 = Axt + Bwt, what two set operations compute Rt+1 from Rt?

Chapter 3: Set Operations

Let's define the two building blocks precisely and build intuition for each.

Linear transformation. Given a set P and a matrix A, the image is:

A · P = {Ap : p ∈ P}

Geometrically, A stretches, rotates, and/or shears every point in P. If P is a circle, AP is an ellipse. If P is a square, AP is a parallelogram. The key property: if P is convex, AP is convex.

Minkowski sum. Given two sets P and Q:

P ⊕ Q = {p + q : p ∈ P, q ∈ Q}

Geometrically: slide Q around the boundary of P (or equivalently, slide P around Q). The swept area is P ⊕ Q. Equivalently, "fatten" P by the shape of Q. If P is a square and Q is a disk, P ⊕ Q is a rounded rectangle.

Worked example in 1D: Let P = [1, 3] and Q = [−0.5, 0.5]. Then P ⊕ Q = {p + q : 1 ≤ p ≤ 3, −0.5 ≤ q ≤ 0.5} = [0.5, 3.5]. The Minkowski sum of two intervals is the interval of their endpoint sums. In 2D it becomes more interesting.
Minkowski Sum in 2D

The blue set P is fixed. Drag the orange slider to change the size of Q (a disk). The green outline shows P ⊕ Q — P "fattened" by Q. Notice how corners get rounded by the disk.

Radius of Q0.50

The computational cost of these operations depends heavily on how we represent the sets. A set described by 4 vertices is easy to transform; a set described by 1000 vertices is harder. This brings us to the critical question: what data structure should we use for sets? The next three chapters explore three answers — polytopes, zonotopes, and support functions — each with different trade-offs.

OperationWhat it doesPreserves convexity?
A · PStretch / rotate / shearYes
P ⊕ Q"Fatten" P by shape of QYes (if both convex)
P ∩ QIntersectionYes (if both convex)
conv(P ∪ Q)Convex hull of unionYes (by definition)
Check: What is the Minkowski sum of the interval [2, 5] and the interval [−1, 1]?

Chapter 4: The Dependency Effect

Before we choose a set representation, we need to understand a subtle trap. Set operations treat their operands as independent. Consider computing A · (P ⊕ Q). If we first compute S = P ⊕ Q, then compute A · S, the result is correct. But what about A · P ⊕ A · Q? Is that the same?

For linear maps, yes: A(P ⊕ Q) = AP ⊕ AQ. This is the distributive property. But trouble arises when the same variable appears multiple times in an expression. Consider:

{x + x : x ∈ [−1, 1]} = [−2, 2]

But if we treat the two x's independently:

[−1, 1] ⊕ [−1, 1] = [−2, 2]

In 1D it happens to agree. But in higher dimensions, it often doesn't. Consider x ∈ [−1,1]2 and the expression Ax + Bx with A and B being different matrices. The true result is {(A+B)x : x ∈ [−1,1]2}, but Minkowski-summing AP ⊕ BP treats the x in each term independently, potentially producing a larger set. This is the dependency effect (also called the wrapping effect).

The dependency effect: When the same uncertain variable appears in multiple places, set operations that treat each occurrence independently produce an overapproximation — a set that contains the true result but is larger. This bloat accumulates over time steps. After many steps, the reachable set can be absurdly inflated, making safety verification impossible even when the system is safe.

The practical implication: always simplify expressions before applying set operations. Compute (A+B)x as a single linear map on x, rather than computing Ax and Bx separately and summing. This algebraic preprocessing is critical for controlling wrapping.

Without simplification:

R = Ax ⊕ Bx

Treats x as two independent variables → bloat

With simplification:

R = (A + B)x

Single linear map on x → exact

In the reachability recursion Rt+1 = A · Rt ⊕ BW, the dependency effect is manageable because xt and wt really are independent (the current state and the current disturbance are separate variables). The danger arises when representing Rt itself involves implicit variables that get duplicated across operations.

Check: What is the dependency effect in set operations?

Chapter 5: Polytopes

A polytope is the most natural way to represent a convex set: it's a bounded region defined by flat faces. In 2D, polytopes are convex polygons. In 3D, convex polyhedra. There are two equivalent representations.

H-polytope (half-space representation): the intersection of finitely many half-spaces.

P = {x : Hx ≤ h}

Each row of H defines a face normal, and the corresponding entry of h defines the face offset. A cube in 3D needs 6 half-spaces. An n-gon in 2D needs n half-spaces.

V-polytope (vertex representation): the convex hull of finitely many points.

P = conv(v1, v2, …, vk)

The set of all convex combinations of the vertices. A triangle has 3 vertices. A cube has 8.

Dual representations: Every H-polytope can be converted to a V-polytope and vice versa. But the conversion is expensive. In the worst case, a polytope with m half-spaces can have O(m⌊d/2⌋) vertices in d dimensions. For d = 10 and m = 100, that's potentially 1005 = 1010 vertices. This exponential blowup makes polytopes impractical for high-dimensional reachability.

Each representation has a "cheap" operation and an "expensive" one:

OperationH-polytope costV-polytope cost
Intersection P ∩ QCheap (stack constraints)Expensive (vertex enumeration)
Linear map APExpensive (need vertex conversion)Cheap (transform each vertex)
Minkowski sum P ⊕ QExpensive (complex algorithm)Expensive (vertex count explodes)

The critical problem for reachability is the Minkowski sum. In V-representation, if P has m vertices and Q has n vertices, P ⊕ Q can have up to O(mn) vertices in 2D and exponentially more in higher dimensions. Since reachability requires a Minkowski sum at every time step, the vertex count grows exponentially with the number of steps.

The vertex explosion: Start with a 4-vertex square. After one Minkowski sum with a 4-vertex disturbance box, the result can have up to 8 vertices. After 10 steps: potentially 4 × 210 = 4096 vertices. After 100 steps: over 1030 vertices. This exponential blowup makes V-polytopes unusable for long time horizons — and motivates the zonotope representation.
Check: Why do V-polytopes become impractical for reachability over many time steps?

Chapter 6: Zonotopes

A zonotope is a special type of polytope defined by a center and a set of generators. Given a center c ∈ Rd and generator matrix G = [g1, …, gk] (each gi ∈ Rd), the zonotope is:

Z = {c + Gξ : ξ ∈ [−1, 1]k} = c ⊕ g1[−1,1] ⊕ … ⊕ gk[−1,1]

Each generator "stretches" a unit interval along its direction. The zonotope is the Minkowski sum of all these line segments, translated by c. In 2D, a zonotope with 2 generators is a parallelogram. With 3 generators, it's a hexagon. With k generators, it's a convex polygon with at most 2k sides.

The key insight: Zonotopes make the two reachability operations trivial:

Linear map: A · Z(c, G) = Z(Ac, AG). Just multiply the center and every generator by A.

Minkowski sum: Z(c1, G1) ⊕ Z(c2, G2) = Z(c1 + c2, [G1 | G2]). Add centers, concatenate generators.

The generator count grows linearly, not exponentially!

Let's trace the reachability step. Suppose Rt = Z(ct, Gt) with kt generators, and BW = Z(0, Gw) with kw generators.

Rt+1 = A · Rt ⊕ BW = Z(Act, AGt) ⊕ Z(0, Gw) = Z(Act, [AGt | Gw])

After one step: kt+1 = kt + kw generators. After T steps: k0 + T · kw generators. Linear growth! If the initial set has 2 generators and the disturbance has 2 generators, after 100 steps we have 202 generators — not 1030 vertices.

Zonotope vs. Polytope: Reachable Set Growth

Watch the reachable set grow step by step. Left: polytope (vertex count explodes). Right: zonotope (generator count grows linearly). The bar chart below tracks complexity. Click Step to advance one time step.

t = 0
Speed600ms
The cost of generators: While generator count grows linearly, each generator is a d-dimensional vector. For large d, storing 1000 generators costs 1000d floats, and computing AG costs O(d2) per generator. In practice, generator reduction (dropping or merging small generators) keeps the count bounded at a cost of slight overapproximation.
Check: How does the Minkowski sum of two zonotopes Z(c1, G1) and Z(c2, G2) work?

Chapter 7: Overapproximation

Exact reachability is ideal but often too expensive. Overapproximation computes a set P̃ that contains the true reachable set: Rt ⊆ P̃t. If the overapproximation doesn't intersect the unsafe region, neither does the true reachable set. Safety is preserved.

Rt ⊆ P̃t     and     P̃t ∩ Xunsafe = ∅     ⇒     Rt ∩ Xunsafe = ∅
Why overapproximate? Three reasons. First, the exact reachable set may not be representable in our chosen data structure (e.g., a Minkowski sum of many polytopes isn't a single polytope). Second, exact computation may be too slow. Third, controlled overapproximation lets us use simpler, faster set representations. The trade-off: tighter overapproximations are more expensive but less likely to produce false alarms.

Common overapproximation strategies:

StrategyIdeaTightness
Bounding boxAxis-aligned box containing the setLoose (ignores orientation)
Oriented boxBox aligned to principal axesBetter than axis-aligned
Zonotope reductionMerge small generators to limit countGood (controlled bloat)
Support functionOverapprox by a polytope with chosen face directionsExcellent (refineable)

The critical insight: overapproximation introduces one-sided error. If P̃ ∩ Xunsafe ≠ ∅, we cannot conclude the system is unsafe — the intersection might be entirely within the overapproximation gap. We can only conclude safety (when there's no intersection) or "inconclusive" (when there is). To resolve inconclusive cases, we refine the overapproximation or switch to a tighter method.

P̃ ∩ Xunsafe = ∅
System is SAFE (guaranteed)
P̃ ∩ Xunsafe ≠ ∅
INCONCLUSIVE — refine or use tighter method
Generator reduction for zonotopes: After T steps, a zonotope has k0 + T · kw generators. To keep complexity bounded, we reduce: sort generators by magnitude, keep the largest kmax, and overapproximate the remaining small generators with a bounding box (which adds d generators in d dimensions). This keeps each step O(d · kmax) and introduces minimal bloat if the discarded generators are small.
Check: If the overapproximation P̃ intersects the unsafe set, what can we conclude?

Chapter 8: Support Functions

All the set representations so far store explicit geometric data (vertices, generators, half-spaces). The support function takes a fundamentally different approach: it represents a convex set by answering the question "how far does the set extend in direction d?" for any direction d.

ρP(d) = max { dTx : x ∈ P }

The support function ρP(d) gives the maximum "projection" of P along direction d. It completely characterizes any convex compact set — knowing ρP(d) for all directions d uniquely determines P.

Think of it as a shadow: Shine a flashlight in direction d at the set P. The support function tells you how far the shadow extends. Repeating for all directions gives you the full silhouette of P from every angle — enough to reconstruct P exactly.

Support functions compose beautifully with our two operations:

ρAP(d) = ρP(ATd)      (linear map)
ρP ⊕ Q(d) = ρP(d) + ρQ(d)      (Minkowski sum)

The Minkowski sum becomes simple addition! No vertex explosion, no generator concatenation. The cost is that the support function is defined for all directions d, not just a finite set. In practice, we evaluate it at a finite set of directions d1, …, dn and construct an outer-approximating polytope:

P̃ = {x : djTx ≤ ρP(dj) for j = 1, …, n}

More directions = tighter approximation. The directions can be chosen adaptively — place more directions where the boundary has high curvature.

Iterative refinement: Start with a coarse set of directions (e.g., the axis-aligned ones). Compute the overapproximating polytope. If it intersects the unsafe set, add more directions in the intersection region and recompute. This adaptively tightens the approximation only where needed — a powerful strategy for safety verification.
RepresentationLinear map costMinkowski sum costStorage
V-polytope (m verts)O(md)O(m2) vertices — exponentialO(md)
Zonotope (k gens)O(kd2)O(1) — concatenateO(kd)
Support fn (n dirs)O(n) evaluationsO(n) additionsO(n) scalars + eval oracle
Check: How does the support function of a Minkowski sum P ⊕ Q relate to the support functions of P and Q?

Chapter 9: LP Safety Checking

We now have tools to compute (or overapproximate) the reachable set. The final question is: does it intersect the unsafe region? This is a safety checking problem, and for polytopic sets, it reduces to linear programming (LP).

Suppose the unsafe set is a polytope Xunsafe = {x : Cx ≤ d}. And our (over-approximated) reachable set P̃ is also a polytope or can be queried via a support function. Safety holds if and only if P̃ ∩ Xunsafe = ∅.

To check intersection, we can test each face of Xunsafe. Face j is defined by cjTx ≤ dj. The reachable set violates this constraint if and only if:

ρ(cj) > dj

If this holds for at least one face j — meaning the reachable set extends beyond at least one face of the unsafe set in the outward direction — then P̃ might not actually intersect Xunsafe. Wait, that's the wrong direction. Let's be precise:

Intersection test: P̃ ∩ Xunsafe ≠ ∅ if and only if there exists x satisfying all constraints of both sets simultaneously. This is a feasibility LP:

Find x such that: x ∈ P̃ and Cx ≤ d.

If the LP is feasible, the sets intersect (potential safety violation). If infeasible, they don't (safe).

Alternatively, using support functions: P̃ and Xunsafe are disjoint if and only if there exists a separating hyperplane. By the separating hyperplane theorem, this is equivalent to finding a direction d such that:

ρ(d) < −ρXunsafe(−d)

In practice, the LP approach is most common. Modern LP solvers handle millions of constraints in seconds, making safety checking fast even for complex reachable sets.

pseudocode
# Safety verification via reachability
Given: A, B, X0, W, Xunsafe, time horizon T

R ← X0
for t = 1 to T:
  R ← overapprox(A · R ⊕ B · W)   # zonotope/support fn step
  if LP_feasible(R ∩ Xunsafe):
    return "INCONCLUSIVE at step t"
return "SAFE for [0, T]"

The full pipeline: choose a set representation (zonotope for speed, support function for tightness), propagate the reachable set through the linear dynamics, overapproximate at each step to control complexity, and check intersection with the unsafe set via LP. If the LP is infeasible at every step, the system is provably safe. If feasible at any step, refine the overapproximation or increase the number of support directions.

StepToolCost
Set propagationZonotope / support functionO(kd2) per step
OverapproximationGenerator reduction / direction refinementO(kd) per step
Safety checkLP feasibilityPolynomial in constraint count
OverallFull pipelinePolynomial in T, d, k
Looking ahead: This chapter covered linear systems — the simplest case. Chapter 9 tackles nonlinear systems (interval arithmetic, Taylor models, neural network verification). Chapter 10 handles discrete and hybrid systems (SAT, probabilistic reachability). The set-based reasoning framework remains the same; only the propagation and overapproximation methods change.
Check: How is the intersection of the reachable set with the unsafe set checked in practice?