Ch 8: Linear Reachability — Algorithms for Validation

Chapter 0: From Sampling to Guarantees

Chapter 7 estimated failure probability by running millions of simulations. But no matter how many simulations you run, there's always a chance you missed a rare failure case lurking in an unexplored corner of the disturbance space. Sampling gives statistical confidence. Can we do better?

Reachability analysis answers this with a definitive yes. Instead of asking "how often does the system fail?", it asks "what is the set of ALL possible states the system can ever reach?" If that set doesn't intersect the unsafe region, the system is provably safe. Not statistically likely safe. Mathematically guaranteed safe.

The fundamental shift: Sampling explores individual trajectories. Reachability propagates entire sets of states forward through time. One reachability computation covers every possible trajectory simultaneously — including the adversarial worst case that no Monte Carlo run would ever find.

The trade-off is scope. Reachability is computationally expensive, and exact reachability is only tractable for restricted system classes. This chapter covers linear systems — where the dynamics are matrix multiplications and the disturbances are bounded sets. Chapters 9 and 10 extend to nonlinear and discrete systems.

Sampling (Ch 4-7)

Individual trajectories → statistical estimate of P(fail)

↓ want something stronger

Reachability (Ch 8-10)

All possible states → mathematical proof of safety

Property	Sampling	Reachability
Result type	Probability estimate	Yes/no proof
Coverage	Random subset of trajectories	ALL trajectories simultaneously
Missed failures	Always possible (rare events)	Impossible (if exact)
Computational cost	Scales with desired confidence	Scales with system dimension
System restrictions	Any simulable system	Linear (this chapter), restricted nonlinear

The plan for this chapter: we'll define the reachable set, show how linear dynamics make set propagation tractable, introduce the two fundamental set operations (linear transformation and Minkowski sum), explore the polytope and zonotope set representations, and see how overapproximation lets us trade tightness for speed.

Check: What does reachability analysis provide that Monte Carlo sampling cannot?

A faster estimate of failure probability A mathematical guarantee that the system never enters an unsafe region A more accurate model of system dynamics

Chapter 1: Forward Reachability

A system starts in some initial state x₀ drawn from an initial set X₀. At each time step, the state evolves according to the dynamics, subject to disturbances from a disturbance set W. The reachable set at time t is the set of all states the system could possibly be in, over all possible initial conditions and all possible disturbance sequences:

R_t = {x_t : x₀ ∈ X₀, w_k ∈ W for k = 0, …, t−1}

This is a set, not a distribution. It contains every state that could result from any combination of allowed initial conditions and disturbances. If R_t ∩ X_unsafe = ∅ for all t, the system is safe.

Think of it as a flood: Drop a puddle of water (the initial set) on a landscape (the dynamics). At each time step, the water flows and spreads (disturbances push it in all directions). The reachable set is the total wetted area at time t. If the water never reaches the forbidden zone, the system is safe.

The key recursion for computing the reachable set one step at a time is:

R_t+1 = {f(x, w) : x ∈ R_t, w ∈ W}

Start with R₀ = X₀. At each step, apply the dynamics f to every state in R_t and every disturbance in W. The result is R_t+1. This is conceptually simple but computationally brutal for general f — we'd need to track an infinite set of points. The power of linear dynamics is that this infinite operation reduces to just two set operations.

Forward vs. backward: We could also compute reachability backward: start from the unsafe set and ask "which initial conditions can reach it?" Forward reachability is more intuitive and more common in practice for time-bounded safety verification. Both give equivalent safety verdicts.

Check: What does the reachable set R_t represent?

The most likely state at time t The set of disturbances that cause failure by time t The set of ALL states the system could be in at time t, over all allowed inputs

Chapter 2: Linear Systems

For a general nonlinear system x_t+1 = f(x_t, w_t), computing the reachable set is intractable — it requires tracking the image of a set under an arbitrary function. But for linear time-invariant (LTI) systems, the dynamics have the form:

x_t+1 = A x_t + B w_t

where A is the state transition matrix and B maps disturbances into state space. Both A and B are constant matrices. The initial set X₀ and disturbance set W are compact convex sets (think: ellipsoids, boxes, or polytopes).

Substituting into the reachability recursion:

R_t+1 = {Ax + Bw : x ∈ R_t, w ∈ W}

This decomposes into two operations:

Linear map

A · R_t = {Ax : x ∈ R_t}

Minkowski sum

B · W = {Bw : w ∈ W}

R_t+1

A · R_t ⊕ B · W

The symbol ⊕ denotes the Minkowski sum: P ⊕ Q = {p + q : p ∈ P, q ∈ Q}. We will define both operations precisely in the next chapter. The crucial point is that these two operations — linear transformation and Minkowski sum — are the only tools we need. Everything else in this chapter is about computing them efficiently.

Why linearity matters: For nonlinear f, the image of a convex set under f may not be convex, making exact representation impossible with simple data structures. For linear A, the image of a convex set is always convex. And the Minkowski sum of two convex sets is convex. So convexity is preserved at every step — the reachable set stays convex, and we can represent it compactly.

Let's trace one step in 2D. Suppose R₀ is a unit square centered at the origin, A = [[0.9, 0.1], [−0.1, 0.9]] (a slight rotation + contraction), and BW is a small box [−0.1, 0.1]².

Step	Operation	Result shape
1	A · R₀	Slightly rotated, slightly smaller square
2	(A · R₀) ⊕ BW	Rotated square, "fattened" by the disturbance box

Each step rotates and shrinks (because A has eigenvalues with magnitude 0.9), then fattens by the disturbance. Over many steps, the reachable set converges to a limit set — the disturbance injection balances the contraction. If that limit set avoids the unsafe region, the system is safe forever.

Check: For the linear system x_t+1 = Ax_t + Bw_t, what two set operations compute R_t+1 from R_t?

Linear transformation (AR_t) and Minkowski sum (+ BW) Intersection and union Convex hull and projection

Chapter 3: Set Operations

Let's define the two building blocks precisely and build intuition for each.

Linear transformation. Given a set P and a matrix A, the image is:

A · P = {Ap : p ∈ P}

Geometrically, A stretches, rotates, and/or shears every point in P. If P is a circle, AP is an ellipse. If P is a square, AP is a parallelogram. The key property: if P is convex, AP is convex.

Minkowski sum. Given two sets P and Q:

P ⊕ Q = {p + q : p ∈ P, q ∈ Q}

Geometrically: slide Q around the boundary of P (or equivalently, slide P around Q). The swept area is P ⊕ Q. Equivalently, "fatten" P by the shape of Q. If P is a square and Q is a disk, P ⊕ Q is a rounded rectangle.

Worked example in 1D: Let P = [1, 3] and Q = [−0.5, 0.5]. Then P ⊕ Q = {p + q : 1 ≤ p ≤ 3, −0.5 ≤ q ≤ 0.5} = [0.5, 3.5]. The Minkowski sum of two intervals is the interval of their endpoint sums. In 2D it becomes more interesting.

Minkowski Sum in 2D

The blue set P is fixed. Drag the orange slider to change the size of Q (a disk). The green outline shows P ⊕ Q — P "fattened" by Q. Notice how corners get rounded by the disk.

Radius of Q0.50

The computational cost of these operations depends heavily on how we represent the sets. A set described by 4 vertices is easy to transform; a set described by 1000 vertices is harder. This brings us to the critical question: what data structure should we use for sets? The next three chapters explore three answers — polytopes, zonotopes, and support functions — each with different trade-offs.

Operation	What it does	Preserves convexity?
A · P	Stretch / rotate / shear	Yes
P ⊕ Q	"Fatten" P by shape of Q	Yes (if both convex)
P ∩ Q	Intersection	Yes (if both convex)
conv(P ∪ Q)	Convex hull of union	Yes (by definition)

Check: What is the Minkowski sum of the interval [2, 5] and the interval [−1, 1]?

[1, 6] [2, 6] [3, 4]

Chapter 4: The Dependency Effect

Before we choose a set representation, we need to understand a subtle trap. Set operations treat their operands as independent. Consider computing A · (P ⊕ Q). If we first compute S = P ⊕ Q, then compute A · S, the result is correct. But what about A · P ⊕ A · Q? Is that the same?

For linear maps, yes: A(P ⊕ Q) = AP ⊕ AQ. This is the distributive property. But trouble arises when the same variable appears multiple times in an expression. Consider:

{x + x : x ∈ [−1, 1]} = [−2, 2]

But if we treat the two x's independently:

[−1, 1] ⊕ [−1, 1] = [−2, 2]

In 1D it happens to agree. But in higher dimensions, it often doesn't. Consider x ∈ [−1,1]² and the expression Ax + Bx with A and B being different matrices. The true result is {(A+B)x : x ∈ [−1,1]²}, but Minkowski-summing AP ⊕ BP treats the x in each term independently, potentially producing a larger set. This is the dependency effect (also called the wrapping effect).

The dependency effect: When the same uncertain variable appears in multiple places, set operations that treat each occurrence independently produce an overapproximation — a set that contains the true result but is larger. This bloat accumulates over time steps. After many steps, the reachable set can be absurdly inflated, making safety verification impossible even when the system is safe.

The practical implication: always simplify expressions before applying set operations. Compute (A+B)x as a single linear map on x, rather than computing Ax and Bx separately and summing. This algebraic preprocessing is critical for controlling wrapping.

Without simplification:

R = Ax ⊕ Bx

Treats x as two independent variables → bloat

With simplification:

R = (A + B)x

Single linear map on x → exact

In the reachability recursion R_t+1 = A · R_t ⊕ BW, the dependency effect is manageable because x_t and w_t really are independent (the current state and the current disturbance are separate variables). The danger arises when representing R_t itself involves implicit variables that get duplicated across operations.

Check: What is the dependency effect in set operations?

Sets grow larger when dimensions are correlated Treating repeated occurrences of the same variable independently causes overapproximation Linear maps lose information about the original set

Chapter 5: Polytopes

A polytope is the most natural way to represent a convex set: it's a bounded region defined by flat faces. In 2D, polytopes are convex polygons. In 3D, convex polyhedra. There are two equivalent representations.

H-polytope (half-space representation): the intersection of finitely many half-spaces.

P = {x : Hx ≤ h}

Each row of H defines a face normal, and the corresponding entry of h defines the face offset. A cube in 3D needs 6 half-spaces. An n-gon in 2D needs n half-spaces.

V-polytope (vertex representation): the convex hull of finitely many points.

P = conv(v₁, v₂, …, v_k)

The set of all convex combinations of the vertices. A triangle has 3 vertices. A cube has 8.

Dual representations: Every H-polytope can be converted to a V-polytope and vice versa. But the conversion is expensive. In the worst case, a polytope with m half-spaces can have O(m^⌊d/2⌋) vertices in d dimensions. For d = 10 and m = 100, that's potentially 100⁵ = 10¹⁰ vertices. This exponential blowup makes polytopes impractical for high-dimensional reachability.

Each representation has a "cheap" operation and an "expensive" one:

Operation	H-polytope cost	V-polytope cost
Intersection P ∩ Q	Cheap (stack constraints)	Expensive (vertex enumeration)
Linear map AP	Expensive (need vertex conversion)	Cheap (transform each vertex)
Minkowski sum P ⊕ Q	Expensive (complex algorithm)	Expensive (vertex count explodes)

The critical problem for reachability is the Minkowski sum. In V-representation, if P has m vertices and Q has n vertices, P ⊕ Q can have up to O(mn) vertices in 2D and exponentially more in higher dimensions. Since reachability requires a Minkowski sum at every time step, the vertex count grows exponentially with the number of steps.

The vertex explosion: Start with a 4-vertex square. After one Minkowski sum with a 4-vertex disturbance box, the result can have up to 8 vertices. After 10 steps: potentially 4 × 2¹⁰ = 4096 vertices. After 100 steps: over 10³⁰ vertices. This exponential blowup makes V-polytopes unusable for long time horizons — and motivates the zonotope representation.

Check: Why do V-polytopes become impractical for reachability over many time steps?

Because linear maps on V-polytopes are too slow Because polytopes cannot represent convex sets Because the Minkowski sum causes the vertex count to grow exponentially with each time step

Chapter 6: Zonotopes

A zonotope is a special type of polytope defined by a center and a set of generators. Given a center c ∈ R^d and generator matrix G = [g₁, …, g_k] (each g_i ∈ R^d), the zonotope is:

Z = {c + Gξ : ξ ∈ [−1, 1]^k} = c ⊕ g₁[−1,1] ⊕ … ⊕ g_k[−1,1]

Each generator "stretches" a unit interval along its direction. The zonotope is the Minkowski sum of all these line segments, translated by c. In 2D, a zonotope with 2 generators is a parallelogram. With 3 generators, it's a hexagon. With k generators, it's a convex polygon with at most 2k sides.

The key insight: Zonotopes make the two reachability operations trivial:

Linear map: A · Z(c, G) = Z(Ac, AG). Just multiply the center and every generator by A.

Minkowski sum: Z(c₁, G₁) ⊕ Z(c₂, G₂) = Z(c₁ + c₂, [G₁ | G₂]). Add centers, concatenate generators.

The generator count grows linearly, not exponentially!

Let's trace the reachability step. Suppose R_t = Z(c_t, G_t) with k_t generators, and BW = Z(0, G_w) with k_w generators.

R_t+1 = A · R_t ⊕ BW = Z(Ac_t, AG_t) ⊕ Z(0, G_w) = Z(Ac_t, [AG_t | G_w])

After one step: k_t+1 = k_t + k_w generators. After T steps: k₀ + T · k_w generators. Linear growth! If the initial set has 2 generators and the disturbance has 2 generators, after 100 steps we have 202 generators — not 10³⁰ vertices.

Zonotope vs. Polytope: Reachable Set Growth

Watch the reachable set grow step by step. Left: polytope (vertex count explodes). Right: zonotope (generator count grows linearly). The bar chart below tracks complexity. Click Step to advance one time step.

t = 0

Speed600ms

The cost of generators: While generator count grows linearly, each generator is a d-dimensional vector. For large d, storing 1000 generators costs 1000d floats, and computing AG costs O(d²) per generator. In practice, generator reduction (dropping or merging small generators) keeps the count bounded at a cost of slight overapproximation.

Check: How does the Minkowski sum of two zonotopes Z(c₁, G₁) and Z(c₂, G₂) work?

Add centers, concatenate generator matrices: Z(c₁+c₂, [G₁|G₂]) Multiply the generator matrices: Z(c₁c₂, G₁G₂) Take the convex hull of all generators from both zonotopes

Chapter 7: Overapproximation

Exact reachability is ideal but often too expensive. Overapproximation computes a set P̃ that contains the true reachable set: R_t ⊆ P̃_t. If the overapproximation doesn't intersect the unsafe region, neither does the true reachable set. Safety is preserved.

R_t ⊆ P̃_t and P̃_t ∩ X_unsafe = ∅ ⇒ R_t ∩ X_unsafe = ∅

Why overapproximate? Three reasons. First, the exact reachable set may not be representable in our chosen data structure (e.g., a Minkowski sum of many polytopes isn't a single polytope). Second, exact computation may be too slow. Third, controlled overapproximation lets us use simpler, faster set representations. The trade-off: tighter overapproximations are more expensive but less likely to produce false alarms.

Common overapproximation strategies:

Strategy	Idea	Tightness
Bounding box	Axis-aligned box containing the set	Loose (ignores orientation)
Oriented box	Box aligned to principal axes	Better than axis-aligned
Zonotope reduction	Merge small generators to limit count	Good (controlled bloat)
Support function	Overapprox by a polytope with chosen face directions	Excellent (refineable)

The critical insight: overapproximation introduces one-sided error. If P̃ ∩ X_unsafe ≠ ∅, we cannot conclude the system is unsafe — the intersection might be entirely within the overapproximation gap. We can only conclude safety (when there's no intersection) or "inconclusive" (when there is). To resolve inconclusive cases, we refine the overapproximation or switch to a tighter method.

P̃ ∩ X_unsafe = ∅

System is SAFE (guaranteed)

P̃ ∩ X_unsafe ≠ ∅

INCONCLUSIVE — refine or use tighter method

Generator reduction for zonotopes: After T steps, a zonotope has k₀ + T · k_w generators. To keep complexity bounded, we reduce: sort generators by magnitude, keep the largest k_max, and overapproximate the remaining small generators with a bounding box (which adds d generators in d dimensions). This keeps each step O(d · k_max) and introduces minimal bloat if the discarded generators are small.

Check: If the overapproximation P̃ intersects the unsafe set, what can we conclude?

The system is unsafe Nothing definite — the result is inconclusive and we need a tighter approximation The overapproximation is wrong

Chapter 8: Support Functions

All the set representations so far store explicit geometric data (vertices, generators, half-spaces). The support function takes a fundamentally different approach: it represents a convex set by answering the question "how far does the set extend in direction d?" for any direction d.

ρ_P(d) = max { d^Tx : x ∈ P }

The support function ρ_P(d) gives the maximum "projection" of P along direction d. It completely characterizes any convex compact set — knowing ρ_P(d) for all directions d uniquely determines P.

Think of it as a shadow: Shine a flashlight in direction d at the set P. The support function tells you how far the shadow extends. Repeating for all directions gives you the full silhouette of P from every angle — enough to reconstruct P exactly.

Support functions compose beautifully with our two operations:

ρ_AP(d) = ρ_P(A^Td) (linear map)

ρ_{P ⊕ Q}(d) = ρ_P(d) + ρ_Q(d) (Minkowski sum)

The Minkowski sum becomes simple addition! No vertex explosion, no generator concatenation. The cost is that the support function is defined for all directions d, not just a finite set. In practice, we evaluate it at a finite set of directions d₁, …, d_n and construct an outer-approximating polytope:

P̃ = {x : d_j^Tx ≤ ρ_P(d_j) for j = 1, …, n}

More directions = tighter approximation. The directions can be chosen adaptively — place more directions where the boundary has high curvature.

Iterative refinement: Start with a coarse set of directions (e.g., the axis-aligned ones). Compute the overapproximating polytope. If it intersects the unsafe set, add more directions in the intersection region and recompute. This adaptively tightens the approximation only where needed — a powerful strategy for safety verification.

Representation	Linear map cost	Minkowski sum cost	Storage
V-polytope (m verts)	O(md)	O(m²) vertices — exponential	O(md)
Zonotope (k gens)	O(kd²)	O(1) — concatenate	O(kd)
Support fn (n dirs)	O(n) evaluations	O(n) additions	O(n) scalars + eval oracle

Check: How does the support function of a Minkowski sum P ⊕ Q relate to the support functions of P and Q?

ρ_P⊕Q(d) = ρ_P(d) + ρ_Q(d) ρ_P⊕Q(d) = max(ρ_P(d), ρ_Q(d)) ρ_P⊕Q(d) = ρ_P(d) · ρ_Q(d)

Chapter 9: LP Safety Checking

We now have tools to compute (or overapproximate) the reachable set. The final question is: does it intersect the unsafe region? This is a safety checking problem, and for polytopic sets, it reduces to linear programming (LP).

Suppose the unsafe set is a polytope X_unsafe = {x : Cx ≤ d}. And our (over-approximated) reachable set P̃ is also a polytope or can be queried via a support function. Safety holds if and only if P̃ ∩ X_unsafe = ∅.

To check intersection, we can test each face of X_unsafe. Face j is defined by c_j^Tx ≤ d_j. The reachable set violates this constraint if and only if:

ρ_P̃(c_j) > d_j

If this holds for at least one face j — meaning the reachable set extends beyond at least one face of the unsafe set in the outward direction — then P̃ might not actually intersect X_unsafe. Wait, that's the wrong direction. Let's be precise:

Intersection test: P̃ ∩ X_unsafe ≠ ∅ if and only if there exists x satisfying all constraints of both sets simultaneously. This is a feasibility LP:

Find x such that: x ∈ P̃ and Cx ≤ d.

If the LP is feasible, the sets intersect (potential safety violation). If infeasible, they don't (safe).

Alternatively, using support functions: P̃ and X_unsafe are disjoint if and only if there exists a separating hyperplane. By the separating hyperplane theorem, this is equivalent to finding a direction d such that:

ρ_P̃(d) < −ρ_{X_unsafe}(−d)

In practice, the LP approach is most common. Modern LP solvers handle millions of constraints in seconds, making safety checking fast even for complex reachable sets.

pseudocode
# Safety verification via reachability
Given: A, B, X₀, W, X_unsafe, time horizon T

R ← X₀
for t = 1 to T:
  R ← overapprox(A · R ⊕ B · W)   # zonotope/support fn step
  if LP_feasible(R ∩ X_unsafe):
    return "INCONCLUSIVE at step t"
return "SAFE for [0, T]"

The full pipeline: choose a set representation (zonotope for speed, support function for tightness), propagate the reachable set through the linear dynamics, overapproximate at each step to control complexity, and check intersection with the unsafe set via LP. If the LP is infeasible at every step, the system is provably safe. If feasible at any step, refine the overapproximation or increase the number of support directions.

Step	Tool	Cost
Set propagation	Zonotope / support function	O(kd²) per step
Overapproximation	Generator reduction / direction refinement	O(kd) per step
Safety check	LP feasibility	Polynomial in constraint count
Overall	Full pipeline	Polynomial in T, d, k

Looking ahead: This chapter covered linear systems — the simplest case. Chapter 9 tackles nonlinear systems (interval arithmetic, Taylor models, neural network verification). Chapter 10 handles discrete and hybrid systems (SAT, probabilistic reachability). The set-based reasoning framework remains the same; only the propagation and overapproximation methods change.

Check: How is the intersection of the reachable set with the unsafe set checked in practice?

By sampling random points from the reachable set By solving a linear programming feasibility problem By comparing the volumes of the two sets