Ch 10: Discrete Reachability — Algorithms for Validation

Chapter 0: Why Discrete Reachability?

Imagine a self-driving car navigating an intersection. The car has a finite set of decisions: go straight, turn left, turn right, stop. The traffic light has three states: green, yellow, red. Other vehicles occupy discrete lanes. At this level of abstraction, the world is not a continuous differential equation — it is a graph.

Chapters 8 and 9 handled continuous dynamics (differential equations, neural networks). But many systems are naturally discrete, or are best analyzed at a discrete level. Digital controllers execute finite state machines. Protocol specifications describe message sequences. Even continuous systems can be abstracted into discrete graphs by partitioning the state space into cells.

The core shift: In continuous reachability, the tools are linear algebra and calculus (zonotopes, Taylor models). In discrete reachability, the tools are graph algorithms (BFS, DFS, shortest paths) and logic (SAT, model checking). The question is the same — "where can the system go?" — but the mathematical universe changes completely.

Discrete reachability has a major advantage: it is exact. There is no overapproximation. If a state is reachable in the graph, it is truly reachable. If it is not reachable, it is truly unreachable. The challenge is not approximation quality but scalability: a system with n binary state variables has 2ⁿ possible states. Exploring them all is infeasible for large n.

Continuous reachability

Tools: calculus, linear algebra. Challenge: overapproximation.

↓ discretize

Discrete reachability

Tools: graph search, logic. Challenge: state space explosion.

Continuous	Discrete
States: Rⁿ	States: finite set S
Dynamics: x' = f(x)	Transitions: edges in a graph
Reachable set: curved blob	Reachable set: subset of nodes
Overapproximate	Exact (but may be huge)

What is the main advantage of discrete reachability over continuous reachability?

Discrete reachability is exact (no overapproximation), while continuous methods must overapproximate Discrete reachability is always faster Discrete reachability can handle more dimensions

Chapter 1: Graph Formulation

A discrete system is a directed graph G = (V, E). The vertices V are the system states. An edge (u, v) ∈ E means the system can transition from state u to state v in one step. Some edges may carry weights (probabilities, costs).

Notation: We write outneighbors(u) for the set of states reachable from u in one step: {v : (u,v) ∈ E}. We write inneighbors(v) for the set of states that can reach v in one step: {u : (u,v) ∈ E}. These two concepts are the workhorses of forward and backward reachability.

Adjacency Representations

How you store the graph determines the cost of reachability queries:

Representation	Space	Outneighbor query	Inneighbor query
Adjacency matrix	O(\|V\|²)	O(\|V\|)	O(\|V\|)
Adjacency list	O(\|V\| + \|E\|)	O(degree)	O(\|V\| + \|E\|) unless stored separately
Both directions	O(\|V\| + \|E\|)	O(degree)	O(degree)

For reachability analysis, we almost always want both forward and backward edges, so we store two adjacency lists: one for outneighbors, one for inneighbors. This doubles the storage but makes both directions equally fast.

Transition Systems vs. Graphs

In verification, the graph often represents a transition system: a tuple (S, S₀, T) where S is the state set, S₀ ⊆ S is the set of initial states, and T ⊆ S × S is the transition relation. States may have labels (e.g., "safe", "failure"). The reachability question becomes: starting from S₀, can the system reach any state labeled "failure"?

Key insight: Reachability on a graph is equivalent to connectivity. The forward reachable set from S₀ is exactly the set of vertices reachable from any vertex in S₀ via directed paths. This is a foundational graph theory problem, solvable by BFS or DFS in O(|V| + |E|) time.

python
# Simple directed graph using adjacency lists
class DiGraph:
    def __init__(self, n):
        self.n = n
        self.out_adj = [[] for _ in range(n)]
        self.in_adj  = [[] for _ in range(n)]

    def add_edge(self, u, v):
        self.out_adj[u].append(v)
        self.in_adj[v].append(u)

What does outneighbors(u) represent in a directed graph?

All vertices in the graph All vertices that have an edge pointing to u All vertices reachable from u in one step

Chapter 2: Forward Reachability

The most natural reachability question: starting from a set of initial states S₀, which states can the system eventually reach? This is forward reachability, and it is solved by breadth-first search (BFS).

BFS explores states layer by layer. Layer 0 is S₀ itself. Layer 1 is everything reachable in one step from S₀. Layer 2 is everything reachable in two steps. And so on, until no new states are discovered.

R₀ = S₀

R_k+1 = R_k ∪ { v : (u, v) ∈ E, u ∈ R_k }

When R_k+1 = R_k, we have found the full forward reachable set. The algorithm terminates in at most |V| steps (since each step adds at least one new state, or we stop).

Key insight: Forward reachability is just BFS with a set of starting nodes instead of one. The total cost is O(|V| + |E|) — linear in the graph size. Each vertex and each edge is examined at most once. This is as efficient as possible for an algorithm that must read the entire reachable portion of the graph.

Grid World Reachability: BFS Explorer

A 7×7 grid with walls (dark), a start cell (teal), and a goal cell (orange). Click Step to expand one BFS layer. Toggle Forward/Backward to switch direction. In Probabilistic mode, cells are colored by occupancy probability.

Layer 0

Speed200ms

What to notice: In forward mode, BFS expands outward from the start cell like a wavefront. Walls block propagation. The BFS layer number equals the shortest path distance from the start. States behind walls may be unreachable entirely. Switch to backward mode (Chapter 3) to see propagation from the goal.

python
from collections import deque

def forward_reachable(graph, initial_states):
    """BFS from a set of initial states. Returns reachable set."""
    visited = set(initial_states)
    queue = deque(initial_states)
    while queue:
        u = queue.popleft()
        for v in graph.out_adj[u]:
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return visited

What determines the BFS layer number of a state in forward reachability?

Its shortest path distance from the initial states Its distance from the goal The number of edges in the graph

Chapter 3: Backward Reachability

Forward reachability answers: "what can the system reach?" Backward reachability answers the dual question: "what can reach the target?" Given a set of target states T (perhaps failure states), backward reachability computes the set of all states from which T is eventually reachable.

B₀ = T

B_k+1 = B_k ∪ { u : (u, v) ∈ E, v ∈ B_k }

This is BFS on the reverse graph — following edges backward (using inneighbors instead of outneighbors). The set B_k contains all states that can reach T in at most k steps.

Why backward? Often we care about a specific dangerous state ("the car hits a pedestrian") and want to know: from which initial conditions is this danger reachable? Forward search from every possible initial state is wasteful. Backward search from the danger set directly identifies the backward reachable set — the set of states we must avoid or handle with care.

Combining Forward and Backward

The most powerful analysis combines both directions:

Forward from S₀

Compute R = states reachable from initial conditions

↓

Backward from T

Compute B = states that can reach the target/failure

↓

Intersection R ∩ B

States that are both reachable AND can lead to failure

If R ∩ B is empty, the system is safe: no initial condition can lead to the failure state. If R ∩ B is non-empty, every state in the intersection lies on a path from S₀ to T — a potential failure trajectory.

Key insight: Forward reachability tells you where you can go. Backward reachability tells you where danger comes from. Their intersection tells you what you need to worry about. This bidirectional analysis is more informative than either direction alone and costs only twice the BFS time: O(2(|V| + |E|)).

Try the simulation in Chapter 2: switch to "Backward" mode and watch BFS expand from the goal. The backward reachable set at layer k contains all cells that can reach the goal in exactly k steps. Compare the forward and backward wavefronts — they meet where paths connect start to goal.

python
def backward_reachable(graph, target_states):
    """BFS on reverse graph from target states."""
    visited = set(target_states)
    queue = deque(target_states)
    while queue:
        v = queue.popleft()
        for u in graph.in_adj[v]:  # inneighbors!
            if u not in visited:
                visited.add(u)
                queue.append(u)
    return visited

# Safety check
R = forward_reachable(G, S0)
B = backward_reachable(G, T_fail)
if R & B:
    print("UNSAFE: failure is reachable")
else:
    print("SAFE: no path from S0 to failure")

If the forward reachable set R from initial states and the backward reachable set B from failure states have an empty intersection, what can we conclude?

The system might still fail under different initial conditions The system is safe: no path exists from any initial state to any failure state Backward reachability was computed incorrectly

Chapter 4: Counterexample Search

Knowing that a failure state is reachable is only half the story. We also want a counterexample: a concrete path from an initial state to the failure state. This path explains how the system can fail and is essential for debugging.

A counterexample is a sequence of states s₀, s₁, ..., s_k where s₀ ∈ S₀, s_k ∈ T, and each (s_i, s_i+1) ∈ E. This is simply a path in the graph from the initial set to the target set.

Search Strategies

Algorithm	Finds	Time	Best for
BFS	Shortest counterexample	O(\|V\| + \|E\|)	Minimum-step failures
DFS	Any counterexample	O(\|V\| + \|E\|)	Quick detection (any path)
A*	Lowest-cost counterexample	O(\|E\| + \|V\| log \|V\|)	Weighted graphs (most likely failure)

BFS finds the shortest counterexample (fewest transitions). This is valuable because shorter failure paths are often more plausible and easier to understand.

DFS uses less memory (O(depth) vs O(branching factor^depth)) and may find a counterexample faster if one exists deep in the graph. But the path it finds may not be the shortest.

A* is optimal for weighted graphs. If edges have costs (e.g., negative log-probabilities), A* with an admissible heuristic finds the minimum-cost path — the most likely failure trajectory.

Counterexamples in model checking: When a model checker like SPIN or NuSMV declares a property violated, it produces a counterexample — a trace from the initial state to the violating state. This trace is the most valuable output of the tool: it tells the engineer exactly what sequence of events leads to failure, enabling targeted fixes.

Extracting the Path

During BFS or DFS, maintain a parent pointer for each discovered state: parent[v] = u means "v was first reached from u." Once the search reaches a target state t, trace back from t through the parent pointers to reconstruct the full path.

python
def find_counterexample(graph, s0, target):
    """BFS shortest path from s0 to any state in target."""
    parent = {s0: None}
    queue = deque([s0])
    while queue:
        u = queue.popleft()
        if u in target:
            # Trace back
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph.out_adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None  # no path exists

Why is BFS preferred over DFS when we need the counterexample to be as short as possible?

BFS is always faster than DFS BFS explores states in order of distance from the start, so the first path found to the target is guaranteed to be shortest DFS cannot find paths in directed graphs

Chapter 5: Boolean Satisfiability (SAT)

For small graphs, BFS and DFS work perfectly. But what about systems with 2¹⁰⁰ states? Explicit enumeration is hopeless. SAT-based methods exploit the fact that many large systems have compact symbolic descriptions — the state can be encoded as a vector of Boolean variables, and the transition relation as a logical formula.

The Encoding

Suppose the system has n Boolean state variables x₁, ..., x_n. A state is a truth assignment to these variables (2ⁿ possible states). The transition relation T(x, x') is a Boolean formula relating the current state x to the next state x'.

Example: A 2-bit counter has state variables x₁, x₂. The transition adds 1 modulo 4. The formula is: x'₁ = x₁ ⊕ x₂, x'₂ = ¬x₂. The entire 4-state system is encoded in two simple equations.

The k-step reachability question becomes: is there a sequence of states x⁽⁰⁾, x⁽¹⁾, ..., x^(k) such that x⁽⁰⁾ ∈ S₀, x^(k) ∈ T, and T(x⁽ⁱ⁾, x⁽ⁱ⁺¹⁾) holds for all i? This is a satisfiability problem:

S₀(x⁽⁰⁾) ∧ T(x⁽⁰⁾, x⁽¹⁾) ∧ T(x⁽¹⁾, x⁽²⁾) ∧ … ∧ T(x^(k−1), x^(k)) ∧ T_fail(x^(k))

This formula, called the bounded model checking (BMC) formula, is satisfiable if and only if there exists a failure path of length k. A SAT solver determines satisfiability and, if satisfiable, returns a satisfying assignment — which is exactly the counterexample.

Why SAT beats explicit search: The BMC formula has O(n · k) variables. Modern SAT solvers (MiniSat, Glucose, CaDiCaL) can handle millions of variables. This means we can analyze systems with millions of state bits — far beyond the reach of explicit BFS. The solver's internal search (DPLL with clause learning) prunes the 2^nk search space by exploiting the structure of the formula.

Incremental Deepening

We don't know the minimum failure path length in advance. So we try k = 0, 1, 2, ... increasing the bound until either the formula is satisfiable (failure found) or we reach a completeness threshold. The completeness threshold is the maximum k such that if no failure exists at depth k, none exists at any depth. For simple reachability, this is the diameter of the graph (longest shortest path).

k	Formula size	SAT result	Meaning
0	n variables	UNSAT	No initial state is a failure
1	2n variables	UNSAT	No 1-step path to failure
...	...	...	...
k*	(k*+1)n variables	SAT	Found shortest failure path!

What does it mean when the bounded model checking formula is satisfiable for bound k?

There exists a failure path of length k from an initial state The system is safe for all paths up to length k The SAT solver ran out of time

Chapter 6: Probabilistic Occupancy

So far we have asked binary questions: is a state reachable or not? But if transitions have probabilities, we can ask a richer question: what is the probability of being in each state at time k?

A Markov chain over n states is described by a transition matrix P, where P_ij = Pr(next state = j | current state = i). The occupancy distribution μ_k is a row vector giving the probability of being in each state at time k:

μ_k+1 = μ_k P

Starting from μ₀ (which puts all probability on the initial states), each multiplication by P propagates the distribution one time step. After k steps, μ_k = μ₀ P^k.

Key insight: Probabilistic occupancy is the "soft" version of forward reachability. BFS says "this state is reachable (yes/no)." Occupancy says "this state has probability 0.003 of being occupied at time k." The math is identical in structure — matrix-vector multiplication instead of set union — but the information is richer.

Worked Example

Consider a 3-state system with transition matrix:

P = [0.5, 0.5, 0.0]
[0.0, 0.3, 0.7]
[0.0, 0.0, 1.0]

Starting from state 0: μ₀ = [1, 0, 0].

μ₁ = [1, 0, 0] · P = [0.5, 0.5, 0.0]

μ₂ = [0.5, 0.5, 0.0] · P = [0.25, 0.40, 0.35]

State 2 is an absorbing state (P₂₂ = 1). Its probability grows monotonically. Eventually, all probability mass concentrates there: μ_∞ = [0, 0, 1].

Connection to reachability probability: The probability that the system ever reaches state j, starting from state i, is related to the absorbing Markov chain theory. If j is absorbing, it equals 1 minus the probability of being absorbed elsewhere. The matrix (I − Q)⁻¹, where Q is the transient part of P, gives the expected number of visits to each transient state.

Probabilistic Occupancy Propagation

A 5-state Markov chain. Click Step to propagate the distribution one time step (μ_k+1 = μ_kP). Watch probability flow from the initial state (leftmost) toward the absorbing state (rightmost).

k = 0

How is the occupancy distribution at time k+1 computed from the distribution at time k?

By running a new BFS from the initial states By multiplying the current distribution vector by the transition matrix P By taking the maximum over all neighbor probabilities

Chapter 7: Finite & Infinite Horizon

The reachability probability question comes in two flavors, depending on whether we impose a time limit.

Finite Horizon

Finite-horizon reachability: what is the probability of reaching a target set T within k time steps? Define:

p_k(s) = Pr(reach T within k steps | start in s)

This satisfies a recursive formula. If s ∈ T, then p_k(s) = 1 for all k ≥ 0. Otherwise:

p_k+1(s) = ∑_s' P(s, s') · p_k(s') for s ∉ T

Starting from p₀(s) = 1 if s ∈ T, 0 otherwise, we compute p₁, p₂, ... by propagating backward through the transition matrix. After k iterations, p_k(s) gives the exact k-step reachability probability for each state.

Note the direction: This looks like backward reachability, and it is. We start from the target set and propagate backward through the transition probabilities. This is the probabilistic analogue of the backward BFS in Chapter 3.

Infinite Horizon

Infinite-horizon reachability: what is the probability of ever reaching T? Define:

p_∞(s) = Pr(reach T eventually | start in s)

This is the limit of p_k(s) as k → ∞. If T is reachable with positive probability at each step, the sequence p_k(s) is non-decreasing and bounded by 1, so it converges. In practice, we iterate the recursive formula until convergence (|p_k+1 − p_k| < ε).

Matrix Formulation

Partition the states into target states T and non-target states S \ T. The sub-matrix Q = P restricted to transitions among non-target states governs the dynamics before reaching T. The infinite-horizon reachability probability for non-target states is:

p = (I − Q)⁻¹ · r

where r_s = ∑_{t ∈ T} P(s, t) is the one-step probability of transitioning into T from state s. The matrix (I − Q)⁻¹ is the fundamental matrix of the absorbing Markov chain — its (i,j) entry counts the expected number of times the process visits state j starting from state i before being absorbed.

Solving linear systems vs. iterating: The matrix formula gives the exact answer in one shot (O(n³) for matrix inversion). The iterative approach converges geometrically and may be faster for sparse matrices. Both give the same answer. For large systems, iterative methods with sparse matrix-vector products (O(|E|) per step) are preferred.

Horizon	Question	Method
k steps	Pr(reach T in ≤ k steps)	k matrix-vector products
Infinite	Pr(ever reach T)	Solve (I − Q)p = r

What does the fundamental matrix (I − Q)⁻¹ of an absorbing Markov chain represent?

The transition probabilities between absorbing states The stationary distribution of the chain The expected number of visits to each transient state before absorption

Chapter 8: Discrete State Abstractions

What if the system is continuous but we want to use discrete reachability tools? The answer: discretize. Partition the continuous state space into finitely many cells. Build a graph where each cell is a node, and there is an edge from cell A to cell B if the continuous dynamics can take any state in A to some state in B.

The abstraction idea: A discrete abstraction overapproximates the continuous system. If the abstract graph says cell B is not reachable from cell A, then no continuous state in A can reach any continuous state in B. The guarantee is one-directional: the abstraction may say "reachable" when the true system cannot reach (spurious paths), but it will never miss a real path.

Constructing the Abstract Graph

1. Partition

Divide state space into cells C₁, ..., C_N

↓

2. Reachability per cell

For each cell C_i, compute overapprox of f(C_i) using Ch 9 methods

↓

3. Build edges

Edge C_i → C_j if f(C_i) ∩ C_j ≠ ∅

↓

4. Graph analysis

BFS/DFS/SAT on the abstract graph

Step 2 is where the nonlinear reachability methods from Chapter 9 enter. For each cell, we use interval arithmetic (or Taylor models) to overapproximate the image under the dynamics. The edges are conservative: we add an edge whenever the overapproximated image might overlap a neighboring cell.

Refinement

If the abstract graph says a failure state is reachable but we suspect this is spurious, we can refine: split cells in the abstract path into smaller cells and re-check. If the path survives refinement, it is likely genuine. If it disappears, it was spurious. This counterexample-guided abstraction refinement (CEGAR) loop is the backbone of many verification tools.

Partition type	Cells	Best for
Uniform grid	Hyper-rectangles of equal size	Simple, low dimension
Octree/quadtree	Adaptive rectangles (finer near boundaries)	Non-uniform importance
Simplex	Triangulation of state space	Polyhedral dynamics

Key insight: Discretization converts a continuous reachability problem into a graph problem, letting us use the full toolkit of Chapters 1-7. The cost is overapproximation (spurious edges) and the curse of dimensionality (the number of cells is exponential in the state dimension). This is why discrete abstractions are practical mainly for low-dimensional systems (n ≤ 4 or 5).

Why does a discrete abstraction of a continuous system always overapproximate reachability?

Because edges are added whenever the overapproximated image might overlap a cell, including cases where it does not actually reach it Because the cells are too large Because BFS is an approximate algorithm

Chapter 9: Resolution Tradeoffs

Every method in this chapter lives on a spectrum: finer resolution = tighter results but more computation. This final chapter maps out the tradeoffs and points forward.

The Resolution Ladder

Approach	Resolution	Cost	Tightness
Explicit BFS	Exact (each state)	O(\|V\| + \|E\|)	Exact
SAT/BMC	Symbolic (bit-level)	NP (but often fast)	Exact
Coarse abstraction	Few large cells	Low	Very loose
Fine abstraction	Many small cells	High	Tight
Adaptive (CEGAR)	Fine where needed	Variable	Good

When to Use What

Small state space (|V| < 10⁶): Use explicit BFS. It is simple, exact, and fast. Store the graph in memory and run forward/backward reachability directly.

Large structured state space (|V| > 10⁶, compact description): Use SAT/BMC. Encode the transitions as Boolean formulas and let the SAT solver handle the search. This is the sweet spot for hardware and protocol verification.

Continuous system, low dimension (n ≤ 5): Use discrete abstraction with adaptive refinement. The cell count is manageable, and the graph tools from this chapter apply directly.

Continuous system, high dimension (n > 10): Return to Chapter 9's nonlinear methods (interval arithmetic, Taylor models) or use sampling-based falsification (Chapters 4-5). Discrete abstraction is impractical due to the curse of dimensionality.

The verification engineer's dilemma: Exact methods are complete but expensive. Approximate methods are cheap but may give false alarms (or miss real bugs). The art of verification is choosing the right level of abstraction for the problem at hand — coarse enough to be tractable, fine enough to be useful.

Connections

This chapter closes the reachability trilogy (Chapters 8-10). Each chapter handles a different flavor of dynamics:

What we covered:

Ch 8: Linear systems (zonotopes, polytopes)
Ch 9: Nonlinear systems (intervals, Taylor models)
Ch 10: Discrete systems (graphs, SAT, Markov chains)

What comes next:

Ch 11: Explainability (why did the system fail?)
Ch 12: Runtime monitoring (detecting failures in real time)

The reachability methods provide pre-deployment guarantees: before the system runs, we prove it cannot reach unsafe states. Chapters 11 and 12 address what happens during deployment: monitoring the system in real time and explaining its decisions when things go wrong.

"Testing shows the presence, not the absence of bugs."
— Edsger Dijkstra

For a continuous system with 20 state dimensions, why is discrete abstraction impractical?

BFS cannot handle 20 dimensions SAT solvers do not support continuous variables The number of cells grows exponentially with dimension (curse of dimensionality), making the abstract graph intractably large