Groves, Chapter 15

Fault Detection and Integrity Monitoring

Failure modes, range checks, innovation filtering, RAIM, parallel solutions, and certified integrity.

Prerequisites: Chapter 3 (Kalman filter), Chapter 12 (INS/GNSS integration).

Chapter 0: Why Integrity?

A navigation system that gives you a wrong answer without telling you it is wrong is more dangerous than one that gives no answer at all. Integrity is the measure of trust that can be placed in the navigation solution.

For safety-critical applications (aviation, autonomous vehicles, maritime), the navigation system must:

Detect when a fault has occurred (fault detection)

Isolate which sensor or source is faulty (fault isolation)

Exclude the faulty source and continue with the remaining sensors (fault exclusion)

Alert the user if the remaining solution cannot meet accuracy requirements

Integrity monitoring is layered — multiple techniques are applied simultaneously to catch different types of faults at different timescales:

| Layer | Technique | Catches |
|-------|-----------|---------|
| 1 | Range checks | Gross hardware failures, out-of-range values |
| 2 | Innovation filtering | Measurements inconsistent with the filter state |
| 3 | Innovation sequence monitoring | Slowly growing biases (ramp failures) |
| 4 | RAIM / consistency checks | Individual faulty measurements |
| 5 | Parallel solutions | Identification of which measurement is faulty |

The integrity requirement: For aviation CAT I approaches, the probability of an undetected fault causing a position error exceeding the alert limit (40 m lateral) must be less than $10^{-7}$ per approach. This is an extraordinarily low probability (one in ten million) and drives the design of the entire integrity monitoring system.
Check: Why is a navigation system that silently gives wrong answers worse than one that gives no answer?

Chapter 1: Failure Modes

Each navigation sensor has characteristic failure modes that the integrity monitoring system must handle:

Inertial navigation failures:

• Sudden sensor failure (loss of output or saturated output)

• Gradual bias growth (accelerometer or gyro drift beyond specification)

• Vibration-induced errors (coning, sculling)

• Computational errors (numerical overflow, algorithm bugs)

GNSS failures:

• Satellite clock failure (incorrect time, causing range error)

• Ephemeris error (wrong satellite position)

• Spoofing (fake signals)

• Multipath (reflected signals biasing the range)

• Ionospheric scintillation (rapid signal fading)

Terrestrial radio failures:

• Transmitter off-air or transmitting incorrect signals

• Propagation anomalies (unusual atmospheric conditions)

Dead-reckoning failures:

• Magnetometer interference (passing near steel structures)

• Odometer wheel slip (ice, mud)

• Barometer pressure change (weather front passing)

Feature-matching failures:

• Incorrect map match (wrong road selected)

• TRN match to wrong terrain feature

• Database error

The Kalman filter is vulnerable to faults. If a faulty measurement is processed by the Kalman filter, it corrupts the current state estimate, and the error covariance matrix is still reduced as though the measurement were valid, so the filter becomes overconfident in a wrong solution. A single bad measurement can bias the state estimates for many subsequent epochs, long after the fault has ended. This is why fault detection must happen before the measurement update.
Check: Why must faults be detected before the Kalman filter processes the measurement?

Chapter 2: Range Checks

Range checks are the simplest and first line of defense. They verify that sensor outputs and computed quantities fall within physically reasonable bounds.

Sensor output checks:

• Accelerometer output within ±maximum g range

• Gyro output within ±maximum rotation rate

• GNSS signal strength above minimum threshold

• Barometric pressure within 200–1100 hPa

• Temperature within operating range

Navigation solution checks:

• Latitude within ±90°

• Height within reasonable bounds for the application

• Velocity magnitude within vehicle performance limits

• Attitude angles within expected ranges

Kalman filter state checks:

• Estimated biases within sensor specifications

• Error covariance diagonal elements positive and within reasonable bounds

• Innovation magnitude within expected bounds (this leads to innovation filtering, Chapter 3)

Range checks are computationally trivial and catch gross failures (stuck sensor, overflowed computation, completely wrong satellite). They do not detect small biases or slowly drifting errors.
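
As a rough illustration of the checks listed above, the following Python sketch flags readings that are non-finite or outside configured bounds. The names `LIMITS` and `range_check` and most of the limit values are assumptions chosen for the example; only the 200–1100 hPa pressure range and the ±90° latitude bound come from the text.

```python
import math

# Hypothetical limits for illustration. Real values come from sensor data
# sheets and the vehicle's performance envelope.
LIMITS = {
    "accel_mps2": (-160.0, 160.0),   # assumed accelerometer range (roughly 16 g)
    "gyro_radps": (-10.0, 10.0),     # assumed gyro range
    "baro_hpa": (200.0, 1100.0),     # pressure bounds quoted in the text
    "latitude_deg": (-90.0, 90.0),   # latitude bound quoted in the text
    "speed_mps": (0.0, 100.0),       # assumed vehicle performance limit
}


def range_check(readings):
    """Return the names of readings that are non-finite or out of bounds."""
    failed = []
    for name, value in readings.items():
        lo, hi = LIMITS[name]
        # NaN and infinity fail the check even though comparisons with NaN
        # are always False, so test finiteness explicitly.
        if not math.isfinite(value) or not (lo <= value <= hi):
            failed.append(name)
    return failed


print(range_check({"baro_hpa": 1013.25, "latitude_deg": 51.5}))
print(range_check({"baro_hpa": 50.0, "speed_mps": float("nan")}))
```

The explicit `math.isfinite` test matters because an overflowed computation often produces NaN, which would otherwise slip through a simple comparison.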

Check: What types of faults can range checks detect?

Chapter 3: Innovation Filtering

The Kalman filter's measurement innovation is the difference between the actual measurement and the value predicted by the filter. Under normal operation, the innovation should be consistent with the predicted innovation covariance:

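$\delta z = \tilde{z} - H \hat{x}$
$S = H P H^{T} + R$

where $\hat{x}$ and $P$ are the state estimate and error covariance before the measurement update, $H$ is the measurement matrix, $R$ is the measurement noise covariance, and $S$ is the innovation covariance matrix. If the filter is working correctly, the innovation should be a zero-mean Gaussian with covariance $S$.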

Innovation filtering test: Check whether each innovation component j is within a threshold of $\gamma$ standard deviations (typically $\gamma$ = 3–5):

$\delta z_j^2 / S_{jj} < \gamma^2$

If the innovation exceeds the threshold, the measurement is rejected (not processed by the filter). This prevents faulty measurements from corrupting the state estimate.

The threshold trade-off:

• Too tight (e.g., 2σ): rejects many valid measurements (high false alarm rate), degrading performance

• Too loose (e.g., 6σ): lets faulty measurements through (low detection rate)

• Typical choice: 3–5σ, depending on the application's tolerance for false alarms vs missed detections
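
The trade-off can be made concrete with a short Python sketch of the scalar test, together with the false-alarm probability implied by a given threshold. It assumes a zero-mean Gaussian innovation, as stated above; the function names `innovation_ok` and `false_alarm_rate` are invented for this example.

```python
import math


def innovation_ok(dz, s_jj, gamma=3.0):
    """Scalar innovation test: accept the measurement if dz^2 / S_jj < gamma^2."""
    return dz * dz / s_jj < gamma * gamma


def false_alarm_rate(gamma):
    """P(|dz| > gamma * sigma) for a zero-mean Gaussian innovation."""
    return math.erfc(gamma / math.sqrt(2.0))


for gamma in (2.0, 3.0, 5.0):
    print(f"{gamma:.0f} sigma gate rejects {false_alarm_rate(gamma):.2e} of valid measurements")
```

Under the Gaussian assumption, a 2σ gate discards about 4.6% of valid measurements, a 3σ gate about 0.27%, and a 5σ gate fewer than one in a million.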

Multivariate test: For vector measurements, the normalized innovation squared:

$q = \delta z^{T} S^{-1} \delta z$

follows a chi-squared distribution with degrees of freedom equal to the measurement dimension. This catches correlated faults that might not be detected by individual component checks.
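
A minimal sketch of the multivariate test for a two-component measurement follows. The two-dimensional case is chosen only because the chi-squared threshold then has a closed form (for two degrees of freedom, the probability of exceeding T is exp(−T/2)); the covariance values and function names are made up for illustration.

```python
import math


def nis_2d(dz, S):
    """Normalized innovation squared q = dz^T S^-1 dz for a 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    # Multiply dz by the explicit inverse of the 2x2 matrix S.
    y0 = (d * dz[0] - b * dz[1]) / det
    y1 = (-c * dz[0] + a * dz[1]) / det
    return dz[0] * y0 + dz[1] * y1


def chi2_threshold_2dof(p_false_alarm):
    """Chi-squared threshold for 2 degrees of freedom: P(q > T) = exp(-T / 2)."""
    return -2.0 * math.log(p_false_alarm)


S = [[1.0, 0.9], [0.9, 1.0]]            # strongly correlated components (assumed)
threshold = chi2_threshold_2dof(0.0027)  # same false-alarm rate as a scalar 3-sigma gate

for dz in ([2.0, 2.0], [2.0, -2.0]):
    q = nis_2d(dz, S)
    print(dz, "q =", round(q, 2), "reject" if q > threshold else "accept")
```

Both innovations have components of only 2σ, so each would pass a per-component 3σ test. The second one contradicts the expected positive correlation, which is why its q is far above the threshold while the first is well below it.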

Innovation filtering is the workhorse of integrity monitoring. It runs on every measurement at every epoch with minimal computational cost. It catches most sudden faults (step errors, satellite clock jumps) within one measurement epoch. However, it cannot detect slowly growing biases (ramp failures) until they accumulate enough to cross the threshold.
Check: Why can innovation filtering miss slowly growing biases?

Chapter 4: Innovation Sequence Monitoring

To detect slowly growing faults (ramp failures), the sequence of innovations over time is monitored, not just individual innovations.

Running average test: Compute the mean of the last N innovations. Under normal operation, this should be near zero. A sustained bias in the innovations indicates a fault:

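$\hat{\mu}_j = \frac{1}{N} \sum_{k} \delta z_{j,k}$

The test statistic $\hat{\mu}_j^2 / (S_{jj}/N)$ is compared to a chi-squared threshold with one degree of freedom. This assumes the innovations are uncorrelated in time and that $S_{jj}$ is roughly constant over the window. The window length N controls the trade-off: longer windows detect smaller biases but with a longer delay.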
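
The running average test might be sketched as follows, assuming time-uncorrelated innovations and a constant innovation variance over the window. The class name `RunningMeanTest` and the default threshold of 9 (a 3σ test on the mean) are choices made for this example.

```python
from collections import deque


class RunningMeanTest:
    """Flag a sustained bias in the last `window` innovations of one component."""

    def __init__(self, window, s_jj, threshold=9.0):
        self.buf = deque(maxlen=window)  # oldest innovation drops out automatically
        self.s_jj = s_jj                 # innovation variance, assumed constant
        self.threshold = threshold       # 9.0 corresponds to 3 sigma on the mean

    def update(self, dz):
        """Add one innovation; return True if the windowed mean is too large."""
        self.buf.append(dz)
        n = len(self.buf)
        if n < self.buf.maxlen:
            return False                 # wait until the window is full
        mean = sum(self.buf) / n
        stat = mean * mean / (self.s_jj / n)
        return stat > self.threshold


monitor = RunningMeanTest(window=100, s_jj=1.0)
alarms = [monitor.update(0.5) for _ in range(100)]  # constant 0.5-sigma bias
print("alarm raised at sample", alarms.index(True) + 1)
```

A 0.5σ bias passes any single-epoch 3σ test, but averaged over 100 epochs the statistic is 0.25 / 0.01 = 25, well above the threshold of 9.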

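CUSUM (Cumulative Sum) test: Accumulates the innovations with a reference value subtracted:

$C_k = \max(0,\; C_{k-1} + |\delta z_k| - \nu)$

where $\nu$ is a reference value (typically set to half the smallest bias to be detected). Because $|\delta z_k|$ has a nonzero mean even without a fault (about $0.8\sigma$ for zero-mean Gaussian innovations), $\nu$ must also exceed that mean, otherwise $C_k$ grows in normal operation. An alarm is raised when $C_k$ exceeds a threshold. CUSUM is more sensitive to small shifts than the running average test.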
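
The sketch below uses the common two-sided form of CUSUM, which keeps separate sums for positive and negative shifts instead of taking the absolute value as in the formula above. That substitution is an assumption of this example, as are the class name and the convention that innovations are normalized by their standard deviation before being passed in.

```python
class TwoSidedCusum:
    """Two-sided CUSUM on innovations normalized by their standard deviation."""

    def __init__(self, nu, threshold):
        self.nu = nu                # reference value, in units of sigma
        self.threshold = threshold  # alarm level for either sum
        self.c_pos = 0.0            # accumulates evidence of a positive bias
        self.c_neg = 0.0            # accumulates evidence of a negative bias

    def update(self, dz_norm):
        """Process one normalized innovation; return True if an alarm is raised."""
        self.c_pos = max(0.0, self.c_pos + dz_norm - self.nu)
        self.c_neg = max(0.0, self.c_neg - dz_norm - self.nu)
        return max(self.c_pos, self.c_neg) > self.threshold


# Reference value 0.5 targets a 1-sigma bias (half the smallest bias of interest).
cusum = TwoSidedCusum(nu=0.5, threshold=5.0)
alarms = [cusum.update(1.0) for _ in range(12)]  # noise-free 1-sigma bias
print("alarm raised at sample", alarms.index(True) + 1)
```

With a constant 1σ bias each sample adds 0.5 to the positive sum, so the alarm level of 5 is first exceeded at the eleventh sample. Zero-mean innovations smaller than the reference value leave both sums at zero.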

Remedying biased state estimates: If innovation sequence monitoring detects a fault that has been present for some time, the Kalman filter state estimates have already been corrupted. Options include:

• Resetting the filter to its state before the fault began (requires storing historical states)

• Inflating the error covariance matrix to "forget" the corrupted information

• Reprocessing the stored measurements without the faulty source
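
The first option requires a history of filter states. One way to keep it is a fixed-depth buffer of snapshots, sketched below. The class `FilterHistory`, its method names, and the representation of the state and covariance as plain lists are hypothetical choices for this example; estimating when the fault began is outside its scope.

```python
import copy
from collections import deque


class FilterHistory:
    """Fixed-depth store of (epoch, state, covariance) snapshots."""

    def __init__(self, depth):
        self.snapshots = deque(maxlen=depth)  # oldest snapshot is discarded first

    def save(self, epoch, x, P):
        # Deep copies so later in-place filter updates do not alter the history.
        self.snapshots.append((epoch, copy.deepcopy(x), copy.deepcopy(P)))

    def restore_before(self, fault_epoch):
        """Return the latest snapshot taken strictly before fault_epoch, or None."""
        for epoch, x, P in reversed(self.snapshots):
            if epoch < fault_epoch:
                return epoch, x, P
        return None


history = FilterHistory(depth=5)
for k in range(10):
    history.save(k, [float(k)], [[1.0]])

print(history.restore_before(8))  # snapshot from epoch 7
print(history.restore_before(3))  # None: the fault predates the stored history
```

The buffer depth bounds how late a fault can be detected and still be undone by a reset, which ties the memory budget to the detection delay of the sequence monitor.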

Check: What does innovation sequence monitoring detect that single-epoch innovation filtering cannot?

Chapter 5: RAIM

RAIM (Receiver Autonomous Integrity Monitoring) detects faulty satellite measurements using redundancy. When more satellites are tracked than the minimum needed for a solution, the extra measurements provide consistency checks.

The basic idea: A least-squares position solution from n satellites (n ≥ 5) has n − 4 degrees of freedom. The sum of squared residuals (SSR) measures how well the measurements agree with each other. If all measurements are consistent, the SSR is small. If one measurement is faulty, the SSR increases.

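$\mathrm{SSR} = \delta z^{T} \left( I - H (H^{T} H)^{-1} H^{T} \right) \delta z$

Under the fault-free hypothesis, with independent measurement errors of equal variance $\sigma^2$, the normalized statistic $\mathrm{SSR}/\sigma^2$ follows a chi-squared distribution with n − 4 degrees of freedom. If it exceeds a threshold set by the allowed false-alarm probability, a fault is declared.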

Fault detection vs fault exclusion:

FD (Fault Detection): Requires n ≥ 5 satellites. Detects that a fault exists but does not identify which satellite is faulty. The system must alert the user and switch to backup navigation.

FDE (Fault Detection and Exclusion): Requires n ≥ 6 satellites. By removing each satellite in turn and recomputing the SSR, the faulty satellite can be identified and excluded. Navigation continues with the remaining satellites.
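
The detection test and the exclusion search can be sketched together for the six-satellite case. Everything specific here is an assumption for illustration: the two-ring satellite geometry, the equal measurement standard deviation `sigma`, the false-alarm probability, and the rule of excluding the satellite whose removal gives the smallest SSR. Six satellites are used because the chi-squared thresholds for two and one degrees of freedom can be computed with the standard library alone.

```python
import math
from statistics import NormalDist


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ssr(H, dz):
    """Sum of squared residuals after a least-squares fit of dz to H."""
    m = len(H[0])
    HtH = [[sum(row[i] * row[j] for row in H) for j in range(m)] for i in range(m)]
    Htz = [sum(row[i] * z for row, z in zip(H, dz)) for i in range(m)]
    x = solve(HtH, Htz)
    resid = [z - sum(h * xi for h, xi in zip(row, x)) for row, z in zip(H, dz)]
    return sum(r * r for r in resid)


def two_ring_geometry():
    """Hypothetical geometry: 3 satellites at 30 deg and 3 at 60 deg elevation.
    Each row is [east, north, up, clock]."""
    rows = []
    for elev_deg, az0_deg in ((30.0, 0.0), (60.0, 60.0)):
        el = math.radians(elev_deg)
        for k in range(3):
            az = math.radians(az0_deg + 120.0 * k)
            rows.append([math.cos(el) * math.sin(az),
                         math.cos(el) * math.cos(az), math.sin(el), 1.0])
    return rows


def raim_fde(H, dz, sigma, p_fa=1e-3):
    """Return (fault_detected, excluded_index_or_None) for exactly 6 satellites."""
    if len(H) != 6:
        raise ValueError("thresholds below are only valid for 6 satellites")
    t_full = -2.0 * math.log(p_fa)                       # chi-squared, 2 dof
    t_sub = NormalDist().inv_cdf(1.0 - p_fa / 2.0) ** 2  # chi-squared, 1 dof
    if ssr(H, dz) / sigma ** 2 <= t_full:
        return False, None                               # measurements consistent
    # Remove each satellite in turn and recompute the normalized SSR.
    sub = [ssr(H[:j] + H[j + 1:], dz[:j] + dz[j + 1:]) / sigma ** 2 for j in range(6)]
    best = min(range(6), key=sub.__getitem__)
    return True, (best if sub[best] < t_sub else None)


H = two_ring_geometry()
dz = [0.0] * 6
dz[3] = 30.0                      # noise-free 30 m fault on satellite 3
print(raim_fde(H, dz, sigma=3.0))
```

The example is noise-free, so the subset that omits the faulty satellite fits exactly. A fuller implementation would also confirm that the other subsets fail their test before trusting the exclusion.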

RAIM availability: RAIM requires sufficient satellites with adequate geometry. The RAIM availability is the percentage of time that RAIM can detect a fault of a given size. This depends on the satellite constellation, user location, and masking angle.

RAIM is the foundation of GNSS integrity for aviation. It enables a GNSS receiver to autonomously detect and exclude faulty satellites without relying on external integrity information. With modernized multi-constellation GNSS (GPS + Galileo + GLONASS), RAIM availability exceeds 99.9% globally.
Check: Why does RAIM fault detection require at least 5 satellites?

Chapter 6: Parallel Solutions

Parallel solutions run multiple navigation filters simultaneously, each excluding a different sensor or measurement. By comparing the solutions, faults can be detected and isolated.

The method: For n sensors, run n+1 filters:

Main filter: Uses all n sensors

Subfilter j: Uses all sensors except sensor j (for j = 1 to n)

If sensor j is faulty, the main filter and every subfilter that includes sensor j will be corrupted, while subfilter j (which excludes it) will be correct. The faulty sensor is identified as the one whose exclusion produces the most consistent solution.

Comparison methods:

• Compare each subfilter's solution with the main filter

• Compare each subfilter with every other subfilter (more robust but more computation)

• Use the innovation sequences from each subfilter
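
The first comparison method can be illustrated with a toy filter bank that estimates a single constant from several sensors. The scalar random-walk filter, its tuning values, and the rule of picking the subfilter farthest from the main filter are simplifying assumptions for this sketch; a practical test would normalize each separation by its expected variance.

```python
class ScalarKF:
    """Scalar Kalman filter with a random-walk state model (illustrative tuning)."""

    def __init__(self, x0=0.0, p0=100.0, q=0.01, r=1.0):
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def predict(self):
        self.p += self.q                     # state unchanged, uncertainty grows

    def update(self, z):
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)
        self.p *= 1.0 - gain


def run_filter_bank(epochs, n_sensors):
    """Run a main filter plus one subfilter per sensor; subfilter k skips sensor k."""
    main = ScalarKF()
    subs = [ScalarKF() for _ in range(n_sensors)]
    for readings in epochs:
        main.predict()
        for sub in subs:
            sub.predict()
        for j, z in enumerate(readings):
            main.update(z)
            for k, sub in enumerate(subs):
                if k != j:
                    sub.update(z)
    separations = [abs(main.x - sub.x) for sub in subs]
    return main, subs, separations


# Noise-free example: true value 10, sensor 2 carries a +5 bias.
epochs = [[10.0, 10.0, 15.0, 10.0]] * 50
main, subs, separations = run_filter_bank(epochs, n_sensors=4)
suspect = separations.index(max(separations))
print("suspect sensor:", suspect, "clean estimate:", round(subs[suspect].x, 3))
```

Here the main filter settles near 11.25, the subfilters that still include sensor 2 settle near 11.67, and the subfilter that excludes it settles at the true value, so its separation from the main filter is the largest.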

Advantages over RAIM: Parallel solutions work with any type of sensor (not just GNSS range measurements) and can be applied within a Kalman filter framework. They naturally extend to multisensor integration.

Computational cost: Running n+1 filters is expensive. The cost can be reduced by sharing the parts of the prediction step that are common to all filters (the system model, i.e. the transition matrix and process noise covariance) and computing only the measurement updates separately. With 6 GNSS satellites and 3 other sensors, there are n = 9 measurement sources, which means 10 parallel filters.

Parallel solutions in integrated navigation: In an INS/GNSS system, parallel filters can detect faults in individual GNSS satellites, INS sensors, and other navigation inputs. This provides comprehensive integrity monitoring across all sensor types.
Check: How do parallel solutions identify a faulty sensor?

Chapter 7: Certified Integrity

Certified integrity monitoring goes beyond fault detection to provide a guaranteed bound on the position error with a specified probability. This is required for safety-of-life applications such as aviation precision approach.

Key concepts:

Alert Limit (AL): The maximum position error that can be tolerated for the operation. For aviation CAT I approach: 40 m lateral, 15 m vertical.

Protection Level (PL): A real-time bound on the position error, computed by the integrity monitoring algorithm. If PL < AL, the operation is declared safe.

Integrity Risk: The probability that the true error exceeds the PL without an alert being issued. Must be less than $10^{-7}$ per approach for CAT I.

Time to Alert (TTA): Maximum time from the onset of a fault to the user being alerted. 6 seconds for CAT I.

Protection level computation: The PL accounts for both the fault-free position error and the position error that would result from an undetected fault. The PL depends on the satellite geometry, measurement noise, fault detection threshold, and the probability of a satellite fault.
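
As a deliberately simplified illustration, the sketch below computes only the fault-free part of a one-dimensional protection level by scaling the position standard deviation with a Gaussian quantile, then compares it with the alert limit. It omits the undetected-fault contribution described above, so it is not a complete protection level; the function names and the example standard deviations are assumptions.

```python
from statistics import NormalDist


def fault_free_protection_level(sigma_pos, integrity_risk=1e-7):
    """Bound exceeded by a zero-mean Gaussian error with probability integrity_risk."""
    k = NormalDist().inv_cdf(1.0 - integrity_risk / 2.0)  # two-sided quantile
    return k * sigma_pos


def operation_available(protection_level, alert_limit):
    """The operation may proceed only while PL < AL."""
    return protection_level < alert_limit


for sigma in (5.0, 8.0):  # assumed lateral position standard deviations in metres
    pl = fault_free_protection_level(sigma)
    print(f"sigma = {sigma} m, PL = {pl:.1f} m, available: {operation_available(pl, 40.0)}")
```

For an integrity risk of $10^{-7}$ the multiplier is about 5.33, so a 5 m standard deviation gives a bound of roughly 27 m (inside the 40 m lateral alert limit), while 8 m gives roughly 43 m and the operation would have to be declared unavailable.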

ABAS, SBAS, GBAS:

ABAS (Aircraft-Based): Uses RAIM within the receiver. Provides lateral integrity only.

SBAS (Satellite-Based): Augmentation systems (WAAS, EGNOS) broadcast integrity information. Provides both lateral and vertical integrity.

GBAS (Ground-Based): Local ground stations provide integrity for precision approach. Required for CAT II/III approaches.

The protection level must never lie. If the system declares PL = 10 m, the true error must be less than 10 m with probability at least $1 - 10^{-7}$. If the system cannot guarantee this, it must raise an alert. This "never wrong when it matters" property is what makes certified integrity monitoring so demanding to implement.
Check: What is the difference between the alert limit and the protection level?

Chapter 8: Fault Detection Simulation

This simulation shows innovation-based fault detection. A measurement sequence is shown with normal noise. At a random point, a fault is injected (step bias). Watch how the innovation filter detects and rejects the faulty measurements.

[Simulation: Innovation Filtering: Fault Detection]
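
A comparable experiment can be run offline with the short Python sketch below. It assumes a scalar filter tracking a constant, unit-variance Gaussian measurement noise, and a step bias of 20σ starting at a fixed epoch; the function name `simulate` and all parameter values are illustrative rather than taken from the original simulation.

```python
import math
import random


def simulate(n_epochs=100, fault_start=60, bias=20.0, gate_sigma=3.0, seed=1):
    """Scalar Kalman filter with an innovation gate and a step fault.
    Returns the final estimate and the list of rejected epochs."""
    rng = random.Random(seed)
    truth, r, q = 5.0, 1.0, 1e-4   # true value, noise variance, process noise
    x, p = 0.0, 100.0              # initial estimate and its variance
    rejected = []
    for k in range(n_epochs):
        p += q                                    # prediction step
        z = truth + rng.gauss(0.0, math.sqrt(r))  # measurement with normal noise
        if k >= fault_start:
            z += bias                             # injected step fault
        dz = z - x                                # innovation
        s = p + r                                 # innovation variance
        if dz * dz / s < gate_sigma ** 2:
            gain = p / s                          # measurement accepted
            x += gain * dz
            p *= 1.0 - gain
        else:
            rejected.append(k)                    # measurement rejected
    return x, rejected


estimate, rejected = simulate()
print("final estimate:", round(estimate, 2))
print("rejected epochs:", rejected)
```

Lowering `gate_sigma` towards 2 shows the cost of a tight threshold (more valid epochs appear in the rejected list), while reducing `bias` to a few σ shows faulty measurements starting to pass the gate.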
Check: What happens if the innovation filter threshold is set too tight?

Chapter 9: Summary

Key takeaways:
• Integrity = trust in the navigation solution; a wrong answer without warning is worse than no answer
• Failure modes differ by sensor: sudden failures, gradual drift, spoofing, multipath, interference
• Faults must be detected before the Kalman filter processes them; once absorbed, they corrupt the state for many epochs
• Range checks: simple, catch gross failures (out-of-range values, hardware faults)
• Innovation filtering: reject measurements with innovations exceeding 3–5σ; the workhorse of integrity
• Innovation sequence monitoring: detects slowly growing biases via running average or CUSUM tests
• RAIM: uses GNSS measurement redundancy; FD needs ≥5 satellites, FDE needs ≥6
• Parallel solutions: run n+1 filters each excluding one sensor; identifies the faulty sensor
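• Certified integrity: the protection level must bound the true position error with probability ≥ $1 - 10^{-7}$, and the operation is permitted only while the protection level is below the alert limit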
• ABAS (RAIM), SBAS (WAAS/EGNOS), and GBAS provide different levels of certified integrity for aviation
Check: What is the minimum number of GNSS satellites needed for RAIM fault detection and exclusion (FDE)?