Ch 15: Fault Detection and Integrity Monitoring

Chapter 0: Why Integrity?

A navigation system that gives you a wrong answer without telling you it is wrong is more dangerous than one that gives no answer at all. Integrity is the measure of trust that can be placed in the navigation solution.

For safety-critical applications (aviation, autonomous vehicles, maritime), the navigation system must:

• Detect when a fault has occurred (fault detection)

• Isolate which sensor or source is faulty (fault isolation)

• Exclude the faulty source and continue with the remaining sensors (fault exclusion)

• Alert the user if the remaining solution cannot meet accuracy requirements

Integrity monitoring is layered — multiple techniques are applied simultaneously to catch different types of faults at different timescales:

Layer	Technique	Catches
1	Range checks	Gross hardware failures, out-of-range values
2	Innovation filtering	Measurements inconsistent with the filter state
3	Innovation sequence monitoring	Slowly growing biases (ramp failures)
4	RAIM / consistency checks	Individual faulty measurements
5	Parallel solutions	Identification of which measurement is faulty

The integrity requirement: For aviation CAT I approaches, the probability of an undetected fault causing a position error exceeding the alert limit (40 m lateral) must be less than 10⁻⁷ per approach. This is an extraordinarily low probability — one in ten million — and drives the design of the entire integrity monitoring system.

Check: Why is a navigation system that silently gives wrong answers worse than one that gives no answer?

• A wrong answer that the user trusts will lead to dangerous actions (wrong heading, wrong altitude); no answer prompts the user to use backup procedures
• Because the user will restart the system
• Because backup systems are always available

Chapter 1: Failure Modes

Each navigation sensor has characteristic failure modes that the integrity monitoring system must handle:

Inertial navigation failures:

• Sudden sensor failure (loss of output or saturated output)

• Gradual bias growth (accelerometer or gyro drift beyond specification)

• Vibration-induced errors (coning, sculling)

• Computational errors (numerical overflow, algorithm bugs)

GNSS failures:

• Satellite clock failure (incorrect time, causing range error)

• Ephemeris error (wrong satellite position)

• Spoofing (fake signals)

• Multipath (reflected signals biasing the range)

• Ionospheric scintillation (rapid signal fading)

Terrestrial radio failures:

• Transmitter off-air or transmitting incorrect signals

• Propagation anomalies (unusual atmospheric conditions)

Dead-reckoning failures:

• Magnetometer interference (passing near steel structures)

• Odometer wheel slip (ice, mud)

• Barometer pressure change (weather front passing)

Feature-matching failures:

• Incorrect map match (wrong road selected)

• TRN match to wrong terrain feature

• Database error

The Kalman filter is vulnerable to faults. If a faulty measurement is processed by the Kalman filter, it corrupts the state estimate, and the error covariance matrix is still reduced as though the measurement had been valid, so the filter becomes overconfident in a wrong solution. A single bad measurement can therefore bias the state estimates for many subsequent epochs, long after the fault has ended. This is why fault detection must happen before the measurement update.

Check: Why must faults be detected before the Kalman filter processes the measurement?

• A faulty measurement corrupts both the state estimate and the covariance matrix, biasing subsequent estimates for many epochs even after the fault ends
• The Kalman filter will crash if it receives bad data
• The measurement cannot be removed once processed

Chapter 2: Range Checks

Range checks are the simplest and first line of defense. They verify that sensor outputs and computed quantities fall within physically reasonable bounds.

Sensor output checks:

• Accelerometer output within ±maximum g range

• Gyro output within ±maximum rotation rate

• GNSS signal strength above minimum threshold

• Barometric pressure within 200–1100 hPa

• Temperature within operating range

Navigation solution checks:

• Latitude within ±90°

• Height within reasonable bounds for the application

• Velocity magnitude within vehicle performance limits

• Attitude angles within expected ranges

Kalman filter state checks:

• Estimated biases within sensor specifications

• Error covariance diagonal elements positive and within reasonable bounds

• Innovation magnitude within expected bounds (this leads to innovation filtering, Chapter 3)

Range checks are computationally trivial and catch gross failures (stuck sensor, overflowed computation, completely wrong satellite). They do not detect small biases or slowly drifting errors.
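A range check of this kind can be sketched in a few lines of Python. The quantity names and most of the limits below are hypothetical placeholders chosen for illustration (only the 200–1100 hPa pressure range and the ±90° latitude bound come from the lists above); real limits would come from the sensor data sheets and the vehicle's performance envelope.

```python
import math

# Hypothetical limits for illustration only.
LIMITS = {
    "accel_mps2": (-160.0, 160.0),      # assumed accelerometer range, roughly +/-16 g
    "gyro_dps": (-2000.0, 2000.0),      # assumed gyro rate range, deg/s
    "pressure_hpa": (200.0, 1100.0),    # barometer range from the text
    "latitude_deg": (-90.0, 90.0),
    "speed_mps": (0.0, 80.0),           # assumed vehicle speed limit
}

def range_check(sample):
    """Return the names of quantities that are missing, non-finite, or out of range."""
    failed = []
    for name, (low, high) in LIMITS.items():
        value = sample.get(name)
        if value is None or not math.isfinite(value) or not (low <= value <= high):
            failed.append(name)
    return failed

good = {"accel_mps2": 9.8, "gyro_dps": 1.5, "pressure_hpa": 1013.2,
        "latitude_deg": 51.5, "speed_mps": 13.0}
bad = dict(good, pressure_hpa=0.0, speed_mps=float("nan"))

print(range_check(good))  # []
print(range_check(bad))   # ['pressure_hpa', 'speed_mps']
```

Treating missing and non-finite values as failures means that a dead sensor or an overflowed computation is flagged by the same check as an out-of-range reading.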

Check: What types of faults can range checks detect?

• Gross failures like stuck sensors, saturated outputs, overflowed computations, and physically impossible values
• Small biases that grow slowly over time
• All types of sensor faults

Chapter 3: Innovation Filtering

The Kalman filter's measurement innovation is the difference between the actual measurement and the value predicted by the filter. Under normal operation, the innovation should be consistent with the predicted innovation covariance:

δz = z̃ − H x̂⁻

S = H P⁻ H^T + R

where S is the innovation covariance matrix. If the filter is working correctly, the innovation should be a zero-mean Gaussian with covariance S.

Innovation filtering test: Check whether each innovation is within a threshold (typically 3–5 σ):

δz_j² / S_jj < γ²

If the innovation exceeds the threshold, the measurement is rejected (not processed by the filter). This prevents faulty measurements from corrupting the state estimate.
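As a minimal sketch, the scalar test can be written as a single function. The function name, argument names, and the default γ = 4 are illustrative assumptions, not part of any particular filter library.

```python
def innovation_gate(z, z_pred, s_jj, gamma=4.0):
    """Accept a scalar measurement if its normalized innovation is within gamma sigma.

    z      : actual measurement
    z_pred : filter-predicted measurement (H x^-)
    s_jj   : innovation variance (diagonal element of S = H P^- H^T + R)
    gamma  : threshold in standard deviations (assumed value, typically 3 to 5)
    """
    dz = z - z_pred
    return dz * dz / s_jj < gamma * gamma

# Predicted range 100.0 m, innovation standard deviation 2 m (variance 4 m^2).
print(innovation_gate(103.0, 100.0, 4.0))  # True: the innovation is 1.5 sigma
print(innovation_gate(112.0, 100.0, 4.0))  # False: the innovation is 6 sigma
```

Comparing squared quantities avoids a square root and matches the form of the test given above.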

The threshold trade-off:

• Too tight (e.g., 2σ): rejects many valid measurements (high false alarm rate), degrading performance

• Too loose (e.g., 6σ): lets faulty measurements through (low detection rate)

• Typical choice: 3–5σ, depending on the application's tolerance for false alarms vs missed detections

Multivariate test: For vector measurements, the normalized innovation squared:

q = δz^T S⁻¹ δz

follows a chi-squared distribution with degrees of freedom equal to the measurement dimension. This catches correlated faults that might not be detected by individual component checks.
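The following sketch illustrates the point about correlated faults for a two-component measurement. The covariance values, the 3σ per-component gate, and the 99.9% chi-squared threshold are assumptions chosen for the example. Both innovation vectors pass the per-component check, but the one that contradicts the expected positive correlation fails the joint test.

```python
CHI2_2DOF_999 = 13.816  # 99.9% quantile of chi-squared with 2 degrees of freedom

def nis_2d(dz, S):
    """Normalized innovation squared q = dz^T S^-1 dz for a 2-vector."""
    (a, b), (c, d) = S
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    return (dz[0] * (inv[0][0] * dz[0] + inv[0][1] * dz[1])
            + dz[1] * (inv[1][0] * dz[0] + inv[1][1] * dz[1]))

# Assumed innovation covariance: unit variances, correlation 0.9.
S = ((1.0, 0.9), (0.9, 1.0))

for dz in ((2.0, 2.0), (2.0, -2.0)):
    per_component_ok = all(dz[j] ** 2 / S[j][j] < 3.0 ** 2 for j in range(2))
    q = nis_2d(dz, S)
    print(dz, per_component_ok, round(q, 2), q < CHI2_2DOF_999)
```

For the first vector q is about 4.2 and the measurement is accepted. For the second, each component is only 2σ, yet q is about 80 because the components have opposite signs while the covariance says they should move together.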

Innovation filtering is the workhorse of integrity monitoring. It runs on every measurement at every epoch with minimal computational cost. It catches most sudden faults (step errors, satellite clock jumps) within one measurement epoch. However, it cannot reliably detect slowly growing biases (ramp failures): the filter absorbs part of the bias into its state estimates at each update, so the innovations stay small until the fault has grown large enough to cross the threshold, by which time the state is already corrupted.

Check: Why can innovation filtering miss slowly growing biases?

• A slowly growing bias is absorbed into the Kalman filter state estimates, keeping the innovations small even though the state is being corrupted
• The threshold is set too high
• Innovation filtering only checks every 10th measurement

Chapter 4: Innovation Sequence Monitoring

To detect slowly growing faults (ramp failures), the sequence of innovations over time is monitored, not just individual innovations.

Running average test: Compute the mean of the last N innovations. Under normal operation, this should be near zero. A sustained bias in the innovations indicates a fault:

μ̂_j = (1/N) Σ_k δz_j,k

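The test statistic μ̂_j² / (S_jj/N) is compared to a chi-squared threshold with one degree of freedom (this assumes the innovations are uncorrelated in time and that S_jj is roughly constant over the window). The window length N controls the trade-off: longer windows detect smaller biases but with a longer delay.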
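A windowed version of this test might look like the sketch below. The class name, the window length of 20, and the 99.9% threshold are illustrative assumptions, and the input is a deterministic alternating sequence rather than real filter innovations, so that the behaviour is easy to follow.

```python
from collections import deque

class RunningMeanTest:
    """Windowed test on the mean of scalar innovations with known variance s_jj."""

    def __init__(self, window, s_jj, threshold=10.83):
        # threshold: assumed 99.9% chi-squared quantile for 1 degree of freedom
        self.buf = deque(maxlen=window)
        self.s_jj = s_jj
        self.threshold = threshold

    def update(self, dz):
        """Add one innovation; return True if the windowed mean signals a fault."""
        self.buf.append(dz)
        n = len(self.buf)
        if n < self.buf.maxlen:
            return False          # wait until the window is full
        mean = sum(self.buf) / n
        stat = mean * mean / (self.s_jj / n)
        return stat > self.threshold

test = RunningMeanTest(window=20, s_jj=1.0)
# Alternating +/-1 sigma innovations have zero mean: no alarm.
quiet = [test.update(1.0 if k % 2 == 0 else -1.0) for k in range(40)]
# With a 1 sigma bias added, every value (0 or 2 sigma) still passes a 3 sigma
# single-epoch gate, but the windowed mean rises to 1 and the statistic to 20.
biased = [test.update((1.0 if k % 2 == 0 else -1.0) + 1.0) for k in range(40)]
print(any(quiet), any(biased))  # False True
```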

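CUSUM (Cumulative Sum) test: Accumulates the normalized innovations with a reference value subtracted, so that only a persistent shift makes the sum grow. A two-sided form keeps one sum for each sign of the shift:

C⁺_k = max(0, C⁺_(k−1) + δz_k − ν)

C⁻_k = max(0, C⁻_(k−1) − δz_k − ν)

where ν is a reference value (typically set to half the smallest bias to be detected). An alarm is raised when either sum exceeds a threshold. Because it accumulates evidence over the whole history rather than a fixed window, CUSUM typically detects small persistent shifts sooner than the running average test.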
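The sketch below shows one way to implement a CUSUM monitor. It uses a two-sided form with separate sums for positive and negative shifts, applied to innovations normalized by their standard deviation. The function name, ν = 0.5, the threshold of 5, and the synthetic ramp are assumptions made for the example.

```python
def cusum_alarm_index(innovations, sigma, nu=0.5, threshold=5.0):
    """Two-sided CUSUM on normalized innovations; return first alarm index or None.

    nu and threshold are in units of sigma (assumed values: nu is half of a
    1-sigma target bias, threshold is a tuning choice).
    """
    c_pos = c_neg = 0.0
    for k, dz in enumerate(innovations):
        x = dz / sigma
        c_pos = max(0.0, c_pos + x - nu)   # accumulates positive shifts
        c_neg = max(0.0, c_neg - x - nu)   # accumulates negative shifts
        if c_pos > threshold or c_neg > threshold:
            return k
    return None

# Fault-free part: alternating +/-0.4 sigma. Then a ramp of 0.05 sigma per epoch is added.
clean = [0.4 if k % 2 == 0 else -0.4 for k in range(50)]
ramp = [(0.4 if k % 2 == 0 else -0.4) + 0.05 * k for k in range(50)]

print(cusum_alarm_index(clean, sigma=1.0))          # None
print(cusum_alarm_index(clean + ramp, sigma=1.0))   # 74
```

The alarm at index 74 comes 24 epochs into the ramp, when the largest single innovation is still only 1.6σ and would pass a 3σ single-epoch gate.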

Remedying biased state estimates: If innovation sequence monitoring detects a fault that has been present for some time, the Kalman filter state estimates have already been corrupted. Options include:

• Resetting the filter to its state before the fault began (requires storing historical states)

• Inflating the error covariance matrix to "forget" the corrupted information

• Reprocessing the stored measurements without the faulty source

Check: What does innovation sequence monitoring detect that single-epoch innovation filtering cannot?

• Slowly growing biases (ramp failures) where each individual innovation is within bounds but the sequence shows a persistent non-zero mean
• Sudden step errors
• Complete sensor failures

Chapter 5: RAIM

RAIM (Receiver Autonomous Integrity Monitoring) detects faulty satellite measurements using redundancy. When more satellites are tracked than the minimum needed for a solution, the extra measurements provide consistency checks.

The basic idea: A least-squares position solution from n satellites (n ≥ 5) has n − 4 degrees of freedom. The sum of squared residuals (SSR) measures how well the measurements agree with each other. If all measurements are consistent, the SSR is small. If one measurement is faulty, the SSR increases.

SSR = δz^T(I − H(H^TH)⁻¹H^T) δz

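Under the fault-free hypothesis, and assuming independent measurement errors with equal variance σ², the normalized statistic SSR/σ² follows a chi-squared distribution with n − 4 degrees of freedom. If it exceeds a threshold chosen for the required false alarm rate, a fault is declared.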

Fault detection vs fault exclusion:

• FD (Fault Detection): Requires n ≥ 5 satellites. Detects that a fault exists but does not identify which satellite is faulty. The system must alert the user and switch to backup navigation.

• FDE (Fault Detection and Exclusion): Requires n ≥ 6 satellites. By removing each satellite in turn and recomputing the SSR, the faulty satellite can be identified and excluded. Navigation continues with the remaining satellites.
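The detection and exclusion steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a certified algorithm: the six-satellite geometry, the noise values, the 60 m fault, σ = 2 m, and the 99.9% chi-squared thresholds are all made up for the example, and the helper names (`solve`, `ssr`, `raim_fde`, `los_row`) are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ssr(H, dz):
    """Sum of squared least-squares residuals: dz^T (I - H (H^T H)^-1 H^T) dz."""
    m = len(H[0])
    HtH = [[sum(h[i] * h[j] for h in H) for j in range(m)] for i in range(m)]
    Htz = [sum(h[i] * z for h, z in zip(H, dz)) for i in range(m)]
    x = solve(HtH, Htz)
    return sum((z - sum(hi * xi for hi, xi in zip(h, x))) ** 2 for h, z in zip(H, dz))

def raim_fde(H, dz, sigma, thresholds):
    """Return (fault_detected, excluded_index); thresholds maps dof -> chi-squared limit."""
    n = len(H)
    if ssr(H, dz) / sigma ** 2 <= thresholds[n - 4]:
        return False, None
    if n < 6:
        return True, None            # detection only: no redundancy left to exclude
    # Remove each satellite in turn; only the subset without the fault should pass.
    passing = [j for j in range(n)
               if ssr(H[:j] + H[j + 1:], dz[:j] + dz[j + 1:]) / sigma ** 2
               <= thresholds[n - 5]]
    return True, (passing[0] if len(passing) == 1 else None)

def los_row(az_deg, el_deg):
    """Geometry row: unit line-of-sight vector followed by 1 for the clock term."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return [math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el), 1.0]

CHI2_999 = {1: 10.828, 2: 13.816}    # 99.9% quantiles for 1 and 2 degrees of freedom
H = [los_row(az, el) for az, el in
     [(0, 30), (40, 60), (120, 30), (160, 60), (240, 30), (280, 60)]]
x_err = [3.0, -2.0, 5.0, 10.0]       # position and clock error, absorbed by the fit
noise = [0.6, -0.4, 0.3, -0.7, 0.2, -0.1]
clean = [sum(h * x for h, x in zip(row, x_err)) + e for row, e in zip(H, noise)]
faulty = clean[:]
faulty[3] += 60.0                    # 60 m fault on satellite index 3

print(raim_fde(H, clean, 2.0, CHI2_999))    # (False, None)
print(raim_fde(H, faulty, 2.0, CHI2_999))   # (True, 3)
```

If more than one subset passes, or none does, the sketch returns no exclusion. This reflects the fact that isolation depends on geometry: with poor geometry two satellites can be indistinguishable even when six are in view.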

RAIM availability: RAIM requires sufficient satellites with adequate geometry. The RAIM availability is the percentage of time that RAIM can detect a fault of a given size. This depends on the satellite constellation, user location, and masking angle.

RAIM is the foundation of GNSS integrity for aviation. It enables a GNSS receiver to autonomously detect and exclude faulty satellites without relying on external integrity information. With modernized multi-constellation GNSS (GPS + Galileo + GLONASS), RAIM availability exceeds 99.9% globally.

Check: Why does RAIM fault detection require at least 5 satellites?

• Four satellites determine the solution exactly (zero residuals); the fifth provides the redundancy needed to check measurement consistency via the sum of squared residuals
• Five satellites provide better geometry
• The fifth satellite provides a timing reference

Chapter 6: Parallel Solutions

Parallel solutions run multiple navigation filters simultaneously, each excluding a different sensor or measurement. By comparing the solutions, faults can be detected and isolated.

The method: For n sensors, run n+1 filters:

• Main filter: Uses all n sensors

• Subfilter j: Uses all sensors except sensor j (for j = 1 to n)

If sensor j is faulty, all subfilters that include sensor j will be corrupted, while subfilter j (which excludes it) will be correct. The faulty sensor is identified as the one whose exclusion produces the most consistent solution.
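The leave-one-out logic can be illustrated with a deliberately simplified static example, in which each "filter" is replaced by a plain average of sensors that all measure the same scalar quantity. This simplification, the sensor values, and the function name are assumptions for illustration; a real implementation would run full navigation filters.

```python
def parallel_solutions(z):
    """Return (main, subs, spreads): the all-sensor mean, the leave-one-out means,
    and the sum of squared residuals within each leave-one-out subset."""
    n = len(z)
    main = sum(z) / n
    subs, spreads = [], []
    for j in range(n):
        rest = z[:j] + z[j + 1:]
        mean_j = sum(rest) / len(rest)
        subs.append(mean_j)
        spreads.append(sum((v - mean_j) ** 2 for v in rest))
    return main, subs, spreads

z = [100.2, 99.7, 100.1, 108.0, 99.9]   # sensor index 3 carries an 8 m fault
main, subs, spreads = parallel_solutions(z)
# The suspect is the sensor whose exclusion gives the most self-consistent subset.
suspect = min(range(len(z)), key=lambda j: spreads[j])
print(suspect, round(main, 2), round(subs[suspect], 3))
```

Here the main solution (about 101.58) is pulled toward the faulty value, while the sub-solution that excludes sensor 3 (about 99.975) is not, and it is the only subset whose members agree with each other.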

Comparison methods:

• Compare each subfilter's solution with the main filter

• Compare each subfilter with every other subfilter (more robust but more computation)

• Use the innovation sequences from each subfilter

Advantages over RAIM: Parallel solutions work with any type of sensor (not just GNSS range measurements) and can be applied within a Kalman filter framework. They naturally extend to multisensor integration.

Computational cost: Running n+1 filters is expensive. The cost can be reduced by sharing work that is common to all filters, such as computing the system model (transition and process noise matrices) once per epoch for the prediction step, so that mainly the measurement updates differ between filters. With 6 GNSS satellites and 3 other sensors, n = 9, which means 10 parallel filters.

Parallel solutions in integrated navigation: In an INS/GNSS system, parallel filters can detect faults in individual GNSS satellites, INS sensors, and other navigation inputs. This provides comprehensive integrity monitoring across all sensor types.

Check: How do parallel solutions identify a faulty sensor?

• The subfilter that excludes the faulty sensor will have a consistent solution, while all subfilters that include it will be corrupted; the excluded sensor is identified as faulty
• The filter with the highest innovation is the faulty one
• All subfilters converge to the same solution

Chapter 7: Certified Integrity

Certified integrity monitoring goes beyond fault detection to provide a guaranteed bound on the position error with a specified probability. This is required for safety-of-life applications such as aviation precision approach.

Key concepts:

• Alert Limit (AL): The maximum position error that can be tolerated for the operation. For aviation CAT I approach: 40 m lateral, 15 m vertical.

• Protection Level (PL): A real-time bound on the position error, computed by the integrity monitoring algorithm. If PL < AL, the operation is declared safe.

• Integrity Risk: The probability that the true error exceeds the PL without an alert being issued. Must be less than 10⁻⁷ per approach for CAT I.

• Time to Alert (TTA): Maximum time from the onset of a fault to the user being alerted. 6 seconds for CAT I.

Protection level computation: The PL accounts for both the fault-free position error and the position error that would result from an undetected fault. The PL depends on the satellite geometry, measurement noise, fault detection threshold, and the probability of a satellite fault.

ABAS, SBAS, GBAS:

• ABAS (Aircraft-Based): Uses RAIM within the receiver. Provides lateral integrity only.

• SBAS (Satellite-Based): Augmentation systems (WAAS, EGNOS) broadcast integrity information. Provides both lateral and vertical integrity.

• GBAS (Ground-Based): Local ground stations provide integrity for precision approach. Required for CAT II/III approaches.

The protection level must never lie. If the system declares PL = 10 m, the true error must be less than 10 m with probability at least 1 − 10⁻⁷. If the system cannot guarantee this, it must raise an alert. This "never wrong when it matters" property is what makes certified integrity monitoring so demanding to implement.

Check: What is the difference between the alert limit and the protection level?

• The alert limit is the maximum tolerable error for the operation (fixed by regulation); the protection level is the real-time computed bound on the actual error. Safe operation requires PL < AL.
• They are the same thing with different names
• The protection level is always larger than the alert limit

Chapter 8: Fault Detection Simulation

This simulation shows innovation-based fault detection. A measurement sequence is shown with normal noise. At a random point, a fault is injected (step bias). Watch how the innovation filter detects and rejects the faulty measurements.

[Interactive simulation: Innovation Filtering: Fault Detection]
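A rough offline equivalent of such a simulation can be written as a scalar Kalman filter with an innovation gate. Everything specific here is an assumption for illustration: the random-walk state model, the tuning values, the 12-unit step bias injected at epochs 40 to 59, and the use of bounded uniform noise (chosen so that the outcome does not depend on the random seed).

```python
import random

def run_filter(zs, gate=4.0, q=0.01, r=0.75, x0=50.0, p0=1.0):
    """Scalar random-walk Kalman filter with an innovation gate.

    gate=None disables the gate. Returns (estimates, rejected_epochs).
    """
    x, p = x0, p0
    estimates, rejected = [], []
    for k, z in enumerate(zs):
        p += q                       # prediction: state unchanged, uncertainty grows
        dz = z - x                   # innovation (H = 1)
        s = p + r                    # innovation variance
        if gate is not None and dz * dz / s >= gate * gate:
            rejected.append(k)       # fault suspected: skip the measurement update
        else:
            gain = p / s
            x += gain * dz
            p *= 1.0 - gain
        estimates.append(x)
    return estimates, rejected

random.seed(1)
TRUTH, BIAS, FAULT = 50.0, 12.0, range(40, 60)
# Uniform noise on [-1.5, 1.5] has variance 1.5^2 / 3 = 0.75, matching r.
zs = [TRUTH + random.uniform(-1.5, 1.5) + (BIAS if k in FAULT else 0.0)
      for k in range(100)]

gated, rejected = run_filter(zs)
ungated, _ = run_filter(zs, gate=None)
print("rejected epochs:", rejected)
print("max error, gated:  ", round(max(abs(x - TRUTH) for x in gated), 2))
print("max error, ungated:", round(max(abs(x - TRUTH) for x in ungated), 2))
```

With the gate enabled, exactly the faulty epochs are rejected and the estimate stays within the noise bound. With the gate disabled, the estimate is dragged most of the way toward the biased measurements and takes many epochs to recover after the fault ends.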

Check: What happens if the innovation filter threshold is set too tight?

• Too many valid measurements are rejected (high false alarm rate), degrading navigation performance even when no fault is present
• Faults are detected faster
• The filter becomes more stable

Chapter 9: Summary

Key takeaways:
• Integrity = trust in the navigation solution; a wrong answer without warning is worse than no answer
• Failure modes differ by sensor: sudden failures, gradual drift, spoofing, multipath, interference
• Faults must be detected before the Kalman filter processes them; once absorbed, they corrupt the state for many epochs
• Range checks: simple, catch gross failures (out-of-range values, hardware faults)
• Innovation filtering: reject measurements with innovations exceeding 3–5σ; the workhorse of integrity
• Innovation sequence monitoring: detects slowly growing biases via running average or CUSUM tests
• RAIM: uses GNSS measurement redundancy; FD needs ≥5 satellites, FDE needs ≥6
• Parallel solutions: run n+1 filters each excluding one sensor; identifies the faulty sensor
• Certified integrity: the protection level must bound the true error with probability ≥1−10⁻⁷, and the operation is permitted only while the protection level is below the alert limit
• ABAS (RAIM), SBAS (WAAS/EGNOS), and GBAS provide different levels of certified integrity for aviation

Check: What is the minimum number of GNSS satellites needed for RAIM fault detection and exclusion (FDE)?

• Six: four for the solution, one for fault detection (redundancy), and one more to identify and exclude the faulty satellite
• Four
• Eight