Embedded Real-Time Operating Systems

The scheduler that guarantees your robot's motor controller fires every millisecond, your airbag deploys within 10ms, and your pacemaker never misses a beat.

Prerequisites: Basic programming + Notion of threads. That's it.
10 Chapters · 8+ Simulations · 0 Assumed Knowledge

Chapter 0: Why RTOS?

You're building an embedded system with five jobs: read a temperature sensor every 10ms, update an LCD display every 100ms, check physical buttons every 50ms, send telemetry data every 1 second, and blink a status LED every 500ms. How do you structure this code?

The naive approach is a super-loop — a single while(1) that calls each function in sequence. Read sensor. Update display. Check buttons. Send data. Blink LED. Repeat forever.

The problem? If sendData() takes 200ms (waiting for a network ACK), your sensor reading is now 200ms late. Your button feels sluggish. Your LED timing is visibly irregular. The super-loop gives you no timing guarantees.
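
To put a number on this, here is a minimal sketch that models one pass of the super-loop as the sum of its function durations. The helper names (`superloop_pass_ms`, `superloop_meets_period`) are hypothetical, and apart from the 200ms network wait the durations you feed it are illustrative assumptions.

```c
#include <stddef.h>

/* Time for one full pass of a super-loop: every function runs once,
 * back to back, so the pass time is the sum of all durations. */
unsigned superloop_pass_ms(const unsigned exec_ms[], size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += exec_ms[i];
    }
    return total;
}

/* A job that should run every period_ms runs once per pass, so it is
 * late whenever one pass takes longer than its period. */
int superloop_meets_period(const unsigned exec_ms[], size_t n,
                           unsigned period_ms)
{
    return superloop_pass_ms(exec_ms, n) <= period_ms;
}
```

With assumed durations of 1, 5, 1, 200, and 1 ms for the five jobs, one pass takes 208ms, so the 10ms sensor period cannot be met no matter how fast the sensor read itself is.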

The core problem: In a super-loop, every task's timing depends on every OTHER task's execution time. One slow function ruins timing for everything. An RTOS gives each job its own independent task and a priority-based scheduler, so a slow low-priority task cannot delay a higher-priority one.

A Real-Time Operating System (RTOS) solves this by giving each job its own task (like a lightweight thread) with a priority and a period. The RTOS scheduler guarantees that higher-priority tasks preempt lower-priority ones. Your 10ms sensor read always happens on time, even if the network task is blocking.

Super-Loop vs RTOS Timing

Top: super-loop execution — notice the jitter when tasks take variable time. Bottom: RTOS tasks — each fires at its exact period regardless of others.

The tradeoff is real: a super-loop uses zero RAM for OS overhead, has no context-switch latency, and is trivial to debug. An RTOS adds ~2-10KB of RAM, ~1-5μs context switch time, and complexity (stack sizing, priority assignment, synchronization). For a blinking LED, a super-loop is fine. For an ABS braking system? You need an RTOS.

Property | Super-Loop | RTOS
Timing guarantees | None | Hard deadlines
RAM overhead | 0 | 2-10 KB
Code structure | Sequential | Independent tasks
Worst-case latency | Sum of all task times | Highest-priority task WCET
Debugging | Simple (single thread) | Complex (race conditions)

Why does a super-loop fail to guarantee timing for the sensor read?

Chapter 1: Real-Time vs Logical Time

Not all systems care about wall-clock time. When you sort a list, it doesn't matter if it takes 1ms or 100ms — the result is the same. That's logical time: only the ORDER of operations matters, not their exact timing.

But when your car's ABS system detects a wheel lock, it must release the brake within 5ms. If it responds in 6ms, the tire has already skidded. That's real-time: the wall clock is part of the correctness requirement. A correct answer delivered too late is a wrong answer.

Key distinction: Real-time does NOT mean "fast." It means "predictable." A system that always responds in exactly 50ms is more real-time than one that usually responds in 1ms but occasionally takes 200ms. Worst-case matters, not average-case.

Real-time systems are classified by what happens when a deadline is missed:

Hard real-time: Deadline miss = system failure. Examples: pacemaker pulse, airbag deployment, flight control surface actuation. Miss the deadline and someone may die.

Firm real-time: Late results have zero value but don't cause catastrophe. Examples: video frame rendering (a late frame is dropped, not displayed late), radar track updates. Missing one is tolerable; missing many degrades the system.

Soft real-time: Late results have diminished but nonzero value. Performance degrades gracefully. Examples: audio streaming (occasional buffer underrun = click), UI responsiveness (sluggish but still usable).

Deadline Types Visualized

Tasks arrive periodically with deadlines. Watch what happens when each type misses its deadline. Click tasks to simulate delays.

The Worst-Case Execution Time (WCET) is the longest time a task can possibly take to complete one execution. In real-time analysis, we ALWAYS use WCET, never average. If your task usually takes 2ms but once took 8ms, its WCET is at least 8ms: a measurement only shows the worst case you happened to observe, and the true WCET may be higher. Your deadline must accommodate the worst case.

A deadline (D) is the latest time by which a task must complete after it is released. Often D = T (deadline equals period), meaning the task must finish before its next release. But D can be shorter than T (e.g., "compute within 5ms of every 20ms sensor reading").

Response Time = Finish Time − Release Time ≤ Deadline

If this inequality is ever violated for a hard real-time task, the system has failed — regardless of how many previous deadlines were met.
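
The sketch below applies that inequality to a recorded trace of response times. It is only an illustration: the function names are hypothetical, and the maximum it reports is the worst case that was observed, which may be smaller than the true worst case.

```c
#include <stddef.h>

/* Response time of one job: finish time minus release time. */
unsigned response_time(unsigned release, unsigned finish)
{
    return finish - release;
}

/* Largest response time in a trace. This is only the worst case that was
 * observed; the true worst case can be larger. */
unsigned max_response(const unsigned resp[], size_t n)
{
    unsigned worst = 0;
    for (size_t i = 0; i < n; i++) {
        if (resp[i] > worst) worst = resp[i];
    }
    return worst;
}

/* A hard deadline holds only if every single job meets it. */
int all_deadlines_met(const unsigned resp[], size_t n, unsigned deadline)
{
    return max_response(resp, n) <= deadline;
}
```

A trace of {50, 50, 50, 50} ms passes a 60ms deadline, while {1, 1, 1, 200} ms fails it even though the two traces have nearly the same average. That is the "predictable, not fast" point in code form.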

A video game renders frames at 60fps. Occasionally a frame takes 20ms instead of 16.6ms, causing a visible stutter but no crash. What type of real-time is this?

Chapter 2: Task Model & Priorities

In an RTOS, each job becomes a task — an independent thread of execution with its own stack, priority, and timing parameters. Think of tasks like musicians in an orchestra: each plays their own part at their own tempo, and the conductor (scheduler) decides who plays when.

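Every periodic task is characterized by three numbers: its period T (how often it is released), its worst-case execution time C, and its relative deadline D.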

The question is: how do we assign priorities? If two tasks are both ready to run, which one goes first?

Rate Monotonic (RM) priority assignment: The task with the SHORTEST period gets the HIGHEST priority. Why? Because a faster task has less slack time: it must run more frequently, so it has tighter timing. For periodic tasks with D = T, this is provably optimal among fixed-priority schemes (Liu & Layland, 1973).

Let's work through a concrete example. Three tasks:

Task | Period T | Exec Time C | Deadline D | RM Priority
A (sensor) | 10 ms | 3 ms | 10 ms | Highest (1)
B (buttons) | 25 ms | 5 ms | 25 ms | Medium (2)
C (display) | 50 ms | 10 ms | 50 ms | Lowest (3)

Task A has the shortest period (10ms), so it gets highest priority. When A is ready, it preempts B and C immediately. B preempts C but yields to A. C only runs when neither A nor B needs the CPU.
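
As a rough sketch, the rate-monotonic rule amounts to sorting tasks by period. The `rm_task_t` type and `rm_assign` function below are hypothetical names made up for this illustration, and the numbering follows the table above (1 = highest priority).

```c
#include <stdlib.h>

typedef struct {
    const char *name;
    unsigned period_ms;   /* T */
    unsigned wcet_ms;     /* C */
    unsigned priority;    /* 1 = highest, filled in by rm_assign */
} rm_task_t;

/* qsort comparator: shorter period sorts first. */
int rm_by_period(const void *a, const void *b)
{
    const rm_task_t *x = a;
    const rm_task_t *y = b;
    return (x->period_ms > y->period_ms) - (x->period_ms < y->period_ms);
}

/* Sort tasks by period and number them 1..n, 1 being the highest priority. */
void rm_assign(rm_task_t tasks[], size_t n)
{
    qsort(tasks, n, sizeof tasks[0], rm_by_period);
    for (size_t i = 0; i < n; i++) {
        tasks[i].priority = (unsigned)(i + 1);
    }
}
```

Keep in mind that FreeRTOS numbers priorities the other way round (a larger number means higher priority), so the rank computed here would need to be inverted before being passed to xTaskCreate().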

Task Parameter Editor

Adjust periods and execution times for three tasks. RM automatically assigns priorities (shorter period = higher priority). Watch the timeline update.


Notice that utilization U = C/T for each task represents the fraction of CPU time that task consumes. Task A uses 3/10 = 30% of the CPU. Task B uses 5/25 = 20%. Task C uses 10/50 = 20%. Total = 70%. The CPU is idle 30% of the time.

$$U = \sum_{i=1}^{n} \frac{C_i}{T_i}$$

If U > 1 (total utilization exceeds 100%), the system is DEFINITELY unschedulable: there's simply not enough CPU time. But under fixed-priority scheduling, U ≤ 1 doesn't guarantee schedulability either. That's what Chapter 3 addresses.
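
The utilization sum is easy to compute directly. This is a minimal sketch with hypothetical function names. C and T can be in any time unit as long as both use the same one.

```c
#include <stddef.h>

/* Total utilization U = sum of C_i / T_i over all tasks. */
double total_utilization(const double wcet[], const double period[], size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        u += wcet[i] / period[i];
    }
    return u;
}

/* Necessary condition only: U > 1 can never be scheduled on one CPU. */
int utilization_feasible(double u)
{
    return u <= 1.0;
}
```

For the three example tasks (C = 3, 5, 10 and T = 10, 25, 50) this gives 0.70, matching the hand calculation above.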

Under Rate Monotonic scheduling, which task gets the highest priority?

Chapter 3: Rate Monotonic Analysis

We know how to assign priorities (shortest period = highest priority). But how do we PROVE the system will meet all deadlines? That's schedulability analysis — the mathematical guarantee that no task will ever miss its deadline.

In 1973, Liu and Layland proved a landmark result: for n periodic tasks with D=T, Rate Monotonic scheduling guarantees all deadlines are met if:

$$U = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right)$$

This is the Liu & Layland bound. The right side is a sufficient (but not necessary) condition. If your utilization is below this bound, you're guaranteed schedulable. If it's above, you MIGHT still be schedulable but you need more detailed analysis.

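The bound decreases with n: For n=1: U ≤ 1.000 (trivial: one task can use all CPU). For n=2: U ≤ 0.828. For n=3: U ≤ 0.780. For n=10: U ≤ 0.718. As n→∞: U ≤ ln(2) ≈ 0.693. More tasks means a stricter bound because the bound has to cover the worst possible combination of periods, and a larger task set allows worse combinations. (Preemption overhead is not part of this model.)

n (tasks) | Bound n(2^(1/n) − 1) | Percent
1 | 1.000 | 100%
2 | 0.828 | 82.8%
3 | 0.780 | 78.0%
4 | 0.757 | 75.7%
5 | 0.743 | 74.3%
10 | 0.718 | 71.8%
∞ | ln(2) ≈ 0.693 | 69.3%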

Worked example: Our three tasks from Chapter 2: A(T=10, C=3), B(T=25, C=5), C(T=50, C=10).

U = 3/10 + 5/25 + 10/50 = 0.30 + 0.20 + 0.20 = 0.70

The bound for n=3 is 0.780. Since 0.70 ≤ 0.780, the system is guaranteed schedulable under RM. No deadline will ever be missed.

Now consider: what if Task C takes 15ms instead of 10ms?

U = 3/10 + 5/25 + 15/50 = 0.30 + 0.20 + 0.30 = 0.80

Now U = 0.80 > 0.780 (the n=3 bound). We CANNOT guarantee schedulability with this simple test. But note: it might still be schedulable — the bound is sufficient, not necessary. We'd need response time analysis (exact test) to be sure.
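
The three possible outcomes of the simple test (guaranteed, inconclusive, impossible) can be sketched as follows. The names are hypothetical, and computing the n-th root of 2 by bisection is just a choice made here so the example needs no math library. It is not how the bound is usually evaluated.

```c
/* x raised to a small non-negative integer power. */
double ipow(double x, unsigned n)
{
    double r = 1.0;
    while (n-- > 0) r *= x;
    return r;
}

/* Liu & Layland bound n(2^(1/n) - 1). The n-th root of 2 is found by
 * bisection on [1, 2] so the sketch needs no math library. */
double ll_bound(unsigned n)
{
    double lo = 1.0, hi = 2.0;
    if (n == 0) return 0.0;
    for (int i = 0; i < 60; i++) {
        double mid = 0.5 * (lo + hi);
        if (ipow(mid, n) < 2.0) lo = mid; else hi = mid;
    }
    return n * (0.5 * (lo + hi) - 1.0);
}

typedef enum { RM_GUARANTEED, RM_INCONCLUSIVE, RM_INFEASIBLE } rm_verdict_t;

/* Sufficient test: passing guarantees schedulability, failing does not
 * prove a deadline miss unless U exceeds 1. */
rm_verdict_t ll_test(double u, unsigned n)
{
    if (u > 1.0) return RM_INFEASIBLE;
    return (u <= ll_bound(n)) ? RM_GUARANTEED : RM_INCONCLUSIVE;
}
```

With n = 3, a utilization of 0.70 comes back as guaranteed and 0.80 as inconclusive, which is exactly the situation described above: the second case needs the exact test.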

Utilization Calculator

Enter task parameters below. The calculator computes total utilization and checks against the Liu & Layland bound.

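Response Time Analysis (exact test): For each task i, compute its worst-case response time $R_i$ iteratively, starting from $R_i = C_i$ and repeating until the value stops changing:

$$R_i = C_i + \sum_{j < i} \left\lceil \frac{R_i}{T_j} \right\rceil C_j$$

Here j < i ranges over the tasks with higher priority than task i. If $R_i \le D_i$ for all tasks, the system is schedulable. This is necessary AND sufficient for fixed-priority scheduling of independent periodic tasks with D ≤ T.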
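
The iteration can be sketched in a few lines of integer arithmetic. This assumes tasks are stored in priority order (index 0 is the highest priority), that times are whole milliseconds, and that there is no blocking or scheduler overhead. The type and function names are hypothetical.

```c
#include <stddef.h>

typedef struct {
    unsigned period;    /* T, also the minimum gap between releases */
    unsigned wcet;      /* C, assumed greater than zero */
    unsigned deadline;  /* D, assumed <= T */
} rta_task_t;

/* Worst-case response time of tasks[i]; tasks[0..i-1] are the
 * higher-priority tasks. Returns 0 if the iteration exceeds the deadline,
 * meaning the task is not schedulable. */
unsigned rta_response(const rta_task_t tasks[], size_t i)
{
    unsigned r = tasks[i].wcet;
    for (;;) {
        unsigned next = tasks[i].wcet;
        for (size_t j = 0; j < i; j++) {
            /* ceil(r / T_j) releases of task j, each costing C_j */
            unsigned releases = (r + tasks[j].period - 1) / tasks[j].period;
            next += releases * tasks[j].wcet;
        }
        if (next > tasks[i].deadline) return 0;   /* deadline miss */
        if (next == r) return r;                  /* fixed point reached */
        r = next;
    }
}

/* The set is schedulable if every task has a response time within D. */
int rta_schedulable(const rta_task_t tasks[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (rta_response(tasks, i) == 0) return 0;
    }
    return 1;
}
```

For the Chapter 2 task set this yields response times of 3, 8, and 24 ms. With Task C raised to 15ms (U = 0.80, above the 0.780 bound) the iteration settles at 37ms, still inside the 50ms deadline, so under these assumptions that set is schedulable even though the simple test was inconclusive.
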
A 4-task system has utilization U = 0.76. The Liu & Layland bound for n=4 is 0.757. What can we conclude?

Chapter 4: Scheduling Algorithms

Rate Monotonic is one scheduling algorithm. There are several others, each with different tradeoffs. Let's compare the four major approaches.

Table-Driven (Static/Cyclic) Scheduling: The entire schedule is precomputed offline. A table says exactly which task runs at which time slot. Think of it like a train timetable — completely deterministic, zero runtime decisions. Used in safety-critical avionics (DO-178C) because the schedule can be formally verified before deployment.

Priority Preemptive Scheduling: Each task has a fixed priority. The highest-priority READY task always runs. If a higher-priority task becomes ready while a lower-priority task is running, the lower-priority task is immediately preempted (interrupted and suspended). Rate Monotonic is a specific priority assignment within this category.

Round-Robin: All tasks at the same priority level get equal time slices (quanta). After a task's quantum expires, it goes to the back of the queue. Fair but provides no timing guarantees for individual tasks. Often used as the policy WITHIN a priority level (same-priority tasks round-robin among themselves).

Earliest Deadline First (EDF): Priorities are dynamic, so the task whose deadline is closest always runs next. Provably optimal on a single preemptive processor: if ANY algorithm can schedule a task set, EDF can too. Achieves 100% utilization (for D = T, U ≤ 1 is both necessary and sufficient). But it's harder to implement and analyze, and when overloaded it fails unpredictably (no task is guaranteed).

EDF vs RM: EDF can schedule task sets that RM cannot (because EDF reaches 100% utilization vs RM's ~69-82%). But RM fails gracefully — lowest-priority tasks miss first, predictably. EDF under overload: random tasks miss deadlines, unpredictably. Safety-critical systems prefer RM's predictable failure mode.
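
The core of an EDF scheduler is a single selection rule. Here is a minimal sketch with hypothetical names (`edf_job_t`, `edf_pick`). It ignores tick counter wraparound, which a real implementation would have to handle.

```c
#include <stddef.h>

typedef struct {
    unsigned abs_deadline;  /* absolute deadline, in ticks */
    int ready;              /* nonzero if the job is released and unfinished */
} edf_job_t;

/* Return the index of the ready job with the earliest absolute deadline,
 * or -1 if nothing is ready. Ties go to the lower index. */
int edf_pick(const edf_job_t jobs[], size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!jobs[i].ready) continue;
        if (best < 0 || jobs[i].abs_deadline < jobs[(size_t)best].abs_deadline) {
            best = (int)i;
        }
    }
    return best;
}
```

Unlike RM, the winner here changes from release to release, because each new job carries a fresh absolute deadline. That is what makes the priority "dynamic".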

Algorithm | Priority Type | Max Utilization | Predictability | Complexity
Table-driven | Static (precomputed) | 100% | Perfect | Design-time
RM (preemptive) | Fixed (rate-based) | ~69-82% | High | Low
Round-robin | Equal | N/A | Low | Very low
EDF | Dynamic (deadline) | 100% | Medium | Medium

Scheduling Algorithm Comparison

Select an algorithm and watch 3 tasks being scheduled on a Gantt chart. Notice preemptions, idle time, and deadline handling.

Table-driven in practice: The system designer computes a major frame (LCM of all periods) and a minor frame (GCD of all periods). Within each minor frame, specific tasks are assigned specific time slots. No runtime scheduler needed — just a timer interrupt that walks through the table.

c
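// Table-driven schedule for tasks A(T=10), B(T=20), C(T=40)
// Major frame = LCM(10,20,40) = 40ms, Minor frame = GCD = 10ms
typedef void (*task_fn_t)(void);
void TaskA(void); void TaskB(void); void TaskC(void);  // defined elsewhere

// One row per 10ms minor frame; unused entries are implicitly NULL
static const task_fn_t schedule_table[4][3] = {
    { TaskA, TaskB, TaskC },  // slot 0-10ms: A+B+C all run
    { TaskA },                // slot 10-20ms: only A
    { TaskA, TaskB },         // slot 20-30ms: A+B
    { TaskA },                // slot 30-40ms: only A
};  // then repeat from slot 0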
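
To show how such a table might be walked at run time, here is a self-contained sketch. All names are hypothetical, and the task bodies are stand-ins that only count their own executions. On real hardware `cyclic_dispatch` would be called from the 10ms minor-frame timer interrupt.

```c
#include <stddef.h>

typedef void (*slot_fn_t)(void);

#define NUM_SLOTS      4   /* major frame 40ms / minor frame 10ms */
#define MAX_PER_SLOT   3

/* Stand-in task bodies that only count how often they ran. */
unsigned runs_a, runs_b, runs_c;
void task_a(void) { runs_a++; }
void task_b(void) { runs_b++; }
void task_c(void) { runs_c++; }

/* Unused entries are implicitly NULL and end the slot early. */
const slot_fn_t cyclic_table[NUM_SLOTS][MAX_PER_SLOT] = {
    { task_a, task_b, task_c },
    { task_a },
    { task_a, task_b },
    { task_a },
};

/* Body of the minor-frame timer handler: run this slot's tasks in order,
 * then return the index of the next slot, wrapping at the major frame. */
size_t cyclic_dispatch(size_t slot)
{
    for (size_t i = 0; i < MAX_PER_SLOT && cyclic_table[slot][i] != NULL; i++) {
        cyclic_table[slot][i]();
    }
    return (slot + 1) % NUM_SLOTS;
}
```

Over one major frame (four calls) task A runs four times, B twice, and C once, matching periods of 10, 20, and 40 ms. The design only works if the tasks in each slot finish within the 10ms minor frame.
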
Which scheduling algorithm can achieve 100% CPU utilization AND is provably optimal (can schedule anything that's schedulable)?

Chapter 5: FreeRTOS Basics

FreeRTOS is among the most widely deployed RTOS kernels, running on billions of microcontrollers from tiny Cortex-M0s to powerful ESP32s. It's open source, small (a kernel of roughly 9KB), and provides the essential primitives: tasks, queues, semaphores, timers.

Creating a task in FreeRTOS requires specifying: a function to execute, a name (for debugging), stack size (in words, not bytes!), parameters, priority, and a handle for later reference.

c
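#include "FreeRTOS.h"
#include "task.h"

// readTemperature() and updateLCD() are board-specific and defined elsewhere
void vSensorTask(void *pvParameters) {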
    for(;;) {
        readTemperature();
        vTaskDelay(pdMS_TO_TICKS(10));  // sleep 10ms
    }
}

void vDisplayTask(void *pvParameters) {
    for(;;) {
        updateLCD();
        vTaskDelay(pdMS_TO_TICKS(100)); // sleep 100ms
    }
}

int main(void) {
    xTaskCreate(vSensorTask, "Sensor", 128, NULL, 3, NULL);
    xTaskCreate(vDisplayTask, "Display", 256, NULL, 1, NULL);
    vTaskStartScheduler();  // never returns
    return 0;
}
Key detail: vTaskDelay() puts the task into BLOCKED state — it consumes zero CPU while sleeping. The tick interrupt (typically 1kHz = 1ms resolution) wakes it when the delay expires. This is fundamentally different from a busy-wait loop.

FreeRTOS tasks exist in one of four states:

Running: Currently executing on the CPU (only ONE task at a time on a single core). A running task moves to Ready when it is preempted or yields, and to Blocked when it calls vTaskDelay() or waits on a queue or semaphore.
Ready: Able to run but waiting for the CPU (a higher-priority task is running).
Blocked: Waiting for time or an event (zero CPU usage). Returns to Ready when the delay expires or the event arrives.
Suspended: Explicitly paused by vTaskSuspend(); only vTaskResume() can restart it.

The idle task is created automatically by FreeRTOS at priority 0 (lowest). It runs whenever no other task is ready. Its job: free memory from deleted tasks and (optionally) put the CPU into low-power sleep mode.

The tick interrupt fires every 1ms (configurable via configTICK_RATE_HZ). Each tick, the scheduler checks: has any blocked task's delay expired? Is there a higher-priority task now ready? If so, a context switch occurs.
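
Because delays are counted in whole ticks, the tick rate also sets the timing resolution. The helper below is a hypothetical host-side sketch of the millisecond-to-tick conversion, written in the same spirit as pdMS_TO_TICKS. It is not the FreeRTOS macro itself.

```c
#include <stdint.h>

/* Convert milliseconds to whole ticks, rounding down. The 64-bit
 * intermediate avoids overflow for large delays. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz)
{
    return (uint32_t)(((uint64_t)ms * tick_rate_hz) / 1000u);
}
```

At a 1kHz tick, 10ms becomes 10 ticks. At a 100Hz tick, 250ms becomes 25 ticks, but a 5ms request rounds down to 0 ticks, so a delay shorter than one tick period cannot be represented at all.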

FreeRTOS Task States

Watch tasks transition between states. Click buttons to trigger events.

c
// Complete 3-task LED blinker using FreeRTOS
#include "FreeRTOS.h"
#include "task.h"

void vRedLED(void *p) {
    for(;;) { togglePin(RED_PIN); vTaskDelay(pdMS_TO_TICKS(250)); }
}
void vGreenLED(void *p) {
    for(;;) { togglePin(GREEN_PIN); vTaskDelay(pdMS_TO_TICKS(500)); }
}
void vBlueLED(void *p) {
    for(;;) { togglePin(BLUE_PIN); vTaskDelay(pdMS_TO_TICKS(1000)); }
}

int main() {
    xTaskCreate(vRedLED,   "Red",   64, NULL, 3, NULL);
    xTaskCreate(vGreenLED, "Green", 64, NULL, 2, NULL);
    xTaskCreate(vBlueLED,  "Blue",  64, NULL, 1, NULL);
    vTaskStartScheduler();
}
A task calls vTaskDelay(pdMS_TO_TICKS(50)). What happens to the task during those 50ms?

Chapter 6: Synchronization — Semaphores & Mutexes

Tasks don't exist in isolation. They share resources: a UART peripheral, a global data buffer, an I2C bus. When two tasks access the same resource simultaneously, you get race conditions — corrupted data, garbled transmissions, undefined behavior.

Three synchronization primitives solve this:

Binary Semaphore: A flag with two states (available/taken). Used for signaling between tasks — "the data is ready, you can process it now." One task gives (signals), another takes (waits). Think of it like a baton pass in a relay race.

Counting Semaphore: A counter from 0 to N. Used to manage a pool of N identical resources. Example: 3 DMA channels available. Each take() decrements. Each give() increments. When count hits 0, tasks block until a resource is freed.

Mutex (Mutual Exclusion): Like a binary semaphore BUT with ownership — only the task that locked it can unlock it. Also supports priority inheritance (critical for avoiding priority inversion). Use mutexes for protecting shared data, semaphores for signaling.

Priority Inversion — the bug that nearly killed Mars Pathfinder: Task H (high priority) needs a mutex held by Task L (low priority). H blocks, waiting for L to release it. But Task M (medium priority) preempts L! Now H is blocked by M — even though H is higher priority. This is unbounded priority inversion. It actually happened on Mars Pathfinder in 1997, causing system resets.

The fix is priority inheritance: when H blocks on a mutex held by L, L temporarily inherits H's priority. Now M cannot preempt L. L finishes quickly, releases the mutex, drops back to its original priority, and H runs immediately.
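
The mechanism can be modeled in a few lines. This is a simplified sketch with hypothetical types (`pi_task_t`, `pi_mutex_t`), not FreeRTOS's implementation: it tracks one mutex with no waiter queue, and uses "larger number = more urgent" as FreeRTOS does.

```c
#include <stddef.h>

typedef struct {
    int base_prio;   /* assigned priority, larger = more urgent */
    int eff_prio;    /* priority the scheduler actually uses */
} pi_task_t;

typedef struct {
    pi_task_t *owner;  /* NULL when the mutex is free */
} pi_mutex_t;

/* Try to lock. On contention the owner inherits the caller's priority if
 * that is higher, and the caller must block (return 0). */
int pi_lock(pi_mutex_t *m, pi_task_t *caller)
{
    if (m->owner == NULL) {
        m->owner = caller;
        return 1;
    }
    if (caller->eff_prio > m->owner->eff_prio) {
        m->owner->eff_prio = caller->eff_prio;   /* inheritance */
    }
    return 0;
}

/* Unlock: the owner drops back to its base priority. */
void pi_unlock(pi_mutex_t *m)
{
    if (m->owner != NULL) {
        m->owner->eff_prio = m->owner->base_prio;
        m->owner = NULL;
    }
}
```

If L (priority 1) holds the mutex and H (priority 3) tries to take it, L's effective priority becomes 3, which is above M's 2, so M can no longer preempt L while it finishes the critical section.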

c
// Mutex protecting shared temperature variable
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"

SemaphoreHandle_t xTempMutex;
float shared_temperature = 0.0f;

void vSensorTask(void *p) {
    for(;;) {
        float reading = readADC();
        if(xSemaphoreTake(xTempMutex, pdMS_TO_TICKS(10))) {
            shared_temperature = reading;  // protected write
            xSemaphoreGive(xTempMutex);
        }
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

void vDisplayTask(void *p) {
    for(;;) {
        if(xSemaphoreTake(xTempMutex, pdMS_TO_TICKS(10))) {
            displayTemperature(shared_temperature); // protected read
            xSemaphoreGive(xTempMutex);
        }
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

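int main(void) {
    xTempMutex = xSemaphoreCreateMutex();  // has priority inheritance!
    if (xTempMutex == NULL) {
        for(;;) { }  // not enough heap for the mutex: halt here
    }
    xTaskCreate(vSensorTask,  "Sensor",  128, NULL, 3, NULL);
    xTaskCreate(vDisplayTask, "Display", 256, NULL, 1, NULL);
    vTaskStartScheduler();
    for(;;) { }  // only reached if the scheduler could not start
}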
Priority Inversion Animation

Three tasks: H (high), M (medium), L (low). Watch priority inversion happen, then enable priority inheritance to fix it.

Rule of thumb: Use a mutex when you're protecting a shared resource (data, peripheral). Use a binary semaphore when you're signaling between tasks (ISR signals task, producer signals consumer). Never hold a mutex longer than absolutely necessary — every microsecond you hold it is a microsecond you might block a higher-priority task.

Deadlock is another hazard: Task A holds Mutex 1 and waits for Mutex 2. Task B holds Mutex 2 and waits for Mutex 1. Both are blocked forever. Prevention: always acquire mutexes in the same global order, or use timeout on xSemaphoreTake().
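
One way to enforce a global order is to give every lock a rank and always take the lower rank first. The sketch below uses a hypothetical `ranked_lock_t` with a plain flag standing in for the real mutex, so it only demonstrates the ordering rule, not actual locking.

```c
typedef struct {
    unsigned rank;   /* position in the global lock order, unique per lock */
    int held;        /* stand-in for the real mutex state */
} ranked_lock_t;

/* Take two locks in ascending rank regardless of argument order, so no two
 * tasks can ever hold them in opposite orders. Returns the first one taken. */
ranked_lock_t *lock_pair(ranked_lock_t *a, ranked_lock_t *b)
{
    ranked_lock_t *first  = (a->rank <= b->rank) ? a : b;
    ranked_lock_t *second = (first == a) ? b : a;
    first->held = 1;    /* a real system would take the mutex here */
    second->held = 1;
    return first;
}
```

Whether a task calls `lock_pair(&m1, &m2)` or `lock_pair(&m2, &m1)`, the lock with rank 1 is taken first, which removes the circular wait that deadlock requires.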

In priority inversion, a high-priority task H is blocked. What is the root cause?

Chapter 7: Multi-Task System (Showcase)

Let's build a complete RTOS application: a real-time control system with four tasks sharing data through queues and mutexes. This is the pattern you'll see in every professional embedded system — from drones to medical devices.

Our system has four tasks:

Task | Period | WCET | Priority | Role
PID Control | 1 ms | 0.3 ms | 4 (highest) | Motor speed regulation
Sensor Read | 10 ms | 2 ms | 3 | Read encoder + IMU
Display | 100 ms | 15 ms | 2 | Update LCD with status
Comms | 1000 ms | 50 ms | 1 (lowest) | Send telemetry via UART

Utilization check: U = 0.3/1 + 2/10 + 15/100 + 50/1000 = 0.30 + 0.20 + 0.15 + 0.05 = 0.70. For n=4, bound = 0.757. Since 0.70 ≤ 0.757, we're schedulable.

Data flows through the system via queues and mutexes:

Sensor Task
Reads encoder → pushes to sensorQueue
↓ queue
PID Task
Pops from sensorQueue → computes output → writes to shared motor_cmd (mutex)
↓ mutex
Display Task
Reads motor_cmd (mutex) → updates LCD
↓ mutex
Comms Task
Reads motor_cmd (mutex) → sends via UART
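
Conceptually, the queue between the sensor and PID tasks is a bounded FIFO. The following is a host-side sketch of that data structure with hypothetical names. It models only the non-blocking case (a zero timeout) and has none of the locking a real RTOS queue needs to be safe across tasks.

```c
#include <stddef.h>

#define FQ_CAPACITY 10

typedef struct {
    float  items[FQ_CAPACITY];
    size_t head;    /* index of the oldest item */
    size_t count;   /* number of items stored */
} float_queue_t;

/* Non-blocking send: returns 0 when the queue is full, like a send with
 * a zero timeout. */
int fq_send(float_queue_t *q, float value)
{
    if (q->count == FQ_CAPACITY) return 0;
    q->items[(q->head + q->count) % FQ_CAPACITY] = value;
    q->count++;
    return 1;
}

/* Non-blocking receive: returns 0 when the queue is empty. */
int fq_receive(float_queue_t *q, float *out)
{
    if (q->count == 0) return 0;
    *out = q->items[q->head];
    q->head = (q->head + 1) % FQ_CAPACITY;
    q->count--;
    return 1;
}
```

The return values matter: a producer that ignores a failed send silently drops data, so in a real system it is worth at least counting those failures.
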
Live RTOS Gantt Chart

Watch all four tasks execute in real-time. Preemptions visible as interruptions. Queue fill levels and mutex states shown below. Adjust parameters to see deadline misses.

c
// Complete 4-task RTOS system with queues and mutexes
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

QueueHandle_t sensorQueue;
SemaphoreHandle_t motorMutex;
float motor_cmd = 0.0f;

void vPIDTask(void *p) {
    float sensor_val;
    for(;;) {
        if(xQueueReceive(sensorQueue, &sensor_val, 0)) {
            float output = computePID(sensor_val);
            xSemaphoreTake(motorMutex, portMAX_DELAY);
            motor_cmd = output;
            xSemaphoreGive(motorMutex);
        }
        vTaskDelay(pdMS_TO_TICKS(1));
    }
}

void vSensorTask(void *p) {
    for(;;) {
        float reading = readEncoder();
        xQueueSend(sensorQueue, &reading, 0);
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

void vDisplayTask(void *p) {
    for(;;) {
        xSemaphoreTake(motorMutex, portMAX_DELAY);
        float cmd = motor_cmd;           // copy under the lock
        xSemaphoreGive(motorMutex);
        lcd_printf("CMD: %.2f", cmd);    // slow I/O runs outside the lock
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

void vCommsTask(void *p) {
    for(;;) {
        xSemaphoreTake(motorMutex, portMAX_DELAY);
        float cmd = motor_cmd;           // copy under the lock
        xSemaphoreGive(motorMutex);
        uart_send("TELEM:%.2f\n", cmd);  // long send no longer blocks the PID task
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

int main() {
    sensorQueue = xQueueCreate(10, sizeof(float));
    motorMutex  = xSemaphoreCreateMutex();
    xTaskCreate(vPIDTask,     "PID",     128, NULL, 4, NULL);
    xTaskCreate(vSensorTask,  "Sensor",  128, NULL, 3, NULL);
    xTaskCreate(vDisplayTask, "Display", 256, NULL, 2, NULL);
    xTaskCreate(vCommsTask,   "Comms",   256, NULL, 1, NULL);
    vTaskStartScheduler();
}
Try it: Increase the PID execution time to 0.9ms. Now U = 0.9/1 + 0.2 + 0.15 + 0.05 = 1.30. That's over 100% — completely impossible to schedule! Watch the Gantt chart show deadline misses cascading through the system.

Chapter 8: Performance Analysis

Your RTOS design looks good on paper. Utilization is under the bound. Priorities are assigned. But real hardware has overheads that paper analysis ignores. Understanding these overheads is the difference between a system that works in the lab and one that works in production.

Context Switch Time: When the scheduler moves from Task A to Task B, it must save A's registers (R0-R15, PSP, FPU state), load B's registers, and update internal data structures. On a Cortex-M4 at 168MHz, this takes ~1-5μs. Seems tiny, but if your PID task runs at 10kHz (100μs period), a 5μs switch is 5% of that period PER switch, and each activation typically costs two switches (one into the task, one back out).

Tick Interrupt Overhead: The system tick (typically 1kHz) fires every 1ms to check timeouts and do scheduling. Each tick ISR takes ~0.5-2μs. At 1kHz, that's 0.05-0.2% CPU — negligible for most systems, but it adds up with many software timers.
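
These overheads can be folded into a single back-of-the-envelope figure. The function below is a hypothetical sketch that assumes every context switch and every tick ISR has the same fixed cost, which real systems only approximate.

```c
/* Fraction of CPU time (0.0 to 1.0) spent in context switches and tick
 * interrupts during one second. */
double rtos_overhead(double switch_us, double switches_per_sec,
                     double tick_isr_us, double tick_rate_hz)
{
    double busy_us = switch_us * switches_per_sec + tick_isr_us * tick_rate_hz;
    return busy_us / 1e6;
}
```

A 10kHz task with two 5μs switches per activation gives 5 × 20000 / 1e6 = 10% of the CPU, while a 2μs tick ISR at 1kHz adds only 0.2%. Adding this fraction to the task utilization gives a more honest number to hold against the schedulability bound.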

Stack Usage: Every task needs its own stack. Too small = stack overflow (silent memory corruption, the hardest bug to find). Too large = wasted RAM. A typical Cortex-M4 task needs 128-512 words (512-2048 bytes) depending on call depth and local variables.

Stack overflow is the #1 RTOS bug. Symptoms: random crashes, corrupted variables, wrong task executing. FreeRTOS offers two detection methods: (1) check stack watermark periodically with uxTaskGetStackHighWaterMark(), (2) enable configCHECK_FOR_STACK_OVERFLOW to detect at context switch. Always enable detection during development.
c
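// Stack usage monitoring
#include <stdio.h>
#include "FreeRTOS.h"
#include "task.h"

void vMonitorTask(void *p) {
    for(;;) {
        // NULL = query the calling task; pass a task handle to check another
        UBaseType_t hwm = uxTaskGetStackHighWaterMark(NULL);
        // hwm = minimum free stack words ever seen
        // If hwm < 20, you're dangerously close to overflow!
        printf("Stack free: %lu words\n", (unsigned long)hwm);
        vTaskDelay(pdMS_TO_TICKS(5000));
    }
}

// CPU utilization measurement (FreeRTOS runtime stats)
// Requires configGENERATE_RUN_TIME_STATS and
// configUSE_STATS_FORMATTING_FUNCTIONS set to 1 in FreeRTOSConfig.h
void vStatsTask(void *p) {
    static char buf[512];  // static: keeps 512 bytes off this task's stack
    for(;;) {
        vTaskGetRunTimeStats(buf);  // fills buf with per-task CPU%
        printf("%s\n", buf);
        vTaskDelay(pdMS_TO_TICKS(10000));
    }
}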
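
Watermark checks of this kind are commonly built on "stack painting": fill the stack with a known pattern before the task runs, then count how much of the pattern survives. The sketch below illustrates the idea on a plain array. The names and the fill value are assumptions made for the example, not FreeRTOS internals.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u   /* fill pattern; the value is arbitrary here */

/* Paint the whole stack before the task first runs. */
void stack_paint(uint32_t stack[], size_t words)
{
    for (size_t i = 0; i < words; i++) stack[i] = STACK_FILL;
}

/* Count words that still hold the pattern, scanning from the far end of a
 * descending stack (index 0 is the last word the stack can grow into). */
size_t stack_high_water_mark(const uint32_t stack[], size_t words)
{
    size_t free_words = 0;
    while (free_words < words && stack[free_words] == STACK_FILL) free_words++;
    return free_words;
}
```

If a 64-word stack has had its top 20 words overwritten, the function reports 44 words never touched. The method can under-report usage if the task happens to write the fill value itself, so treat the result as an estimate and leave a margin.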

Common RTOS failure modes:

Problem | Symptom | Cause | Fix
Stack overflow | Random crashes, corrupted data | Task stack too small for call depth | Increase stack, check watermark
Priority inversion | High-priority task unresponsive | Low task holds mutex, medium preempts | Use a mutex (not a semaphore) to get priority inheritance
Deadlock | Two+ tasks frozen forever | Circular mutex dependency | Global lock ordering, timeouts
Starvation | Low-priority task never runs | Higher tasks never block | Ensure tasks yield/block regularly
Jitter | Inconsistent task periods | vTaskDelay used for periodic work | Use vTaskDelayUntil for periodic tasks

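vTaskDelay vs vTaskDelayUntil: vTaskDelay(pdMS_TO_TICKS(100)) sleeps 100ms FROM NOW, so your period is 100ms + execution time (drift accumulates). vTaskDelayUntil(&lastWake, pdMS_TO_TICKS(100)) sleeps until an absolute time, so your period is exactly 100ms regardless of execution time, as long as the work finishes within the period. Both functions take their delay in ticks, not milliseconds, hence the pdMS_TO_TICKS() conversion. Always use vTaskDelayUntil for periodic tasks.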
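
The drift is easy to see with simple arithmetic. The two hypothetical functions below model the wake-up times of each pattern on the host. They ignore tick granularity and preemption, and assume the work always fits inside one period.

```c
/* Wake-up time of the k-th iteration when each cycle does exec_ms of work
 * and then sleeps for a RELATIVE period_ms (the vTaskDelay pattern). */
unsigned wake_relative(unsigned k, unsigned period_ms, unsigned exec_ms)
{
    unsigned t = 0;
    for (unsigned i = 0; i < k; i++) {
        t += exec_ms;     /* do the work */
        t += period_ms;   /* then sleep, counted from now */
    }
    return t;
}

/* Same task sleeping until an ABSOLUTE time that advances by period_ms each
 * cycle (the vTaskDelayUntil pattern). Assumes exec_ms < period_ms. */
unsigned wake_absolute(unsigned k, unsigned period_ms, unsigned exec_ms)
{
    unsigned next_wake = 0;
    for (unsigned i = 0; i < k; i++) {
        (void)exec_ms;            /* work fits inside the period */
        next_wake += period_ms;   /* wake time moves by exactly one period */
    }
    return next_wake;
}
```

With a 100ms period and 15ms of work, the relative pattern reaches its tenth wake-up at 1150ms while the absolute pattern reaches it at 1000ms: 150ms of accumulated drift after only ten cycles.
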
RTOS Overhead Visualizer

See how context switch time and tick overhead eat into available CPU. Adjust overhead to see when tasks start missing deadlines.


Trace tools like Segger SystemView and Percepio Tracealyzer capture every context switch, ISR, and API call with sub-microsecond timestamps. They render as interactive timelines — invaluable for finding timing bugs that printf debugging can't reveal.

What's the correct function to use for a task that must execute at a precise 10ms period without drift?

Chapter 9: Mastery & Connections

You now have the complete toolkit for designing real-time embedded systems: task modeling, schedulability analysis, algorithm selection, implementation with FreeRTOS, synchronization, and performance validation. Let's consolidate.

The RTOS Design Recipe: (1) Identify tasks and their timing requirements. (2) Assign periods, measure WCETs. (3) Check $U \le n(2^{1/n} - 1)$. (4) If tight, do exact response-time analysis. (5) Implement with FreeRTOS. (6) Verify with trace tools. (7) Size stacks with watermark checking.

RMA Bounds Reference:

n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Bound | 1.000 | 0.828 | 0.780 | 0.757 | 0.743 | 0.735 | 0.729 | 0.724 | 0.721 | 0.718

Scheduling Algorithm Decision Matrix:

Need | Best Algorithm | Why
Safety-critical, certifiable | Table-driven | Fully deterministic, verifiable offline
General embedded, flexible | RM (preemptive) | Simple analysis, predictable failure
Maximum CPU utilization | EDF | Optimal, reaches 100%
Fair sharing, no hard deadlines | Round-robin | Simple, no starvation

FreeRTOS API Quick Reference:

Function | Purpose
xTaskCreate() | Create a new task
vTaskDelay() | Block for relative time
vTaskDelayUntil() | Block until absolute time
xSemaphoreCreateMutex() | Create mutex with priority inheritance
xSemaphoreTake() | Lock mutex / wait on semaphore
xSemaphoreGive() | Unlock mutex / signal semaphore
xQueueCreate() | Create inter-task message queue
xQueueSend() | Push to queue (blocks up to the given timeout if full)
xQueueReceive() | Pop from queue (blocks up to the given timeout if empty)
uxTaskGetStackHighWaterMark() | Check minimum free stack

Design Challenge: A quadcopter flight controller needs these tasks:

Task | Period | WCET
IMU read + filter | 1 ms | 0.4 ms
PID control loop | 2 ms | 0.5 ms
GPS processing | 100 ms | 8 ms
Telemetry TX | 1000 ms | 30 ms
Logging to SD | 10 ms | 1 ms

U = 0.4/1 + 0.5/2 + 8/100 + 30/1000 + 1/10 = 0.40 + 0.25 + 0.08 + 0.03 + 0.10 = 0.86

For n=5, the bound is 0.743. U = 0.86 > 0.743. The simple test FAILS! But U < 1.0, so exact response-time analysis might still show schedulability (since the bound is only sufficient, not necessary). This is where you'd compute Ri for each task iteratively — or you'd optimize WCETs, increase CPU clock, or split the GPS task into shorter chunks.

What's next: This lesson covered single-core RTOS. Modern embedded systems often use multi-core (SMP) RTOS configurations, interrupt nesting, memory protection units (MPU), and communication stacks (TCP/IP, BLE) that add layers of complexity. Each of these deserves its own deep-dive.


A 5-task system has U = 0.75. The RM bound for n=5 is 0.743. What should you do?