Embedded RTOS — From Super-Loops to Real-Time Guarantees

Chapter 0: Why RTOS?

You're building an embedded system with five jobs: read a temperature sensor every 10ms, update an LCD display every 100ms, check physical buttons every 50ms, send telemetry data every 1 second, and blink a status LED every 500ms. How do you structure this code?

The naive approach is a super-loop — a single while(1) that calls each function in sequence. Read sensor. Update display. Check buttons. Send data. Blink LED. Repeat forever.

The problem? If sendData() takes 200ms (waiting for a network ACK), your sensor reading is now 200ms late. Your button feels sluggish. Your LED timing is visibly irregular. The super-loop gives you no timing guarantees.

The core problem: In a super-loop, every task's timing depends on every OTHER task's execution time. One slow function ruins timing for everything. An RTOS gives each job its own independent task with guaranteed scheduling — no task can starve another.

A Real-Time Operating System (RTOS) solves this by giving each job its own task (like a lightweight thread) with a priority and a period. The RTOS scheduler guarantees that higher-priority tasks preempt lower-priority ones. Your 10ms sensor read always happens on time, even if the network task is blocking.

Super-Loop vs RTOS Timing

Top: super-loop execution — notice the jitter when tasks take variable time. Bottom: RTOS tasks — each fires at its exact period regardless of others.

The tradeoff is real: a super-loop uses zero RAM for OS overhead, has no context-switch latency, and is trivial to debug. An RTOS adds ~2-10KB of RAM, ~1-5μs context switch time, and complexity (stack sizing, priority assignment, synchronization). For a blinking LED, a super-loop is fine. For an ABS braking system? You need an RTOS.

Property	Super-Loop	RTOS
Timing guarantees	None	Hard deadlines
RAM overhead	0	2-10 KB
Code structure	Sequential	Independent tasks
Worst-case latency	Sum of all task times	Highest-priority task WCET
Debugging	Simple (single thread)	Complex (race conditions)

Why does a super-loop fail to guarantee timing for the sensor read?

The sensor is too slow to respond Other tasks in the loop can take variable time, delaying the sensor read unpredictably The CPU clock is not accurate enough

Chapter 1: Real-Time vs Logical Time

Not all systems care about wall-clock time. When you sort a list, it doesn't matter if it takes 1ms or 100ms — the result is the same. That's logical time: only the ORDER of operations matters, not their exact timing.

But when your car's ABS system detects a wheel lock, it must release the brake within 5ms. If it responds in 6ms, the tire has already skidded. That's real-time: the wall clock is part of the correctness requirement. A correct answer delivered too late is a wrong answer.

Key distinction: Real-time does NOT mean "fast." It means "predictable." A system that always responds in exactly 50ms is more real-time than one that usually responds in 1ms but occasionally takes 200ms. Worst-case matters, not average-case.

Real-time systems are classified by what happens when a deadline is missed:

Hard real-time: Deadline miss = system failure. Examples: pacemaker pulse, airbag deployment, flight control surface actuation. Miss the deadline and someone may die.

Firm real-time: Late results have zero value but don't cause catastrophe. Examples: video frame rendering (a late frame is dropped, not displayed late), radar track updates. Missing one is tolerable; missing many degrades the system.

Soft real-time: Late results have diminished but nonzero value. Performance degrades gracefully. Examples: audio streaming (occasional buffer underrun = click), UI responsiveness (sluggish but still usable).

Deadline Types Visualized

Tasks arrive periodically with deadlines. Watch what happens when each type misses its deadline. Click tasks to simulate delays.

The Worst-Case Execution Time (WCET) is the longest time a task can possibly take to complete one execution. In real-time analysis, we ALWAYS use WCET, never average. If your task usually takes 2ms but once took 8ms, its WCET is 8ms. Your deadline must accommodate the worst case.

A deadline (D) is the latest time by which a task must complete after it is released. Often D = T (deadline equals period), meaning the task must finish before its next release. But D can be shorter than T (e.g., "compute within 5ms of every 20ms sensor reading").

Response Time = Finish Time − Release Time ≤ Deadline

If this inequality is ever violated for a hard real-time task, the system has failed — regardless of how many previous deadlines were met.

A video game renders frames at 60fps. Occasionally a frame takes 20ms instead of 16.6ms, causing a visible stutter but no crash. What type of real-time is this?

Hard real-time Firm real-time Soft real-time — late frames still display (with degraded smoothness)

Chapter 2: Task Model & Priorities

In an RTOS, each job becomes a task — an independent thread of execution with its own stack, priority, and timing parameters. Think of tasks like musicians in an orchestra: each plays their own part at their own tempo, and the conductor (scheduler) decides who plays when.

Every periodic task is characterized by three numbers:

Period (T) — How often the task runs. T=10ms means it activates every 10ms.
Execution time (C) — How long the task takes to complete (worst case, WCET).
Deadline (D) — Latest time after release by which it must finish. Often D=T.

The question is: how do we assign priorities? If two tasks are both ready to run, which one goes first?

Rate Monotonic (RM) priority assignment: The task with the SHORTEST period gets the HIGHEST priority. Why? Because a faster task has less slack time — it must run more frequently, so it has tighter timing. This is provably optimal among fixed-priority schemes (Liu & Layland, 1973).

Let's work through a concrete example. Three tasks:

Task	Period T	Exec Time C	Deadline D	RM Priority
A (sensor)	10 ms	3 ms	10 ms	Highest (1)
B (buttons)	25 ms	5 ms	25 ms	Medium (2)
C (display)	50 ms	10 ms	50 ms	Lowest (3)

Task A has the shortest period (10ms), so it gets highest priority. When A is ready, it preempts B and C immediately. B preempts C but yields to A. C only runs when neither A nor B needs the CPU.

Task Parameter Editor

Adjust periods and execution times for three tasks. RM automatically assigns priorities (shorter period = higher priority). Watch the timeline update.

Task A period 10ms

Task B period 25ms

Task C period 50ms

Notice that utilization U = C/T for each task represents the fraction of CPU time that task consumes. Task A uses 3/10 = 30% of the CPU. Task B uses 5/25 = 20%. Task C uses 10/50 = 20%. Total = 70%. The CPU is idle 30% of the time.

U = ∑_i=1ⁿ C_i / T_i

If U > 1 (total utilization exceeds 100%), the system is DEFINITELY unschedulable — there's simply not enough CPU time. But U ≤ 1 doesn't guarantee schedulability either. That's what Chapter 3 addresses.

Under Rate Monotonic scheduling, which task gets the highest priority?

The task with the shortest period The task with the longest execution time The task that was created first

Chapter 3: Rate Monotonic Analysis

We know how to assign priorities (shortest period = highest priority). But how do we PROVE the system will meet all deadlines? That's schedulability analysis — the mathematical guarantee that no task will ever miss its deadline.

In 1973, Liu and Layland proved a landmark result: for n periodic tasks with D=T, Rate Monotonic scheduling guarantees all deadlines are met if:

U = ∑_i=1ⁿ C_i / T_i ≤ n(2^1/n − 1)

This is the Liu & Layland bound. The right side is a sufficient (but not necessary) condition. If your utilization is below this bound, you're guaranteed schedulable. If it's above, you MIGHT still be schedulable but you need more detailed analysis.

The bound decreases with n: For n=1: U ≤ 1.000 (trivial — one task can use all CPU). For n=2: U ≤ 0.828. For n=3: U ≤ 0.780. For n=10: U ≤ 0.718. As n→∞: U ≤ ln(2) ≈ 0.693. More tasks = stricter bound because more preemption overhead.

n (tasks)	Bound n(2^1/n−1)	Percent
1	1.000	100%
2	0.828	82.8%
3	0.780	78.0%
4	0.757	75.7%
5	0.743	74.3%
10	0.718	71.8%
∞	ln(2) ≈ 0.693	69.3%

Worked example: Our three tasks from Chapter 2: A(T=10, C=3), B(T=25, C=5), C(T=50, C=10).

U = 3/10 + 5/25 + 10/50 = 0.30 + 0.20 + 0.20 = 0.70

The bound for n=3 is 0.780. Since 0.70 ≤ 0.780, the system is guaranteed schedulable under RM. No deadline will ever be missed.

Now consider: what if Task C takes 15ms instead of 10ms?

U = 3/10 + 5/25 + 15/50 = 0.30 + 0.20 + 0.30 = 0.80

Now U = 0.80 > 0.780 (the n=3 bound). We CANNOT guarantee schedulability with this simple test. But note: it might still be schedulable — the bound is sufficient, not necessary. We'd need response time analysis (exact test) to be sure.

Utilization Calculator

Enter task parameters below. The calculator computes total utilization and checks against the Liu & Layland bound.

Task A: C 3ms

Task B: C 5ms

Task C: C 10ms

Response Time Analysis (exact test): For each task i, compute its worst-case response time R_i iteratively: R_i = C_i + ∑_{j < i} ⌈R_i/T_j⌉ · C_j. If R_i ≤ D_i for all tasks, the system is schedulable. This is necessary AND sufficient for fixed-priority scheduling.

A 4-task system has utilization U = 0.76. The Liu & Layland bound for n=4 is 0.757. What can we conclude?

The system will definitely miss deadlines We cannot guarantee schedulability with this test alone — need exact analysis The system is definitely schedulable

Chapter 4: Scheduling Algorithms

Rate Monotonic is one scheduling algorithm. There are several others, each with different tradeoffs. Let's compare the four major approaches.

Table-Driven (Static/Cyclic) Scheduling: The entire schedule is precomputed offline. A table says exactly which task runs at which time slot. Think of it like a train timetable — completely deterministic, zero runtime decisions. Used in safety-critical avionics (DO-178C) because the schedule can be formally verified before deployment.

Priority Preemptive Scheduling: Each task has a fixed priority. The highest-priority READY task always runs. If a higher-priority task becomes ready while a lower-priority task is running, it is immediately preempted (interrupted and suspended). Rate Monotonic is a specific priority assignment within this category.

Round-Robin: All tasks at the same priority level get equal time slices (quanta). After a task's quantum expires, it goes to the back of the queue. Fair but provides no timing guarantees for individual tasks. Often used as the policy WITHIN a priority level (same-priority tasks round-robin among themselves).

Earliest Deadline First (EDF): Dynamic priority — the task whose deadline is closest always runs next. Provably optimal: if ANY algorithm can schedule a task set, EDF can too. Achieves 100% utilization (U ≤ 1 is both necessary and sufficient). But it's harder to implement and analyze, and when overloaded it fails unpredictably (no task is guaranteed).

EDF vs RM: EDF can schedule task sets that RM cannot (because EDF reaches 100% utilization vs RM's ~69-82%). But RM fails gracefully — lowest-priority tasks miss first, predictably. EDF under overload: random tasks miss deadlines, unpredictably. Safety-critical systems prefer RM's predictable failure mode.

Algorithm	Priority Type	Max Utilization	Predictability	Complexity
Table-driven	Static (precomputed)	100%	Perfect	Design-time
RM (preemptive)	Fixed (rate-based)	~69-82%	High	Low
Round-robin	Equal	N/A	Low	Very low
EDF	Dynamic (deadline)	100%	Medium	Medium

Scheduling Algorithm Comparison

Select an algorithm and watch 3 tasks being scheduled on a Gantt chart. Notice preemptions, idle time, and deadline handling.

Table-driven in practice: The system designer computes a major frame (LCM of all periods) and a minor frame (GCD of all periods). Within each minor frame, specific tasks are assigned specific time slots. No runtime scheduler needed — just a timer interrupt that walks through the table.

c
// Table-driven schedule for tasks A(T=10), B(T=20), C(T=40)
// Major frame = LCM(10,20,40) = 40ms, Minor frame = GCD = 10ms
const schedule_table[] = {
    { TaskA, TaskB, TaskC },  // slot 0-10ms: A+B+C all run
    { TaskA },                // slot 10-20ms: only A
    { TaskA, TaskB },         // slot 20-30ms: A+B
    { TaskA },                // slot 30-40ms: only A
};  // then repeat from slot 0

Which scheduling algorithm can achieve 100% CPU utilization AND is provably optimal (can schedule anything that's schedulable)?

Rate Monotonic Earliest Deadline First (EDF) Round-Robin

Chapter 5: FreeRTOS Basics

FreeRTOS is the world's most deployed RTOS kernel — running on billions of microcontrollers from tiny Cortex-M0s to powerful ESP32s. It's open source, tiny (~9KB kernel), and provides the essential primitives: tasks, queues, semaphores, timers.

Creating a task in FreeRTOS requires specifying: a function to execute, a name (for debugging), stack size (in words, not bytes!), parameters, priority, and a handle for later reference.

c
void vSensorTask(void *pvParameters) {
    for(;;) {
        readTemperature();
        vTaskDelay(pdMS_TO_TICKS(10));  // sleep 10ms
    }
}

void vDisplayTask(void *pvParameters) {
    for(;;) {
        updateLCD();
        vTaskDelay(pdMS_TO_TICKS(100)); // sleep 100ms
    }
}

int main(void) {
    xTaskCreate(vSensorTask, "Sensor", 128, NULL, 3, NULL);
    xTaskCreate(vDisplayTask, "Display", 256, NULL, 1, NULL);
    vTaskStartScheduler();  // never returns
    return 0;
}

Key detail: vTaskDelay() puts the task into BLOCKED state — it consumes zero CPU while sleeping. The tick interrupt (typically 1kHz = 1ms resolution) wakes it when the delay expires. This is fundamentally different from a busy-wait loop.

FreeRTOS tasks exist in one of four states:

Running

Currently executing on CPU (only ONE task at a time on single-core)

↓ preempted or yields

Ready

Able to run but waiting for CPU (higher-priority task is running)

↓ vTaskDelay / queue wait / semaphore wait

Blocked

Waiting for time or event (zero CPU usage)

↓ vTaskSuspend()

Suspended

Explicitly paused, only vTaskResume() can restart

The idle task is created automatically by FreeRTOS at priority 0 (lowest). It runs whenever no other task is ready. Its job: free memory from deleted tasks and (optionally) put the CPU into low-power sleep mode.

The tick interrupt fires every 1ms (configurable via configTICK_RATE_HZ). Each tick, the scheduler checks: has any blocked task's delay expired? Is there a higher-priority task now ready? If so, a context switch occurs.

FreeRTOS Task States

Watch tasks transition between states. Click buttons to trigger events.

c
// Complete 3-task LED blinker using FreeRTOS
#include "FreeRTOS.h"
#include "task.h"

void vRedLED(void *p) {
    for(;;) { togglePin(RED_PIN); vTaskDelay(pdMS_TO_TICKS(250)); }
}
void vGreenLED(void *p) {
    for(;;) { togglePin(GREEN_PIN); vTaskDelay(pdMS_TO_TICKS(500)); }
}
void vBlueLED(void *p) {
    for(;;) { togglePin(BLUE_PIN); vTaskDelay(pdMS_TO_TICKS(1000)); }
}

int main() {
    xTaskCreate(vRedLED,   "Red",   64, NULL, 3, NULL);
    xTaskCreate(vGreenLED, "Green", 64, NULL, 2, NULL);
    xTaskCreate(vBlueLED,  "Blue",  64, NULL, 1, NULL);
    vTaskStartScheduler();
}

A task calls vTaskDelay(pdMS_TO_TICKS(50)). What happens to the task during those 50ms?

It spins in a busy-wait loop consuming CPU It enters Blocked state and consumes zero CPU until the delay expires It is permanently deleted and must be recreated

Chapter 6: Synchronization — Semaphores & Mutexes

Tasks don't exist in isolation. They share resources: a UART peripheral, a global data buffer, an I2C bus. When two tasks access the same resource simultaneously, you get race conditions — corrupted data, garbled transmissions, undefined behavior.

Three synchronization primitives solve this:

Binary Semaphore: A flag with two states (available/taken). Used for signaling between tasks — "the data is ready, you can process it now." One task gives (signals), another takes (waits). Think of it like a baton pass in a relay race.

Counting Semaphore: A counter from 0 to N. Used to manage a pool of N identical resources. Example: 3 DMA channels available. Each take() decrements. Each give() increments. When count hits 0, tasks block until a resource is freed.

Mutex (Mutual Exclusion): Like a binary semaphore BUT with ownership — only the task that locked it can unlock it. Also supports priority inheritance (critical for avoiding priority inversion). Use mutexes for protecting shared data, semaphores for signaling.

Priority Inversion — the bug that nearly killed Mars Pathfinder: Task H (high priority) needs a mutex held by Task L (low priority). H blocks, waiting for L to release it. But Task M (medium priority) preempts L! Now H is blocked by M — even though H is higher priority. This is unbounded priority inversion. It actually happened on Mars Pathfinder in 1997, causing system resets.

The fix is priority inheritance: when H blocks on a mutex held by L, L temporarily inherits H's priority. Now M cannot preempt L. L finishes quickly, releases the mutex, drops back to its original priority, and H runs immediately.

c
// Mutex protecting shared temperature variable
SemaphoreHandle_t xTempMutex;
float shared_temperature = 0.0;

void vSensorTask(void *p) {
    for(;;) {
        float reading = readADC();
        if(xSemaphoreTake(xTempMutex, pdMS_TO_TICKS(10))) {
            shared_temperature = reading;  // protected write
            xSemaphoreGive(xTempMutex);
        }
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

void vDisplayTask(void *p) {
    for(;;) {
        if(xSemaphoreTake(xTempMutex, pdMS_TO_TICKS(10))) {
            displayTemperature(shared_temperature); // protected read
            xSemaphoreGive(xTempMutex);
        }
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

int main() {
    xTempMutex = xSemaphoreCreateMutex();  // has priority inheritance!
    xTaskCreate(vSensorTask,  "Sensor",  128, NULL, 3, NULL);
    xTaskCreate(vDisplayTask, "Display", 256, NULL, 1, NULL);
    vTaskStartScheduler();
}

Priority Inversion Animation

Three tasks: H (high), M (medium), L (low). Watch priority inversion happen, then enable priority inheritance to fix it.

Rule of thumb: Use a mutex when you're protecting a shared resource (data, peripheral). Use a binary semaphore when you're signaling between tasks (ISR signals task, producer signals consumer). Never hold a mutex longer than absolutely necessary — every microsecond you hold it is a microsecond you might block a higher-priority task.

Deadlock is another hazard: Task A holds Mutex 1 and waits for Mutex 2. Task B holds Mutex 2 and waits for Mutex 1. Both are blocked forever. Prevention: always acquire mutexes in the same global order, or use timeout on xSemaphoreTake().

In priority inversion, a high-priority task H is blocked. What is the root cause?

H's stack overflowed H's deadline is too tight A low-priority task holds a mutex H needs, and medium-priority tasks preempt the low task

Chapter 7: Multi-Task System (Showcase)

Let's build a complete RTOS application: a real-time control system with four tasks sharing data through queues and mutexes. This is the pattern you'll see in every professional embedded system — from drones to medical devices.

Our system has four tasks:

Task	Period	WCET	Priority	Role
PID Control	1 ms	0.3 ms	4 (highest)	Motor speed regulation
Sensor Read	10 ms	2 ms	3	Read encoder + IMU
Display	100 ms	15 ms	2	Update LCD with status
Comms	1000 ms	50 ms	1 (lowest)	Send telemetry via UART

Utilization check: U = 0.3/1 + 2/10 + 15/100 + 50/1000 = 0.30 + 0.20 + 0.15 + 0.05 = 0.70. For n=4, bound = 0.757. Since 0.70 ≤ 0.757, we're schedulable.

Data flows through the system via queues and mutexes:

Sensor Task

Reads encoder → pushes to sensorQueue

↓ queue

PID Task

Pops from sensorQueue → computes output → writes to shared motor_cmd (mutex)

↓ mutex

Display Task

Reads motor_cmd (mutex) → updates LCD

↓ mutex

Comms Task

Reads motor_cmd (mutex) → sends via UART

Live RTOS Gantt Chart

Watch all four tasks execute in real-time. Preemptions visible as interruptions. Queue fill levels and mutex states shown below. Adjust parameters to see deadline misses.

PID exec time 0.3ms

Sensor exec time 2ms

Display exec time 15ms

Comms exec time 50ms

c
// Complete 4-task RTOS system with queues and mutexes
QueueHandle_t sensorQueue;
SemaphoreHandle_t motorMutex;
float motor_cmd = 0.0;

void vPIDTask(void *p) {
    float sensor_val;
    for(;;) {
        if(xQueueReceive(sensorQueue, &sensor_val, 0)) {
            float output = computePID(sensor_val);
            xSemaphoreTake(motorMutex, portMAX_DELAY);
            motor_cmd = output;
            xSemaphoreGive(motorMutex);
        }
        vTaskDelay(pdMS_TO_TICKS(1));
    }
}

void vSensorTask(void *p) {
    for(;;) {
        float reading = readEncoder();
        xQueueSend(sensorQueue, &reading, 0);
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

void vDisplayTask(void *p) {
    for(;;) {
        xSemaphoreTake(motorMutex, portMAX_DELAY);
        lcd_printf("CMD: %.2f", motor_cmd);
        xSemaphoreGive(motorMutex);
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

void vCommsTask(void *p) {
    for(;;) {
        xSemaphoreTake(motorMutex, portMAX_DELAY);
        uart_send("TELEM:%.2f\n", motor_cmd);
        xSemaphoreGive(motorMutex);
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

int main() {
    sensorQueue = xQueueCreate(10, sizeof(float));
    motorMutex  = xSemaphoreCreateMutex();
    xTaskCreate(vPIDTask,     "PID",     128, NULL, 4, NULL);
    xTaskCreate(vSensorTask,  "Sensor",  128, NULL, 3, NULL);
    xTaskCreate(vDisplayTask, "Display", 256, NULL, 2, NULL);
    xTaskCreate(vCommsTask,   "Comms",   256, NULL, 1, NULL);
    vTaskStartScheduler();
}

Try it: Increase the PID execution time to 0.9ms. Now U = 0.9/1 + 0.2 + 0.15 + 0.05 = 1.30. That's over 100% — completely impossible to schedule! Watch the Gantt chart show deadline misses cascading through the system.

Chapter 8: Performance Analysis

Your RTOS design looks good on paper. Utilization is under the bound. Priorities are assigned. But real hardware has overheads that paper analysis ignores. Understanding these overheads is the difference between a system that works in the lab and one that works in production.

Context Switch Time: When the scheduler moves from Task A to Task B, it must save A's registers (R0-R15, PSP, FPU state), load B's registers, and update internal data structures. On a Cortex-M4 at 168MHz, this takes ~1-5μs. Seems tiny, but if your PID task runs at 10kHz (100μs period), a 5μs switch is 5% overhead PER switch.

Tick Interrupt Overhead: The system tick (typically 1kHz) fires every 1ms to check timeouts and do scheduling. Each tick ISR takes ~0.5-2μs. At 1kHz, that's 0.05-0.2% CPU — negligible for most systems, but it adds up with many software timers.

Stack Usage: Every task needs its own stack. Too small = stack overflow (silent memory corruption, the hardest bug to find). Too large = wasted RAM. A typical Cortex-M4 task needs 128-512 words (512-2048 bytes) depending on call depth and local variables.

Stack overflow is the #1 RTOS bug. Symptoms: random crashes, corrupted variables, wrong task executing. FreeRTOS offers two detection methods: (1) check stack watermark periodically with uxTaskGetStackHighWaterMark(), (2) enable configCHECK_FOR_STACK_OVERFLOW to detect at context switch. Always enable detection during development.

c
// Stack usage monitoring
void vMonitorTask(void *p) {
    for(;;) {
        UBaseType_t hwm = uxTaskGetStackHighWaterMark(NULL);
        // hwm = minimum free stack words ever seen
        // If hwm < 20, you're dangerously close to overflow!
        printf("Stack free: %u words\n", hwm);
        vTaskDelay(pdMS_TO_TICKS(5000));
    }
}

// CPU utilization measurement (FreeRTOS runtime stats)
void vStatsTask(void *p) {
    char buf[512];
    for(;;) {
        vTaskGetRunTimeStats(buf);  // fills buf with per-task CPU%
        printf("%s\n", buf);
        vTaskDelay(pdMS_TO_TICKS(10000));
    }
}

Common RTOS failure modes:

Problem	Symptom	Cause	Fix
Stack overflow	Random crashes, corrupted data	Task stack too small for call depth	Increase stack, check watermark
Priority inversion	High-priority task unresponsive	Low task holds mutex, medium preempts	Use mutex (not semaphore) for PI
Deadlock	Two+ tasks frozen forever	Circular mutex dependency	Global lock ordering, timeouts
Starvation	Low-priority task never runs	Higher tasks never block	Ensure tasks yield/block regularly
Jitter	Inconsistent task periods	vTaskDelay vs vTaskDelayUntil	Use vTaskDelayUntil for periodic tasks

vTaskDelay vs vTaskDelayUntil: vTaskDelay(100) sleeps 100ms FROM NOW — so your period is 100ms + execution time (drift accumulates). vTaskDelayUntil(&lastWake, 100) sleeps until an absolute time — your period is exactly 100ms regardless of execution time. Always use vTaskDelayUntil for periodic tasks.

RTOS Overhead Visualizer

See how context switch time and tick overhead eat into available CPU. Adjust overhead to see when tasks start missing deadlines.

Context switch time (μs) 3μs

Tick rate (kHz) 1kHz

Trace tools like Segger SystemView and Percepio Tracealyzer capture every context switch, ISR, and API call with sub-microsecond timestamps. They render as interactive timelines — invaluable for finding timing bugs that printf debugging can't reveal.

What's the correct function to use for a task that must execute at a precise 10ms period without drift?

vTaskDelay(pdMS_TO_TICKS(10)) vTaskDelayUntil(&lastWake, pdMS_TO_TICKS(10)) while(getTime() - start < 10) {} // busy wait

Chapter 9: Mastery & Connections

You now have the complete toolkit for designing real-time embedded systems: task modeling, schedulability analysis, algorithm selection, implementation with FreeRTOS, synchronization, and performance validation. Let's consolidate.

The RTOS Design Recipe: (1) Identify tasks and their timing requirements. (2) Assign periods, measure WCETs. (3) Check U ≤ n(2^1/n−1). (4) If tight, do exact response-time analysis. (5) Implement with FreeRTOS. (6) Verify with trace tools. (7) Size stacks with watermark checking.

RMA Bounds Reference:

n	1	2	3	4	5	6	7	8	9	10
Bound	1.000	0.828	0.780	0.757	0.743	0.735	0.729	0.724	0.721	0.718

Scheduling Algorithm Decision Matrix:

Need	Best Algorithm	Why
Safety-critical, certifiable	Table-driven	Fully deterministic, verifiable offline
General embedded, flexible	RM (preemptive)	Simple analysis, predictable failure
Maximum CPU utilization	EDF	Optimal, reaches 100%
Fair sharing, no hard deadlines	Round-robin	Simple, no starvation

FreeRTOS API Quick Reference:

Function	Purpose
xTaskCreate()	Create a new task
vTaskDelay()	Block for relative time
vTaskDelayUntil()	Block until absolute time
xSemaphoreCreateMutex()	Create mutex with priority inheritance
xSemaphoreTake()	Lock mutex / wait on semaphore
xSemaphoreGive()	Unlock mutex / signal semaphore
xQueueCreate()	Create inter-task message queue
xQueueSend()	Push to queue (blocks if full)
xQueueReceive()	Pop from queue (blocks if empty)
uxTaskGetStackHighWaterMark()	Check minimum free stack

Design Challenge: A quadcopter flight controller needs these tasks:

Task	Period	WCET
IMU read + filter	1 ms	0.4 ms
PID control loop	2 ms	0.5 ms
GPS processing	100 ms	8 ms
Telemetry TX	1000 ms	30 ms
Logging to SD	10 ms	1 ms

U = 0.4/1 + 0.5/2 + 8/100 + 30/1000 + 1/10 = 0.40 + 0.25 + 0.08 + 0.03 + 0.10 = 0.86

For n=5, the bound is 0.743. U = 0.86 > 0.743. The simple test FAILS! But U < 1.0, so exact response-time analysis might still show schedulability (since the bound is only sufficient, not necessary). This is where you'd compute R_i for each task iteratively — or you'd optimize WCETs, increase CPU clock, or split the GPS task into shorter chunks.

What's next: This lesson covered single-core RTOS. Modern embedded systems often use multi-core (SMP) RTOS configurations, interrupt nesting, memory protection units (MPU), and communication stacks (TCP/IP, BLE) that add layers of complexity. Each of these deserves its own deep-dive.

Connections:

Control theory — the PID controller running inside your RTOS tasks
Embedded C — the language all this code is written in
State machines — how individual tasks are structured internally
Digital signal processing — sensor filtering within real-time constraints

A 5-task system has U = 0.75. The RM bound for n=5 is 0.743. What should you do?

Perform exact response-time analysis — the system might still be schedulable since the bound is only sufficient Declare the system unschedulable and redesign Ignore the bound and hope for the best

Embedded Real-TimeOperating Systems