The scheduler that guarantees your robot's motor controller fires every millisecond, your airbag deploys within 10ms, and your pacemaker never misses a beat.
You're building an embedded system with five jobs: read a temperature sensor every 10ms, update an LCD display every 100ms, check physical buttons every 50ms, send telemetry data every 1 second, and blink a status LED every 500ms. How do you structure this code?
The naive approach is a super-loop — a single while(1) that calls each function in sequence. Read sensor. Update display. Check buttons. Send data. Blink LED. Repeat forever.
The problem? If sendData() takes 200ms (waiting for a network ACK), your sensor reading is now 200ms late. Your button feels sluggish. Your LED timing is visibly irregular. The super-loop gives you no timing guarantees.
A Real-Time Operating System (RTOS) solves this by giving each job its own task (like a lightweight thread) with a priority and a period. The RTOS scheduler guarantees that higher-priority tasks preempt lower-priority ones. Your 10ms sensor read always happens on time, even if the network task is blocking.
Top: super-loop execution — notice the jitter when tasks take variable time. Bottom: RTOS tasks — each fires at its exact period regardless of others.
The tradeoff is real: a super-loop uses zero RAM for OS overhead, has no context-switch latency, and is trivial to debug. An RTOS adds ~2-10KB of RAM, ~1-5μs context switch time, and complexity (stack sizing, priority assignment, synchronization). For a blinking LED, a super-loop is fine. For an ABS braking system? You need an RTOS.
| Property | Super-Loop | RTOS |
|---|---|---|
| Timing guarantees | None | Hard deadlines |
| RAM overhead | 0 | 2-10 KB |
| Code structure | Sequential | Independent tasks |
| Worst-case latency | Sum of all task times | Highest-priority task WCET |
| Debugging | Simple (single thread) | Complex (race conditions) |
Not all systems care about wall-clock time. When you sort a list, it doesn't matter if it takes 1ms or 100ms — the result is the same. That's logical time: only the ORDER of operations matters, not their exact timing.
But when your car's ABS system detects a wheel lock, it must release the brake within 5ms. If it responds in 6ms, the tire has already skidded. That's real-time: the wall clock is part of the correctness requirement. A correct answer delivered too late is a wrong answer.
Real-time systems are classified by what happens when a deadline is missed:
Hard real-time: Deadline miss = system failure. Examples: pacemaker pulse, airbag deployment, flight control surface actuation. Miss the deadline and someone may die.
Firm real-time: Late results have zero value but don't cause catastrophe. Examples: video frame rendering (a late frame is dropped, not displayed late), radar track updates. Missing one is tolerable; missing many degrades the system.
Soft real-time: Late results have diminished but nonzero value. Performance degrades gracefully. Examples: audio streaming (occasional buffer underrun = click), UI responsiveness (sluggish but still usable).
Tasks arrive periodically with deadlines. Watch what happens when each type misses its deadline. Click tasks to simulate delays.
A deadline (D) is the latest time by which a task must complete after it is released. Often D = T (deadline equals period), meaning the task must finish before its next release. But D can be shorter than T (e.g., "compute within 5ms of every 20ms sensor reading").
If this inequality is ever violated for a hard real-time task, the system has failed — regardless of how many previous deadlines were met.
In an RTOS, each job becomes a task — an independent thread of execution with its own stack, priority, and timing parameters. Think of tasks like musicians in an orchestra: each plays their own part at their own tempo, and the conductor (scheduler) decides who plays when.
Every periodic task is characterized by three numbers:
The question is: how do we assign priorities? If two tasks are both ready to run, which one goes first?
Let's work through a concrete example. Three tasks:
| Task | Period T | Exec Time C | Deadline D | RM Priority |
|---|---|---|---|---|
| A (sensor) | 10 ms | 3 ms | 10 ms | Highest (1) |
| B (buttons) | 25 ms | 5 ms | 25 ms | Medium (2) |
| C (display) | 50 ms | 10 ms | 50 ms | Lowest (3) |
Task A has the shortest period (10ms), so it gets highest priority. When A is ready, it preempts B and C immediately. B preempts C but yields to A. C only runs when neither A nor B needs the CPU.
Adjust periods and execution times for three tasks. RM automatically assigns priorities (shorter period = higher priority). Watch the timeline update.
Notice that utilization U = C/T for each task represents the fraction of CPU time that task consumes. Task A uses 3/10 = 30% of the CPU. Task B uses 5/25 = 20%. Task C uses 10/50 = 20%. Total = 70%. The CPU is idle 30% of the time.
If U > 1 (total utilization exceeds 100%), the system is DEFINITELY unschedulable — there's simply not enough CPU time. But U ≤ 1 doesn't guarantee schedulability either. That's what Chapter 3 addresses.
We know how to assign priorities (shortest period = highest priority). But how do we PROVE the system will meet all deadlines? That's schedulability analysis — the mathematical guarantee that no task will ever miss its deadline.
In 1973, Liu and Layland proved a landmark result: for n periodic tasks with D=T, Rate Monotonic scheduling guarantees all deadlines are met if:
This is the Liu & Layland bound. The right side is a sufficient (but not necessary) condition. If your utilization is below this bound, you're guaranteed schedulable. If it's above, you MIGHT still be schedulable but you need more detailed analysis.
| n (tasks) | Bound n(21/n−1) | Percent |
|---|---|---|
| 1 | 1.000 | 100% |
| 2 | 0.828 | 82.8% |
| 3 | 0.780 | 78.0% |
| 4 | 0.757 | 75.7% |
| 5 | 0.743 | 74.3% |
| 10 | 0.718 | 71.8% |
| ∞ | ln(2) ≈ 0.693 | 69.3% |
Worked example: Our three tasks from Chapter 2: A(T=10, C=3), B(T=25, C=5), C(T=50, C=10).
The bound for n=3 is 0.780. Since 0.70 ≤ 0.780, the system is guaranteed schedulable under RM. No deadline will ever be missed.
Now consider: what if Task C takes 15ms instead of 10ms?
Now U = 0.80 > 0.780 (the n=3 bound). We CANNOT guarantee schedulability with this simple test. But note: it might still be schedulable — the bound is sufficient, not necessary. We'd need response time analysis (exact test) to be sure.
Enter task parameters below. The calculator computes total utilization and checks against the Liu & Layland bound.
Rate Monotonic is one scheduling algorithm. There are several others, each with different tradeoffs. Let's compare the four major approaches.
Table-Driven (Static/Cyclic) Scheduling: The entire schedule is precomputed offline. A table says exactly which task runs at which time slot. Think of it like a train timetable — completely deterministic, zero runtime decisions. Used in safety-critical avionics (DO-178C) because the schedule can be formally verified before deployment.
Priority Preemptive Scheduling: Each task has a fixed priority. The highest-priority READY task always runs. If a higher-priority task becomes ready while a lower-priority task is running, it is immediately preempted (interrupted and suspended). Rate Monotonic is a specific priority assignment within this category.
Round-Robin: All tasks at the same priority level get equal time slices (quanta). After a task's quantum expires, it goes to the back of the queue. Fair but provides no timing guarantees for individual tasks. Often used as the policy WITHIN a priority level (same-priority tasks round-robin among themselves).
Earliest Deadline First (EDF): Dynamic priority — the task whose deadline is closest always runs next. Provably optimal: if ANY algorithm can schedule a task set, EDF can too. Achieves 100% utilization (U ≤ 1 is both necessary and sufficient). But it's harder to implement and analyze, and when overloaded it fails unpredictably (no task is guaranteed).
| Algorithm | Priority Type | Max Utilization | Predictability | Complexity |
|---|---|---|---|---|
| Table-driven | Static (precomputed) | 100% | Perfect | Design-time |
| RM (preemptive) | Fixed (rate-based) | ~69-82% | High | Low |
| Round-robin | Equal | N/A | Low | Very low |
| EDF | Dynamic (deadline) | 100% | Medium | Medium |
Select an algorithm and watch 3 tasks being scheduled on a Gantt chart. Notice preemptions, idle time, and deadline handling.
Table-driven in practice: The system designer computes a major frame (LCM of all periods) and a minor frame (GCD of all periods). Within each minor frame, specific tasks are assigned specific time slots. No runtime scheduler needed — just a timer interrupt that walks through the table.
c // Table-driven schedule for tasks A(T=10), B(T=20), C(T=40) // Major frame = LCM(10,20,40) = 40ms, Minor frame = GCD = 10ms const schedule_table[] = { { TaskA, TaskB, TaskC }, // slot 0-10ms: A+B+C all run { TaskA }, // slot 10-20ms: only A { TaskA, TaskB }, // slot 20-30ms: A+B { TaskA }, // slot 30-40ms: only A }; // then repeat from slot 0
FreeRTOS is the world's most deployed RTOS kernel — running on billions of microcontrollers from tiny Cortex-M0s to powerful ESP32s. It's open source, tiny (~9KB kernel), and provides the essential primitives: tasks, queues, semaphores, timers.
Creating a task in FreeRTOS requires specifying: a function to execute, a name (for debugging), stack size (in words, not bytes!), parameters, priority, and a handle for later reference.
c void vSensorTask(void *pvParameters) { for(;;) { readTemperature(); vTaskDelay(pdMS_TO_TICKS(10)); // sleep 10ms } } void vDisplayTask(void *pvParameters) { for(;;) { updateLCD(); vTaskDelay(pdMS_TO_TICKS(100)); // sleep 100ms } } int main(void) { xTaskCreate(vSensorTask, "Sensor", 128, NULL, 3, NULL); xTaskCreate(vDisplayTask, "Display", 256, NULL, 1, NULL); vTaskStartScheduler(); // never returns return 0; }
vTaskDelay() puts the task into BLOCKED state — it consumes zero CPU while sleeping. The tick interrupt (typically 1kHz = 1ms resolution) wakes it when the delay expires. This is fundamentally different from a busy-wait loop.FreeRTOS tasks exist in one of four states:
The idle task is created automatically by FreeRTOS at priority 0 (lowest). It runs whenever no other task is ready. Its job: free memory from deleted tasks and (optionally) put the CPU into low-power sleep mode.
The tick interrupt fires every 1ms (configurable via configTICK_RATE_HZ). Each tick, the scheduler checks: has any blocked task's delay expired? Is there a higher-priority task now ready? If so, a context switch occurs.
Watch tasks transition between states. Click buttons to trigger events.
c // Complete 3-task LED blinker using FreeRTOS #include "FreeRTOS.h" #include "task.h" void vRedLED(void *p) { for(;;) { togglePin(RED_PIN); vTaskDelay(pdMS_TO_TICKS(250)); } } void vGreenLED(void *p) { for(;;) { togglePin(GREEN_PIN); vTaskDelay(pdMS_TO_TICKS(500)); } } void vBlueLED(void *p) { for(;;) { togglePin(BLUE_PIN); vTaskDelay(pdMS_TO_TICKS(1000)); } } int main() { xTaskCreate(vRedLED, "Red", 64, NULL, 3, NULL); xTaskCreate(vGreenLED, "Green", 64, NULL, 2, NULL); xTaskCreate(vBlueLED, "Blue", 64, NULL, 1, NULL); vTaskStartScheduler(); }
Tasks don't exist in isolation. They share resources: a UART peripheral, a global data buffer, an I2C bus. When two tasks access the same resource simultaneously, you get race conditions — corrupted data, garbled transmissions, undefined behavior.
Three synchronization primitives solve this:
Binary Semaphore: A flag with two states (available/taken). Used for signaling between tasks — "the data is ready, you can process it now." One task gives (signals), another takes (waits). Think of it like a baton pass in a relay race.
Counting Semaphore: A counter from 0 to N. Used to manage a pool of N identical resources. Example: 3 DMA channels available. Each take() decrements. Each give() increments. When count hits 0, tasks block until a resource is freed.
Mutex (Mutual Exclusion): Like a binary semaphore BUT with ownership — only the task that locked it can unlock it. Also supports priority inheritance (critical for avoiding priority inversion). Use mutexes for protecting shared data, semaphores for signaling.
The fix is priority inheritance: when H blocks on a mutex held by L, L temporarily inherits H's priority. Now M cannot preempt L. L finishes quickly, releases the mutex, drops back to its original priority, and H runs immediately.
c // Mutex protecting shared temperature variable SemaphoreHandle_t xTempMutex; float shared_temperature = 0.0; void vSensorTask(void *p) { for(;;) { float reading = readADC(); if(xSemaphoreTake(xTempMutex, pdMS_TO_TICKS(10))) { shared_temperature = reading; // protected write xSemaphoreGive(xTempMutex); } vTaskDelay(pdMS_TO_TICKS(10)); } } void vDisplayTask(void *p) { for(;;) { if(xSemaphoreTake(xTempMutex, pdMS_TO_TICKS(10))) { displayTemperature(shared_temperature); // protected read xSemaphoreGive(xTempMutex); } vTaskDelay(pdMS_TO_TICKS(100)); } } int main() { xTempMutex = xSemaphoreCreateMutex(); // has priority inheritance! xTaskCreate(vSensorTask, "Sensor", 128, NULL, 3, NULL); xTaskCreate(vDisplayTask, "Display", 256, NULL, 1, NULL); vTaskStartScheduler(); }
Three tasks: H (high), M (medium), L (low). Watch priority inversion happen, then enable priority inheritance to fix it.
Deadlock is another hazard: Task A holds Mutex 1 and waits for Mutex 2. Task B holds Mutex 2 and waits for Mutex 1. Both are blocked forever. Prevention: always acquire mutexes in the same global order, or use timeout on xSemaphoreTake().
Let's build a complete RTOS application: a real-time control system with four tasks sharing data through queues and mutexes. This is the pattern you'll see in every professional embedded system — from drones to medical devices.
Our system has four tasks:
| Task | Period | WCET | Priority | Role |
|---|---|---|---|---|
| PID Control | 1 ms | 0.3 ms | 4 (highest) | Motor speed regulation |
| Sensor Read | 10 ms | 2 ms | 3 | Read encoder + IMU |
| Display | 100 ms | 15 ms | 2 | Update LCD with status |
| Comms | 1000 ms | 50 ms | 1 (lowest) | Send telemetry via UART |
Utilization check: U = 0.3/1 + 2/10 + 15/100 + 50/1000 = 0.30 + 0.20 + 0.15 + 0.05 = 0.70. For n=4, bound = 0.757. Since 0.70 ≤ 0.757, we're schedulable.
Data flows through the system via queues and mutexes:
Watch all four tasks execute in real-time. Preemptions visible as interruptions. Queue fill levels and mutex states shown below. Adjust parameters to see deadline misses.
c // Complete 4-task RTOS system with queues and mutexes QueueHandle_t sensorQueue; SemaphoreHandle_t motorMutex; float motor_cmd = 0.0; void vPIDTask(void *p) { float sensor_val; for(;;) { if(xQueueReceive(sensorQueue, &sensor_val, 0)) { float output = computePID(sensor_val); xSemaphoreTake(motorMutex, portMAX_DELAY); motor_cmd = output; xSemaphoreGive(motorMutex); } vTaskDelay(pdMS_TO_TICKS(1)); } } void vSensorTask(void *p) { for(;;) { float reading = readEncoder(); xQueueSend(sensorQueue, &reading, 0); vTaskDelay(pdMS_TO_TICKS(10)); } } void vDisplayTask(void *p) { for(;;) { xSemaphoreTake(motorMutex, portMAX_DELAY); lcd_printf("CMD: %.2f", motor_cmd); xSemaphoreGive(motorMutex); vTaskDelay(pdMS_TO_TICKS(100)); } } void vCommsTask(void *p) { for(;;) { xSemaphoreTake(motorMutex, portMAX_DELAY); uart_send("TELEM:%.2f\n", motor_cmd); xSemaphoreGive(motorMutex); vTaskDelay(pdMS_TO_TICKS(1000)); } } int main() { sensorQueue = xQueueCreate(10, sizeof(float)); motorMutex = xSemaphoreCreateMutex(); xTaskCreate(vPIDTask, "PID", 128, NULL, 4, NULL); xTaskCreate(vSensorTask, "Sensor", 128, NULL, 3, NULL); xTaskCreate(vDisplayTask, "Display", 256, NULL, 2, NULL); xTaskCreate(vCommsTask, "Comms", 256, NULL, 1, NULL); vTaskStartScheduler(); }
Your RTOS design looks good on paper. Utilization is under the bound. Priorities are assigned. But real hardware has overheads that paper analysis ignores. Understanding these overheads is the difference between a system that works in the lab and one that works in production.
Context Switch Time: When the scheduler moves from Task A to Task B, it must save A's registers (R0-R15, PSP, FPU state), load B's registers, and update internal data structures. On a Cortex-M4 at 168MHz, this takes ~1-5μs. Seems tiny, but if your PID task runs at 10kHz (100μs period), a 5μs switch is 5% overhead PER switch.
Tick Interrupt Overhead: The system tick (typically 1kHz) fires every 1ms to check timeouts and do scheduling. Each tick ISR takes ~0.5-2μs. At 1kHz, that's 0.05-0.2% CPU — negligible for most systems, but it adds up with many software timers.
Stack Usage: Every task needs its own stack. Too small = stack overflow (silent memory corruption, the hardest bug to find). Too large = wasted RAM. A typical Cortex-M4 task needs 128-512 words (512-2048 bytes) depending on call depth and local variables.
c // Stack usage monitoring void vMonitorTask(void *p) { for(;;) { UBaseType_t hwm = uxTaskGetStackHighWaterMark(NULL); // hwm = minimum free stack words ever seen // If hwm < 20, you're dangerously close to overflow! printf("Stack free: %u words\n", hwm); vTaskDelay(pdMS_TO_TICKS(5000)); } } // CPU utilization measurement (FreeRTOS runtime stats) void vStatsTask(void *p) { char buf[512]; for(;;) { vTaskGetRunTimeStats(buf); // fills buf with per-task CPU% printf("%s\n", buf); vTaskDelay(pdMS_TO_TICKS(10000)); } }
Common RTOS failure modes:
| Problem | Symptom | Cause | Fix |
|---|---|---|---|
| Stack overflow | Random crashes, corrupted data | Task stack too small for call depth | Increase stack, check watermark |
| Priority inversion | High-priority task unresponsive | Low task holds mutex, medium preempts | Use mutex (not semaphore) for PI |
| Deadlock | Two+ tasks frozen forever | Circular mutex dependency | Global lock ordering, timeouts |
| Starvation | Low-priority task never runs | Higher tasks never block | Ensure tasks yield/block regularly |
| Jitter | Inconsistent task periods | vTaskDelay vs vTaskDelayUntil | Use vTaskDelayUntil for periodic tasks |
vTaskDelay(100) sleeps 100ms FROM NOW — so your period is 100ms + execution time (drift accumulates). vTaskDelayUntil(&lastWake, 100) sleeps until an absolute time — your period is exactly 100ms regardless of execution time. Always use vTaskDelayUntil for periodic tasks.See how context switch time and tick overhead eat into available CPU. Adjust overhead to see when tasks start missing deadlines.
Trace tools like Segger SystemView and Percepio Tracealyzer capture every context switch, ISR, and API call with sub-microsecond timestamps. They render as interactive timelines — invaluable for finding timing bugs that printf debugging can't reveal.
You now have the complete toolkit for designing real-time embedded systems: task modeling, schedulability analysis, algorithm selection, implementation with FreeRTOS, synchronization, and performance validation. Let's consolidate.
RMA Bounds Reference:
| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Bound | 1.000 | 0.828 | 0.780 | 0.757 | 0.743 | 0.735 | 0.729 | 0.724 | 0.721 | 0.718 |
Scheduling Algorithm Decision Matrix:
| Need | Best Algorithm | Why |
|---|---|---|
| Safety-critical, certifiable | Table-driven | Fully deterministic, verifiable offline |
| General embedded, flexible | RM (preemptive) | Simple analysis, predictable failure |
| Maximum CPU utilization | EDF | Optimal, reaches 100% |
| Fair sharing, no hard deadlines | Round-robin | Simple, no starvation |
FreeRTOS API Quick Reference:
| Function | Purpose |
|---|---|
| xTaskCreate() | Create a new task |
| vTaskDelay() | Block for relative time |
| vTaskDelayUntil() | Block until absolute time |
| xSemaphoreCreateMutex() | Create mutex with priority inheritance |
| xSemaphoreTake() | Lock mutex / wait on semaphore |
| xSemaphoreGive() | Unlock mutex / signal semaphore |
| xQueueCreate() | Create inter-task message queue |
| xQueueSend() | Push to queue (blocks if full) |
| xQueueReceive() | Pop from queue (blocks if empty) |
| uxTaskGetStackHighWaterMark() | Check minimum free stack |
Design Challenge: A quadcopter flight controller needs these tasks:
| Task | Period | WCET |
|---|---|---|
| IMU read + filter | 1 ms | 0.4 ms |
| PID control loop | 2 ms | 0.5 ms |
| GPS processing | 100 ms | 8 ms |
| Telemetry TX | 1000 ms | 30 ms |
| Logging to SD | 10 ms | 1 ms |
For n=5, the bound is 0.743. U = 0.86 > 0.743. The simple test FAILS! But U < 1.0, so exact response-time analysis might still show schedulability (since the bound is only sufficient, not necessary). This is where you'd compute Ri for each task iteratively — or you'd optimize WCETs, increase CPU clock, or split the GPS task into shorter chunks.
Connections: