Embedded Systems

Embedded Real-Time Controllers

The programming discipline where correctness depends on WHEN your code runs, not just what it computes.

Prerequisites: Basic C programming + Binary/hex numbers. That's it.
10 Chapters · 10+ Simulations · 0 Assumed Knowledge

Chapter 0: What Is Real-Time?

An airbag must deploy in 10 milliseconds. Not "usually 10ms" — always 10ms. If it takes 11ms, someone dies. Your desktop computer doesn't care if a web page loads in 200ms or 250ms. But an airbag controller? A robotic arm? A pacemaker? For these systems, the time your code finishes is part of its correctness.

This is the fundamental difference. On your laptop, a program is "correct" if it computes the right answer. On an embedded real-time system, a program is correct only if it computes the right answer before its deadline. A perfect answer that arrives late is a wrong answer.

The definition: A real-time system is one where the correctness of a computation depends not only on the logical result, but also on the time at which the result is produced. Missing a deadline is a system failure.

There are three categories of real-time constraints:

| Type | Deadline Miss Consequence | Example |
|---|---|---|
| Hard real-time | Catastrophic failure (death, destruction) | Airbag, pacemaker, fly-by-wire |
| Firm real-time | Result is worthless but no catastrophe | Video frame decode (dropped frame) |
| Soft real-time | Degraded quality but still usable | Audio streaming, UI responsiveness |

The key metric for hard real-time is WCET — Worst-Case Execution Time. Not the average. Not the typical case. The absolute worst case, considering every possible branch, every cache miss, every interrupt. If your WCET exceeds your deadline, your system is broken by design, even if it "usually" works.

Why "usually works" is terrifying: An airbag controller that meets its 10ms deadline 99.99% of the time will still fail once every 10,000 deployments. With millions of cars on the road, that's hundreds of deaths. Hard real-time demands 100.000% — not statistical guarantees, but mathematical proofs.

Consider a simple control loop: read sensor, compute output, write actuator. If this loop must run at 1kHz (every 1ms), then ALL processing — sensor read, computation, actuator write — must complete within 1ms. Every. Single. Time. No garbage collection pauses. No page faults. No "just wait a moment while I resize this hash table."
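
As a rough illustration of that budget check, the sketch below sums the worst-case time of each stage and compares the total with the loop period. The function name and the stage timings in the usage note are hypothetical, not measurements from a real controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the summed worst-case stage times fit inside the period.
 * All times are in microseconds. */
static inline bool loop_meets_deadline(const uint32_t *wcet_us, size_t n_stages,
                                       uint32_t period_us)
{
    assert(wcet_us != NULL);
    uint32_t total_us = 0;
    for (size_t i = 0; i < n_stages; i++) {
        total_us += wcet_us[i];   /* worst case of each stage, not the average */
    }
    return total_us <= period_us;
}
```

With illustrative stage times of 200, 550, and 150 µs, a 1000 µs period passes the check. Raise the compute stage to 700 µs and the same loop fails it, even if the typical run is much faster.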

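This is why hard real-time systems are typically built on bare-metal microcontrollers rather than on processors running a general-purpose operating system like Linux. A microcontroller gives you direct control over exactly when each piece of code runs, with none of the hidden pauses listed above.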

Real-Time Deadline Visualization

Three tasks with deadlines. Watch what happens when Task B takes too long. Green = met deadline. Red = missed deadline.

A motor controller must update PWM output every 100µs. The computation occasionally takes 120µs due to a floating-point edge case. Is this system correct?

Chapter 1: STM32L475 Architecture

Now that you understand WHY real-time matters, let's meet the hardware that makes it possible. The STM32L475 is an ARM Cortex-M4 microcontroller made by STMicroelectronics. It's the heart of the B-L475E-IOT01A discovery board — a popular development platform for IoT and embedded applications.

Why this chip specifically? Because it sits at the sweet spot: powerful enough for real signal processing (floating-point unit, DSP instructions, 80MHz clock), yet efficient enough to run on a coin cell battery (1.1µA in STOP2 mode). It's what you'd choose for a battery-powered sensor node that occasionally needs to crunch numbers fast.

Think of it this way: Your laptop CPU has billions of transistors, gigabytes of RAM, and consumes 30+ watts. The STM32L475 has ~2 million gates, 128KB of RAM, and draws roughly 100 microamps per MHz while running (about 8mA at 80MHz). Even where the laptop is also ARM-based, the design goals are vastly different.

Core Specifications

| Feature | STM32L475 | Why It Matters |
|---|---|---|
| CPU | ARM Cortex-M4F @ 80MHz | Single-cycle multiply, hardware FPU, DSP extensions |
| Flash | 1 MB | Your program lives here (non-volatile) |
| SRAM | 128 KB | Variables, stack, heap (volatile) |
| FPU | Single-precision IEEE 754 | Hardware float in 1 cycle vs 20+ in software |
| Low-power | STOP2: 1.1µA | Years on a coin cell with periodic wake-up |
| Timers | 16 timers (2×32-bit, 14×16-bit) | PWM, input capture, periodic interrupts |
| ADC | 3×12-bit, 5 Msps | Read analog sensors (temperature, voltage, current) |
| Comms | 3×SPI, 3×I2C, 6×USART, USB | Talk to sensors, displays, radios, PCs |

Memory Map

The ARM Cortex-M4 uses a memory-mapped I/O architecture. This means peripherals (timers, GPIO, UART) appear at specific addresses in the same address space as RAM and Flash. Writing to address 0x48000014 doesn't write to RAM — it sets the output pins on GPIO port A. This is how you control hardware: by writing to magic addresses.

| Base Address | What Lives Here | Size |
|---|---|---|
| 0x0800_0000 | Flash (your program) | 1 MB |
| 0x2000_0000 | SRAM (your variables) | 128 KB |
| 0x4000_0000 | APB1 peripherals (TIM2-7, USART2-5, SPI2-3, I2C1-3) | - |
| 0x4001_0000 | APB2 peripherals (SYSCFG, EXTI, TIM1/8/15-17, USART1, SPI1) | - |
| 0x4002_0000 | AHB1 peripherals (DMA, RCC, Flash control) | - |
| 0x4800_0000 | AHB2 peripherals (GPIO A-H, ADC, RNG) | - |
| 0xE000_0000 | Cortex-M4 internals (NVIC, SysTick, debug) | - |
Key insight: Everything is an address. There is no "open file," no "call driver." You enable a peripheral by writing a '1' to a specific bit at a specific address. You read a sensor by reading from a specific address. The entire system is just load/store operations to memory-mapped registers.
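
To make "everything is an address" concrete, here is a small host-side sketch that computes a register address from a base plus an offset. It only does arithmetic, so it runs on any machine. The 0x400 spacing between GPIO ports is an assumption inferred from the GPIOA and GPIOB addresses used in this tutorial, and the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define GPIOA_BASE       0x48000000u  /* start of the GPIO block on AHB2 */
#define GPIO_PORT_STRIDE 0x400u       /* assumed spacing between ports */
#define GPIO_ODR_OFFSET  0x14u        /* Output Data Register offset */

/* Address of the ODR register for port index 0 (A) through 7 (H). */
static inline uint32_t gpio_odr_addr(uint32_t port_index)
{
    assert(port_index < 8u);
    return GPIOA_BASE + port_index * GPIO_PORT_STRIDE + GPIO_ODR_OFFSET;
}
```

Port index 0 gives 0x48000014, the GPIOA output register mentioned earlier. Port index 1 gives 0x48000414 for GPIOB.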

The Clock Tree

The STM32L475 has a complex clock system. The main system clock (SYSCLK) can come from multiple sources: an internal 4MHz MSI oscillator, an internal 16MHz HSI, or an external crystal (HSE). A PLL (Phase-Locked Loop) multiplies these up. For maximum performance: HSI16 → PLL → 80MHz SYSCLK.
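
A quick way to sanity-check a PLL setting is to do the arithmetic on the host. The sketch below assumes the common divide, multiply, divide structure (input / M × N / R). The divider values in the usage note are one illustrative combination, and the allowed ranges for M, N, R and the intermediate frequency should be taken from the reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Output frequency of a PLL modelled as f_in / M * N / R.
 * The M/N/R names follow the usual PLL divider naming; treat them as assumptions. */
static inline uint32_t pll_sysclk_hz(uint32_t f_in_hz, uint32_t m, uint32_t n,
                                     uint32_t r)
{
    assert(m != 0u && r != 0u);
    uint64_t vco_hz = (uint64_t)f_in_hz * n / m;  /* intermediate VCO frequency */
    return (uint32_t)(vco_hz / r);
}
```

For example, a 16MHz input with M=1, N=10, R=2 works out to 80MHz.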

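Each peripheral bus has its own clock divider: the AHB prescaler derives HCLK from SYSCLK, and the APB1 and APB2 prescalers derive PCLK1 and PCLK2 from HCLK.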

Before you can USE any peripheral, you must ENABLE its clock. The RCC (Reset and Clock Control) block controls which peripherals get a clock signal. A peripheral with its clock disabled is effectively dead: register accesses do not work, and it draws essentially no dynamic power.

STM32L475 Block Diagram

Click on any peripheral block to see its base address and key features. The orange paths show the clock distribution.

Before you can toggle a GPIO pin on the STM32L475, what must you do first?

Chapter 2: Register-Level Programming

Forget HAL. Forget Arduino. Forget every abstraction layer you've ever used. We're going bare metal. On a microcontroller, controlling hardware means writing specific values to specific memory addresses. These addresses are called registers — 32-bit locations that directly control hardware behavior.

Why bare metal? Because in real-time systems, you need to know exactly what your code does and exactly how long it takes. HAL functions hide complexity, add overhead, and make timing harder to predict. A single HAL_GPIO_WritePin() call might take 8-15 cycles depending on debug checks. A direct register write is a single store instruction with a known, fixed cost.

The philosophy: Every peripheral is controlled by a small set of 32-bit registers at known addresses. Each bit in each register has a specific meaning defined in the reference manual. You read the manual, you write the bits, hardware responds. No magic.

Worked Example: Blink LED on PB14

The B-L475E-IOT01A board has an LED connected to pin PB14 (Port B, pin 14). To blink it, we need three steps: (1) enable GPIOB's clock, (2) configure pin 14 as output, (3) toggle the pin.

Step 1: Enable GPIOB clock (RCC_AHB2ENR)

The RCC AHB2 peripheral clock enable register lives at address 0x4002_104C. Bit 1 controls GPIOB's clock.

c
// RCC base: 0x40021000
// AHB2ENR offset: 0x4C
// Bit 1: GPIOBEN
*(volatile uint32_t*)0x4002104C |= (1 << 1);
// Equivalent: RCC->AHB2ENR |= RCC_AHB2ENR_GPIOBEN;

Step 2: Configure PB14 as general-purpose output (GPIOB_MODER)

The MODER register controls pin mode. Each pin uses 2 bits: 00=input, 01=output, 10=alternate function, 11=analog. Pin 14 occupies bits [29:28].

c
// GPIOB base: 0x48000400
// MODER offset: 0x00
// Bits [29:28] for pin 14: set to 01 (output)
volatile uint32_t* GPIOB_MODER = (volatile uint32_t*)0x48000400;
*GPIOB_MODER &= ~(3 << 28);  // Clear bits 29:28
*GPIOB_MODER |=  (1 << 28);  // Set to 01 (output)
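
The same clear-then-set pattern works for any pin. As a sketch, the helper below applies it to a plain 32-bit value so it can be checked on a host machine. The function and enum names are illustrative and do not come from any vendor header.

```c
#include <assert.h>
#include <stdint.h>

enum { MODE_INPUT = 0, MODE_OUTPUT = 1, MODE_ALTERNATE = 2, MODE_ANALOG = 3 };

/* Returns a copy of `moder` with the 2-bit mode field of `pin` replaced. */
static inline uint32_t moder_with_pin_mode(uint32_t moder, unsigned pin,
                                           uint32_t mode)
{
    assert(pin < 16u && mode <= 3u);
    unsigned shift = pin * 2u;   /* each pin owns 2 bits */
    moder &= ~(3u << shift);     /* clear the field first */
    moder |= mode << shift;      /* then write the new mode */
    return moder;
}
```

Starting from 0xFFFFFFFF (every pin analog), setting pin 14 to output gives 0xDFFFFFFF: bits [29:28] become 01 and nothing else changes.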

Step 3: Toggle PB14 (GPIOB_ODR)

The Output Data Register (ODR) at offset 0x14 directly controls pin state. Bit 14 = pin 14.

c
// GPIOB_ODR at 0x48000414
volatile uint32_t* GPIOB_ODR = (volatile uint32_t*)0x48000414;
*GPIOB_ODR ^= (1 << 14);  // XOR toggles the bit
Timing: That toggle is a read-modify-write, so the compiler emits a load (LDR), an XOR (EOR), and a store (STR). Each instruction has a fixed, documented cycle cost, so at 80MHz (12.5 nanoseconds per cycle) you can bound when the pin changes state to within a few tens of nanoseconds.

The Complete Blink Program

c
// Bare-metal LED blink — STM32L475, PB14
// No HAL, no libraries, no RTOS

#include <stdint.h>

#define RCC_AHB2ENR   (*(volatile uint32_t*)0x4002104C)
#define GPIOB_MODER   (*(volatile uint32_t*)0x48000400)
#define GPIOB_ODR     (*(volatile uint32_t*)0x48000414)

void delay(volatile uint32_t count) {
    while(count--);  // Busy-wait; cycles per iteration depend on compiler and flags
}

int main(void) {
    // 1. Enable GPIOB clock
    RCC_AHB2ENR |= (1 << 1);

    // 2. Configure PB14 as output
    GPIOB_MODER &= ~(3 << 28);  // Clear
    GPIOB_MODER |=  (1 << 28);  // Output mode

    // 3. Blink forever
    while(1) {
        GPIOB_ODR ^= (1 << 14);  // Toggle LED
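        // Crude delay: with no clock setup the core runs from its
        // reset-default clock (4MHz MSI), not 80MHz, so tune by eye
        delay(800000);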
    }
}

That's 10 lines of actual logic. No initialization framework, no HAL_Init(), no SystemClock_Config() abstraction. You understand every single byte that flows to the hardware.

The BSRR Register: Atomic Set/Reset

There's a subtle problem with ODR ^= (1 << 14). It's a read-modify-write operation: read ODR, XOR with mask, write back. If an interrupt fires between the read and write, and that ISR also modifies ODR, you get a race condition. The solution is the BSRR (Bit Set/Reset Register):

c
// GPIOB_BSRR at 0x48000418
// Bits [15:0]  — write 1 to SET corresponding pin
// Bits [31:16] — write 1 to RESET corresponding pin
#define GPIOB_BSRR  (*(volatile uint32_t*)0x48000418)

GPIOB_BSRR = (1 << 14);       // SET pin 14 (atomic, single write)
GPIOB_BSRR = (1 << (14+16)); // RESET pin 14 (atomic, single write)
Why BSRR exists: A write to BSRR is a single STR instruction — no read-modify-write. It's inherently atomic. No interrupt can corrupt it. This is a hardware-level solution to a concurrency problem. Real-time engineers think about this constantly.
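
To see why a BSRR write needs no read, it helps to model what the port hardware does with the written word. The following is a host-side sketch with illustrative function names. It assumes that when both the set bit and the reset bit for a pin are written as 1, the set bit wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Word to write to BSRR to drive `pin` high or low. */
static inline uint32_t bsrr_word(unsigned pin, bool high)
{
    assert(pin < 16u);
    return high ? (1u << pin) : (1u << (pin + 16u));
}

/* Model of the effect of a BSRR write on the 16-bit ODR value:
 * reset bits are cleared, then set bits are set. */
static inline uint32_t odr_after_bsrr_write(uint32_t odr, uint32_t bsrr)
{
    uint32_t set_mask = bsrr & 0xFFFFu;
    uint32_t reset_mask = bsrr >> 16;
    return ((odr & ~reset_mask) | set_mask) & 0xFFFFu;
}
```

The caller never reads ODR. It only states which pins to set or clear, so an interrupt that touches other pins of the same port cannot be overwritten by a stale copy.
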
32-Bit Register Viewer

Click individual bits to set/clear them. Watch the hex value update. This is GPIOB_MODER — each pair of bits configures one pin's mode.

Why do we use BSRR instead of ODR for setting GPIO pins in interrupt-heavy code?

Chapter 3: ARM Cortex-M4 Assembly

Sometimes C isn't enough. When you need cycle-precise timing, when you're writing the first instructions that run at boot (the reset handler), or when you need to understand exactly what the compiler generated — you need assembly. The Cortex-M4 uses the Thumb-2 instruction set: a mix of 16-bit and 32-bit instructions that balances code density with performance.

Don't panic. ARM assembly is remarkably readable compared to x86. Most instructions do exactly one thing: load, store, add, compare, branch. No cryptic prefixes, no segment registers, no stack machine weirdness.

When you'll actually use assembly: (1) Startup code — the very first instructions after reset. (2) Critical ISRs where you need exact cycle counts. (3) DSP inner loops (MAC operations). (4) Context switching in an RTOS. (5) Reading compiler output to verify optimization.

The Register File

The Cortex-M4 has 16 core 32-bit registers: 13 general-purpose registers (R0–R12) plus three with dedicated roles (SP, LR, PC):

| Register | Name | Purpose |
|---|---|---|
| R0–R3 | Arguments / scratch | Function arguments, return value (R0), caller-saved |
| R4–R11 | Callee-saved | Preserved across function calls, must be saved/restored |
| R12 | IP (Intra-Procedure) | Scratch register, used by linker veneers |
| R13 | SP (Stack Pointer) | Points to top of stack (two banks: MSP and PSP) |
| R14 | LR (Link Register) | Return address for function calls (BL stores the address of the following instruction here) |
| R15 | PC (Program Counter) | Address of next instruction to execute |

Essential Instructions

arm
@ Data movement
MOV  R0, #42          @ R0 = 42 (immediate value)
MOV  R1, R0           @ R1 = R0 (register to register)
LDR  R0, [R1]         @ R0 = memory[R1] (load from address)
STR  R0, [R1]         @ memory[R1] = R0 (store to address)
LDR  R0, =0x48000418  @ R0 = 0x48000418 (load constant)

@ Arithmetic
ADD  R0, R1, R2       @ R0 = R1 + R2
SUB  R0, R1, #1      @ R0 = R1 - 1
MUL  R0, R1, R2       @ R0 = R1 * R2 (single cycle on M4!)

@ Bitwise
ORR  R0, R0, #(1<<14) @ Set bit 14
BIC  R0, R0, #(1<<14) @ Clear bit 14 (Bit Clear)
EOR  R0, R0, #(1<<14) @ Toggle bit 14 (XOR)

@ Compare and branch
CMP  R0, #0          @ Compare R0 with 0 (sets flags)
BEQ  label            @ Branch if equal (Z flag set)
BNE  label            @ Branch if not equal
BL   function         @ Branch with Link (call: saves return address to LR)
BX   LR               @ Branch to LR (return from function)

Worked Example: GPIO Toggle in Assembly

Let's write the PB14 toggle entirely in assembly. This version toggles pin 14 with a read-modify-write on ODR:

arm
@ Toggle PB14 via ODR: 4 instructions plus the return
toggle_led:
    LDR  R0, =0x48000414  @ R0 = address of GPIOB_ODR
    LDR  R1, [R0]         @ R1 = current ODR value
    EOR  R1, R1, #(1<<14) @ Toggle bit 14 in our copy
    STR  R1, [R0]         @ Write back to ODR
    BX   LR               @ Return

Now compare to what the C compiler generates from GPIOB_ODR ^= (1 << 14); at -O2 optimization:

arm
@ GCC -O2 output for GPIOB_ODR ^= (1 << 14)
    LDR  R3, =0x48000414  @ Load ODR address
    LDR  R2, [R3]          @ Read current ODR value
    EOR  R2, R2, #16384   @ XOR with (1<<14) = 16384
    STR  R2, [R3]          @ Write back
Good news: At -O2, GCC generates essentially optimal code — 4 instructions, same as our hand-written version. The compiler is your friend when you give it optimization flags. Assembly is for the rare cases where you need guarantees the compiler can't provide (interrupt-disabled sections, exact cycle timing, specific instruction ordering).

The Calling Convention (AAPCS)

When calling a function from C or from another assembly routine, ARM follows strict rules:

Arguments: R0, R1, R2, R3 (first 4 args). Additional args go on the stack.
Return value: R0 (32-bit) or R0+R1 (64-bit)
Callee must save: R4–R11, LR (if calling another function)
Caller must save: R0–R3, R12 (if you need them after the call)

ARM Register File — Step Through Assembly

Watch registers change as each instruction executes. Orange = just modified. Click Step to advance.

After executing BL myFunction, what does the LR (R14) register contain?

Chapter 4: Timer Configuration (Deep Dive)

Timers are the heartbeat of real-time systems. They generate periodic interrupts ("wake me up every 1ms"), measure external signal timing (input capture), and produce precise output waveforms (PWM). The STM32L475 has 16 timers. We'll focus on TIM2 — a 32-bit general-purpose timer clocked at up to 80MHz.

A timer is surprisingly simple at its core: it's just a counter that increments every clock tick. When the counter reaches a programmed value, it resets to zero and optionally fires an interrupt. That's it. The complexity comes from the many ways you can configure the clock source, counting direction, and output behavior.

The mental model: Think of a timer as a metronome. You set how fast it ticks (prescaler) and how many ticks until it "dings" (auto-reload value). When it dings, it can wake up your code, toggle a pin, or trigger another peripheral (like an ADC).

TIM2 Registers

TIM2 base address: 0x4000_0000. The key registers:

| Register | Offset | Purpose |
|---|---|---|
| CR1 | 0x00 | Control register 1: enable timer, set counting mode |
| DIER | 0x0C | DMA/Interrupt enable: which events generate interrupts |
| SR | 0x10 | Status register: which events have occurred (clear by writing 0) |
| CNT | 0x24 | Counter value: the actual 32-bit count |
| PSC | 0x28 | Prescaler: divides input clock by (PSC+1) |
| ARR | 0x2C | Auto-reload: counter resets when it reaches this value |

Worked Example: 1ms Periodic Interrupt

Goal: TIM2 fires an interrupt every 1ms (1kHz). The timer clock is 80MHz.

f_interrupt = f_clock / ((PSC + 1) × (ARR + 1))

We want f_interrupt = 1000 Hz. So:

1000 = 80,000,000 / ((PSC + 1) × (ARR + 1))
(PSC + 1) × (ARR + 1) = 80,000

Choose PSC = 79, ARR = 999:

(79 + 1) × (999 + 1) = 80 × 1000 = 80,000 ✔

Verify: 80MHz / 80,000 = 1000 Hz = 1ms period. Perfect.
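
The same arithmetic is easy to check on a host machine before touching the hardware. The helpers below are a sketch with illustrative names. The second one assumes the target rate divides the prescaled clock exactly and asserts if it does not.

```c
#include <assert.h>
#include <stdint.h>

/* Update (interrupt) rate produced by a given PSC and ARR. */
static inline uint32_t timer_update_hz(uint32_t clk_hz, uint32_t psc, uint32_t arr)
{
    uint64_t divider = ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
    return (uint32_t)(clk_hz / divider);
}

/* ARR needed for `target_hz` once PSC has been chosen. */
static inline uint32_t timer_arr_for(uint32_t clk_hz, uint32_t psc, uint32_t target_hz)
{
    assert(target_hz != 0u);
    uint64_t ticks_per_sec = clk_hz / ((uint64_t)psc + 1u);
    assert(ticks_per_sec % target_hz == 0u);  /* exact division only */
    assert(ticks_per_sec / target_hz >= 1u);
    return (uint32_t)(ticks_per_sec / target_hz - 1u);
}
```

With an 80MHz clock, PSC=79 and ARR=999 give 1000 Hz. Asking for 1000 Hz with PSC=79 returns ARR=999.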

c
// Configure TIM2 for 1ms interrupt at 80MHz

// 1. Enable TIM2 clock (RCC APB1ENR1, bit 0)
*(volatile uint32_t*)0x40021058 |= (1 << 0);  // RCC_APB1ENR1 |= TIM2EN

// 2. Set prescaler: divide 80MHz by 80 → 1MHz tick
*(volatile uint32_t*)0x40000028 = 79;  // TIM2_PSC = 79

// 3. Set auto-reload: count 1000 ticks → 1ms
*(volatile uint32_t*)0x4000002C = 999;  // TIM2_ARR = 999

// 4. Force an update event so the buffered PSC value takes effect now,
//    then clear the update flag that this sets
*(volatile uint32_t*)0x40000014 = (1 << 0);  // TIM2_EGR = UG
*(volatile uint32_t*)0x40000010 = 0;         // TIM2_SR = 0

// 5. Enable update interrupt (DIER bit 0 = UIE)
*(volatile uint32_t*)0x4000000C |= (1 << 0);  // TIM2_DIER |= UIE

// 6. Enable TIM2 interrupt in NVIC (IRQ #28); ISER is write-1-to-set
*(volatile uint32_t*)0xE000E100 = (1 << 28);  // NVIC_ISER0 bit 28

// 7. Enable timer (CR1 bit 0 = CEN)
*(volatile uint32_t*)0x40000000 |= (1 << 0);  // TIM2_CR1 |= CEN
Why PSC=79 and ARR=999? We could also choose PSC=7999, ARR=9. Or PSC=0, ARR=79999. The interrupt rate is identical. But PSC=79 gives a nice 1MHz internal tick rate — each CNT increment = 1 microsecond. This makes debugging easier: read CNT and you directly know elapsed microseconds.

How the Counter Works

After enabling (CEN=1), the hardware does this in an infinite loop:

1. Clock tick: the 80MHz input is divided by (PSC+1), giving 1MHz.
2. CNT++: the counter increments: 0, 1, 2, ... 999.
3. CNT == ARR? Has the counter reached 999? If not, go back to step 1.
4. Update event: set the UIF flag in SR, fire the interrupt if UIE=1, reset CNT to 0.
5. Repeat forever.

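
The loop above can be modelled in a few lines of C and run on a host. This is a simplified sketch of an up-counting timer with a hypothetical function name. It ignores features such as buffered register updates and other counting modes.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many update events occur during `input_clocks` timer input clocks. */
static inline uint32_t count_update_events(uint32_t psc, uint32_t arr,
                                           uint32_t input_clocks)
{
    assert(arr != 0u);  /* with ARR = 0 the counter does not run */
    uint32_t prescale_count = 0;
    uint32_t cnt = 0;
    uint32_t updates = 0;
    for (uint32_t i = 0; i < input_clocks; i++) {
        if (prescale_count < psc) {   /* still dividing the input clock */
            prescale_count++;
            continue;
        }
        prescale_count = 0;           /* one counter tick per PSC+1 input clocks */
        if (cnt == arr) {             /* reached ARR: wrap and raise an update */
            cnt = 0;
            updates++;
        } else {
            cnt++;
        }
    }
    return updates;
}
```

With PSC=79 and ARR=999, 80,000 input clocks produce exactly one update event and 160,000 produce two, matching the 1ms period derived above.
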
Timer Counter Animation

Watch the counter increment, hit ARR, reset, and fire an interrupt. Adjust PSC and ARR to change the timing.

You need a 50Hz interrupt (20ms period) from TIM2 at 80MHz. With PSC=7999, what should ARR be?

Chapter 5: Interrupts & the NVIC

Your timer is counting. When it hits ARR, it needs to tell the CPU "hey, time's up!" It can't just wait for the CPU to check — that's polling, and polling wastes cycles. Instead, the timer sends an interrupt: an asynchronous hardware signal that forces the CPU to immediately stop what it's doing and jump to a handler function.

The NVIC (Nested Vectored Interrupt Controller) is the traffic cop. It receives interrupt requests from all 82 possible sources on the STM32L475 (timers, GPIO, UART, DMA, ADC...) and decides which one the CPU handles first, based on priority.

The key word is "nested": A higher-priority interrupt can interrupt a lower-priority interrupt handler that's already running. This is called preemption. If your 1kHz motor control ISR is running and a fault interrupt fires (higher priority), the motor ISR is suspended, the fault handler runs to completion, then the motor ISR resumes where it left off. This is how you ensure critical interrupts are always served promptly.

How Interrupts Work (Step by Step)

1. Peripheral raises IRQ: Timer UIF flag set → interrupt request to NVIC
2. NVIC checks priority: Is this IRQ higher priority than current execution? If yes → preempt.
3. Hardware stacks context: CPU pushes R0-R3, R12, LR, PC, xPSR to stack (8 registers, 12 cycles)
4. PC loads handler address: From vector table at (IRQ# + 16) × 4 bytes from table base
5. ISR executes: Your handler code runs. MUST clear the interrupt flag!
6. Return from ISR: BX LR with special EXC_RETURN value → hardware unstacks context
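
Step 4 is plain arithmetic, so it can be checked on a host. A minimal sketch, with an illustrative function name:

```c
#include <assert.h>
#include <stdint.h>

#define SYSTEM_EXCEPTION_SLOTS 16u  /* entries before the first external IRQ */

/* Byte offset of an IRQ's handler address from the vector table base. */
static inline uint32_t vector_offset(uint32_t irq_number)
{
    assert(irq_number < 240u);  /* architectural limit on external interrupts */
    return (irq_number + SYSTEM_EXCEPTION_SLOTS) * 4u;  /* 4 bytes per entry */
}
```

TIM2 (IRQ #28) lands at offset 0xB0 and EXTI0 (IRQ #6) at offset 0x58.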

Interrupt Latency

On the Cortex-M4, the time from interrupt assertion to first ISR instruction is 12 cycles (150ns at 80MHz). This includes stacking 8 registers. The NVIC also supports tail-chaining: if another interrupt is pending when an ISR returns, the CPU skips the unstack/restack sequence and jumps directly to the next handler in just 6 cycles.

| Event | Cycles | Time @ 80MHz |
|---|---|---|
| Interrupt entry (stacking + fetch) | 12 | 150 ns |
| Interrupt return (unstacking) | 12 | 150 ns |
| Tail-chain (back-to-back ISRs) | 6 | 75 ns |
| Late-arriving (higher priority during stack) | 0 extra | Redirects immediately |
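
To see what tail-chaining saves, the sketch below adds up the cycles for two handlers serviced back to back, using the entry, exit, and tail-chain figures from the table. The handler body lengths in the usage note are made-up values, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_CYCLES      12u
#define EXIT_CYCLES       12u
#define TAIL_CHAIN_CYCLES 6u

/* Convert a cycle count to nanoseconds at a given core clock. */
static inline uint32_t cycles_to_ns(uint32_t cycles, uint32_t clk_hz)
{
    assert(clk_hz != 0u);
    return (uint32_t)((uint64_t)cycles * 1000000000u / clk_hz);
}

/* Total cycles for handler A followed immediately by handler B. */
static inline uint32_t back_to_back_cycles(uint32_t body_a, uint32_t body_b,
                                           int tail_chained)
{
    uint32_t between = tail_chained ? TAIL_CHAIN_CYCLES
                                    : (EXIT_CYCLES + ENTRY_CYCLES);
    return ENTRY_CYCLES + body_a + between + body_b + EXIT_CYCLES;
}
```

For bodies of 10 and 20 cycles, the tail-chained sequence costs 60 cycles against 78 without it, a saving of 18 cycles per pair.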

Priority Configuration

The STM32L475 uses 4 bits for priority (values 0–15, where 0 is highest priority). These 4 bits are split into preemption priority and sub-priority using a configurable group setting. With the default grouping (4 bits preempt, 0 sub):

c
// Set TIM2 interrupt (IRQ #28) to priority 2
// NVIC_IPR registers at 0xE000E400, one byte per IRQ
// Priority in top 4 bits of the byte
*(volatile uint8_t*)(0xE000E400 + 28) = (2 << 4);

// Set EXTI0 (IRQ #6) to priority 1 (higher than TIM2)
*(volatile uint8_t*)(0xE000E400 + 6) = (1 << 4);
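
The shift by 4 and the "lower number wins" rule can be captured in two small helpers. This is a host-side sketch with illustrative names. It assumes all 4 priority bits are used for preemption, as in the grouping described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIORITY_BITS 4u  /* implemented priority bits */

/* Value to store in an IRQ's priority byte: priority goes in the top bits. */
static inline uint8_t nvic_priority_byte(uint8_t priority)
{
    assert(priority < (1u << PRIORITY_BITS));
    return (uint8_t)(priority << (8u - PRIORITY_BITS));
}

/* True if a pending interrupt would preempt the one currently running. */
static inline bool preempts(uint8_t pending_priority, uint8_t running_priority)
{
    return pending_priority < running_priority;  /* strictly lower number wins */
}
```

Priority 2 becomes the byte 0x20. An interrupt at priority 1 preempts a handler running at priority 3, while an interrupt at the same priority waits.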

Worked Example: External Interrupt on PA0 (Falling Edge)

c
// Configure EXTI0 for falling edge on PA0 (e.g. an external button to ground)

// 1. Enable GPIOA clock and make PA0 an input with pull-up
*(volatile uint32_t*)0x4002104C |= (1 << 0);    // RCC_AHB2ENR bit 0 (GPIOAEN)
*(volatile uint32_t*)0x48000000 &= ~(3u << 0);  // GPIOA_MODER bits[1:0] = 00 (input)
*(volatile uint32_t*)0x4800000C |=  (1u << 0);  // GPIOA_PUPDR bits[1:0] = 01 (pull-up)

// 2. Enable SYSCFG clock (needed for EXTI mux)
*(volatile uint32_t*)0x40021060 |= (1 << 0);  // RCC_APB2ENR bit 0

// 3. Map EXTI0 to PA0 (SYSCFG_EXTICR1, bits [3:0] = 0000 = Port A)
*(volatile uint32_t*)0x40010008 &= ~0xFu;  // SYSCFG_EXTICR1 bits[3:0] = PA

// 4. Configure falling edge trigger (EXTI base 0x40010400, FTSR1 offset 0x0C)
*(volatile uint32_t*)0x4001040C |= (1 << 0);  // EXTI_FTSR1

// 5. Unmask EXTI0 (EXTI_IMR1 bit 0, offset 0x00)
*(volatile uint32_t*)0x40010400 |= (1 << 0);  // EXTI_IMR1

// 6. Enable EXTI0 in NVIC (IRQ #6); ISER is write-1-to-set
*(volatile uint32_t*)0xE000E100 = (1 << 6);  // NVIC_ISER0

// 7. ISR handler (name must match vector table)
void EXTI0_IRQHandler(void) {
    // Clear pending bit (write 1 to clear!); PR1 is at offset 0x14
    *(volatile uint32_t*)0x40010414 = (1 << 0);  // EXTI_PR1
    // Do something (toggle LED, set flag, etc.)
    GPIOB_ODR ^= (1 << 14);
}
Critical detail: You MUST clear the interrupt flag (EXTI_PR1 for external interrupts, TIM2_SR for timers) inside the ISR. If you forget, the interrupt fires again immediately after returning, creating an infinite loop that locks up your system.
NVIC Priority & Preemption

Multiple interrupts fire at different times. Watch how the NVIC handles preemption and tail-chaining. Lower number = higher priority.

TIM2 ISR (priority 3) is running. EXTI0 fires with priority 1. What happens?

Chapter 6: ISR Design Patterns

Here's the golden rule of interrupt service routines: get in, do the minimum, get out. Every cycle you spend inside an ISR is a cycle where lower-priority interrupts are blocked. A long ISR doesn't just slow down your system — it can cause other interrupts to miss their deadlines.

What's "the minimum"? Set a flag. Copy one byte to a buffer. Start a DMA transfer. That's it. Never: allocate memory, call printf, do floating-point math, or loop over arrays inside an ISR.

The rule of thumb: Your ISR should take less than 10% of the interrupt period. For a 1kHz interrupt (1ms period), the ISR should complete in under 100µs. For a 10kHz interrupt (100µs period), under 10µs. Violate this and you're headed for missed deadlines.
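
The 10% rule is simple enough to encode as a check that can run in a host-side test. This is only a sketch: the function names are illustrative, and the threshold is the rule of thumb above, not a hard limit from any standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Share of CPU time consumed by an ISR, in whole percent (rounded down). */
static inline uint32_t isr_load_percent(uint32_t isr_us, uint32_t period_us)
{
    assert(period_us != 0u);
    return (uint32_t)(((uint64_t)isr_us * 100u) / period_us);
}

/* True if the ISR takes strictly less than 10% of its period. */
static inline bool isr_within_budget(uint32_t isr_us, uint32_t period_us)
{
    return (uint64_t)isr_us * 10u < period_us;
}
```

A 99µs handler on a 1ms period passes the check, and a 100µs handler does not.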

Pattern 1: Flag-Based (Simplest)

The ISR sets a volatile flag. The main loop checks and clears it. The word volatile tells the compiler "this variable can change at any time outside normal program flow — never optimize away reads of it."

c
volatile uint8_t timer_flag = 0;

void TIM2_IRQHandler(void) {
    TIM2_SR = ~(1u << 0);   // Clear UIF only (writing 1 leaves other flags untouched)
    timer_flag = 1;         // Set flag
}  // Handler body is a handful of cycles, plus hardware entry/exit

int main(void) {
    // ... setup ...
    while(1) {
        if(timer_flag) {
            timer_flag = 0;
            // Do the heavy processing here (main context)
            process_sensor_data();
            update_display();
        }
    }
}

Pattern 2: Ring Buffer (For Streaming Data)

When data arrives byte-by-byte (UART, SPI), you need a buffer. A ring buffer (circular buffer) lets the ISR write and the main loop read without blocking each other, as long as the buffer doesn't overflow.

c
#define BUF_SIZE 64  // Must be power of 2 for fast modulo
volatile uint8_t buf[BUF_SIZE];
volatile uint8_t head = 0;  // ISR writes here
volatile uint8_t tail = 0;  // Main reads here

void USART1_IRQHandler(void) {
    uint8_t byte = USART1_RDR;           // Read received byte
    buf[head] = byte;                     // Store in buffer
    head = (head + 1) & (BUF_SIZE - 1);  // Advance head (wraps)
}

// Main loop reads when data available
while(tail != head) {
    uint8_t data = buf[tail];
    tail = (tail + 1) & (BUF_SIZE - 1);
    process(data);
}
Why power-of-2 buffer size? For an arbitrary size, head % BUF_SIZE needs a division, which is slow. When BUF_SIZE is a power of 2, head & (BUF_SIZE - 1) gives the same wrap-around with a single AND instruction. In an ISR, every cycle counts.
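
The fragment above does not say what happens when the buffer fills. One common choice, sketched here with hypothetical names, is to keep one slot empty so that "full" and "empty" can be told apart, and to drop the incoming byte on overflow. This version has no hardware dependencies, so it can be tested on a host. It assumes a single producer (the ISR) and a single consumer (the main loop).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 64u
_Static_assert((RB_SIZE & (RB_SIZE - 1u)) == 0u, "RB_SIZE must be a power of 2");

typedef struct {
    volatile uint8_t data[RB_SIZE];
    volatile uint8_t head;  /* written only by the producer (ISR) */
    volatile uint8_t tail;  /* written only by the consumer (main loop) */
} ring_buffer_t;

/* Producer side. Returns false and drops the byte if the buffer is full. */
static inline bool rb_put(ring_buffer_t *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail) {
        return false;          /* full: one slot always stays empty */
    }
    rb->data[rb->head] = byte;
    rb->head = next;           /* publish only after the data is stored */
    return true;
}

/* Consumer side. Returns false if there is nothing to read. */
static inline bool rb_get(ring_buffer_t *rb, uint8_t *out)
{
    assert(out != NULL);
    if (rb->tail == rb->head) {
        return false;          /* empty */
    }
    *out = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    return true;
}
```

The ISR would call rb_put() with the received byte, and the main loop would call rb_get() until it returns false. Counting the dropped bytes is a useful addition when sizing the buffer.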

Pattern 3: Double Buffer (For Block Processing)

When you're sampling at high rates (e.g., audio at 48kHz), you can't process each sample individually. Instead: fill one buffer while processing the other, then swap. The ISR fills Buffer A, triggers a "buffer full" flag, main processes A while ISR fills B, repeat.

c
#define BLOCK_SIZE 256
volatile int16_t bufA[BLOCK_SIZE];
volatile int16_t bufB[BLOCK_SIZE];
volatile int16_t* fill_buf = bufA;   // ISR writes here
volatile int16_t* proc_buf = bufB;   // Main reads here
volatile uint16_t fill_idx = 0;
volatile uint8_t buffer_ready = 0;

void ADC1_2_IRQHandler(void) {  // ADC1 and ADC2 share one vector on this part
    fill_buf[fill_idx++] = ADC1_DR;   // Read sample
    if(fill_idx >= BLOCK_SIZE) {
        fill_idx = 0;
        // Swap buffers
        volatile int16_t* tmp = fill_buf;
        fill_buf = proc_buf;
        proc_buf = tmp;
        buffer_ready = 1;
    }
}

What NEVER To Do in an ISR

| Anti-pattern | Why It's Fatal | Correct Alternative |
|---|---|---|
| printf() | Calls malloc, UART waits, 1000+ cycles | Set flag, print in main |
| malloc()/free() | Non-deterministic time, can fragment | Pre-allocate all buffers |
| for(i=0; i<1000;...) | Blocks all lower-priority interrupts | Process one element per ISR call |
| Float math | FPU context save adds 17 cycles entry | Use fixed-point in ISR, float in main |
| Forget to clear flag | ISR re-enters immediately = system hang | ALWAYS clear flag first thing |
ISR Timing: Short vs Long

Compare a well-designed ISR (flag + deferred processing) vs a bloated ISR (all processing inline). Watch how the long ISR blocks subsequent interrupts.

Your ADC ISR currently takes 80µs. It fires every 100µs (10kHz). What percentage of CPU time does it consume?

Chapter 7: Timer + Interrupt System (Showcase)

Time to put it all together. We'll build a complete real-time data acquisition system: three timers running at different rates, an ADC sampling sensor data, and a main loop that processes data when buffers are full. This is the basic structure of a real embedded sensor node.

The system: TIM2 @ 1kHz triggers ADC1 to sample a sensor. The ADC interrupt stores each result in a 256-sample double buffer (in a production design, DMA1 could do this copy without CPU involvement). When a buffer fills, the main loop processes it (FFT, averaging, whatever). Meanwhile, TIM3 @ 10Hz updates a display, and TIM7 @ 1Hz blinks a heartbeat LED. All running concurrently via interrupts.

System Architecture

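| Timer | Rate | PSC | ARR | Purpose | Priority |
|---|---|---|---|---|---|
| TIM2 | 1 kHz | 79 | 999 | ADC trigger (sampling) | 1 (highest) |
| TIM3 | 10 Hz | 7999 | 999 | Display update | 3 |
| TIM7 | 1 Hz | 7999 | 9999 | Heartbeat LED | 7 (lowest) |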

Verification: TIM3 = 80MHz / (8000 × 1000) = 10Hz. TIM7 = 80MHz / (8000 × 10000) = 1Hz. Correct.

Priority Assignment Rationale

The ADC sampling MUST happen exactly on time (hard real-time) — a jittered sample corrupts the frequency content of the signal. Display update can slip a few ms without anyone noticing (soft real-time). The heartbeat LED is purely cosmetic. Priority reflects criticality.

Complete System Code

c
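// Complete real-time data acquisition system
// STM32L475 @ 80MHz, bare metal
// Assumes the register macros (TIM2_SR, ADC1_CR, ADC1_DR, GPIOB_ODR, ...),
// the setup_*() helpers, and __WFI() are defined elsewhere

#include <stdint.h>

#define BLOCK_SIZE 256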
volatile uint16_t adc_bufA[BLOCK_SIZE];
volatile uint16_t adc_bufB[BLOCK_SIZE];
volatile uint16_t* adc_fill = adc_bufA;
volatile uint16_t* adc_proc = adc_bufB;
volatile uint16_t adc_idx = 0;
volatile uint8_t data_ready = 0;
volatile uint8_t display_flag = 0;
volatile uint32_t sample_count = 0;

// TIM2 ISR: 1kHz ADC sampling (priority 1)
void TIM2_IRQHandler(void) {
    TIM2_SR = 0;  // Clear ALL flags (fast: single write)
    ADC1_CR |= (1 << 2);  // Start ADC conversion (ADSTART)
}

// ADC ISR: conversion complete (ADC1 and ADC2 share one vector on this part)
void ADC1_2_IRQHandler(void) {
    adc_fill[adc_idx] = ADC1_DR;  // Read clears EOC flag
    adc_idx++;
    sample_count++;
    if(adc_idx >= BLOCK_SIZE) {
        adc_idx = 0;
        volatile uint16_t* tmp = adc_fill;
        adc_fill = adc_proc;
        adc_proc = tmp;
        data_ready = 1;
    }
}

// TIM3 ISR: 10Hz display update (priority 3)
void TIM3_IRQHandler(void) {
    TIM3_SR = 0;
    display_flag = 1;
}

// TIM7 ISR: 1Hz heartbeat LED (priority 7)
void TIM7_IRQHandler(void) {
    TIM7_SR = 0;
    GPIOB_ODR ^= (1 << 14);  // Toggle LED
}

int main(void) {
    setup_clocks();   // 80MHz from PLL
    setup_gpio();     // PB14 output
    setup_adc();      // ADC1 channel, 12-bit
    setup_tim2();     // 1kHz
    setup_tim3();     // 10Hz
    setup_tim7();     // 1Hz

    while(1) {
        if(data_ready) {
            data_ready = 0;
            process_block(adc_proc, BLOCK_SIZE);
        }
        if(display_flag) {
            display_flag = 0;
            update_display(sample_count);
        }
        __WFI();  // Sleep until next interrupt
    }
}

Timing Budget

| ISR | Cycles | Time | % of Period |
|---|---|---|---|
| TIM2 (1kHz) | ~10 | 125ns | 0.013% |
| ADC1 (1kHz) | ~20 | 250ns | 0.025% |
| TIM3 (10Hz) | ~8 | 100ns | <0.001% |
| TIM7 (1Hz) | ~10 | 125ns | <0.001% |
| Total overhead | | | <0.05% |

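The CPU spends more than 99.9% of its time either sleeping (WFI) or in the main loop processing data. The cycle counts above cover only the handler bodies; adding the 12-cycle entry and 12-cycle exit cost to each of the two 1kHz interrupts brings the total to roughly 0.1%, which is still negligible. This works because we followed the "short ISR" pattern.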
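
A budget like this is worth recomputing whenever a handler changes. The sketch below totals the interrupt load in parts per million. It assumes the cycle counts in the table are handler bodies only and adds the 12-cycle entry and 12-cycle exit costs from Chapter 5 to each invocation. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_EXIT_CYCLES 24u  /* 12 in + 12 out per invocation (assumed) */

typedef struct {
    uint32_t body_cycles;  /* cycles spent in the handler body */
    uint32_t rate_hz;      /* how often the handler runs */
} isr_cost_t;

/* Total CPU load of all ISRs, in parts per million of available cycles. */
static inline uint32_t total_isr_load_ppm(const isr_cost_t *isrs, size_t n,
                                          uint32_t clk_hz)
{
    assert(isrs != NULL && clk_hz != 0u);
    uint64_t cycles_per_sec = 0;
    for (size_t i = 0; i < n; i++) {
        cycles_per_sec += (uint64_t)(isrs[i].body_cycles + ENTRY_EXIT_CYCLES)
                          * isrs[i].rate_hz;
    }
    return (uint32_t)(cycles_per_sec * 1000000u / clk_hz);
}
```

Fed with the four rows of the table at 80MHz, this gives 979 ppm, or just under 0.1% of the CPU.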

Real-Time Data Acquisition System

Full system simulation. Three timers fire at different rates. ADC samples fill a buffer. Main loop processes when full. Adjust timer periods and watch the system respond. Red flash = deadline miss.


Chapter 8: Low-Power Real-Time

Here's a paradox: real-time systems must respond instantly, but many of them run on batteries. A sensor node that wakes up every second to read temperature, then sleeps for 999ms, can run for years on a coin cell. The trick is the WFI (Wait For Interrupt) instruction: it halts the CPU until the next interrupt fires. Far lower power consumption while waiting, near-instant wake-up when needed.

The STM32L475 is designed for exactly this use case. Its STOP2 mode draws only 1.1 microamps while keeping SRAM and registers alive. Wake-up time from STOP2 is about 3.3µs — fast enough for most applications.

The insight: Real-time doesn't mean "always running." It means "responding within the deadline WHEN something happens." Between events, the CPU should be asleep. A well-designed embedded system is asleep 99%+ of the time.

Power Modes on STM32L475

| Mode | Current | Wake-up Time | What's Preserved | Wake Sources |
|---|---|---|---|---|
| Run | 100 µA/MHz | - | Everything | - |
| Sleep | ~25 µA/MHz | ~1 µs | All, CPU halted | Any interrupt |
| STOP2 | 1.1 µA | 3.3 µs | SRAM, regs, RTC | EXTI, RTC, LPTIM |
| Standby | 0.3 µA | 50 µs | Backup regs only | WKUP pins, RTC |
| Shutdown | 0.03 µA | ~ms | Nothing | WKUP pins |

WFI: The Simplest Power Savings

c
// Simple sleep-between-interrupts pattern
// CPU runs at 80MHz only during ISR + main processing
// Sleeps at ~2mA the rest of the time

int main(void) {
    setup_all_peripherals();

    while(1) {
        if(data_ready) {
            data_ready = 0;
            process_data();      // Runs at 80MHz, takes ~5ms
        }
        __WFI();  // ARM instruction: Wait For Interrupt
        // CPU sleeps here until ANY enabled interrupt fires
        // Wake-up from Sleep mode takes about a microsecond
    }
}

Tickless Idle: Maximum Power Savings

The standard approach uses SysTick (a periodic 1ms interrupt) for timekeeping. But SysTick wakes the CPU 1000 times per second even when there's nothing to do. Tickless idle stops SysTick entirely and programs the RTC alarm for the next scheduled event:

c
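// Tickless idle: sleep until next event, not next tick
// Assumes RTC init is already done: LSE clock, wake-up clock = RTC/16 (2048Hz),
// write protection unlocked, wake-up interrupt enabled (WUTIE, EXTI line 20)

uint32_t next_event_ms = get_next_scheduled_time();
uint32_t sleep_duration = next_event_ms - current_time_ms;  // Valid range: 1..32000 ms

// Disable SysTick
SYSTICK_CTRL &= ~(1 << 0);

// Program RTC wake-up timer (WUTR is writable only while the timer is off)
RTC_CR &= ~(1 << 10);                 // WUTE = 0: stop wake-up timer
while(!(RTC_ISR & (1 << 2))) { }      // Wait for WUTWF (write allowed)
RTC_WUTR = (sleep_duration * 2048) / 1000 - 1;  // ms -> 2048Hz ticks (16-bit)
RTC_CR |= (1 << 10);                  // WUTE = 1: start wake-up timer

// Enter STOP2 mode
PWR_CR1 = (PWR_CR1 & ~0x7) | 0x2;  // LPMS = 010 (STOP2); 001 would be STOP1
SCB_SCR |= (1 << 2);               // SLEEPDEEP = 1
__WFI();

// --- wake up here ---
SCB_SCR &= ~(1 << 2);  // SLEEPDEEP = 0 so later WFIs enter plain Sleep
// Restore clocks (PLL is off after STOP2; the CPU wakes on MSI)
restore_80mhz_clock();
// Update time accounting
current_time_ms += actual_sleep_duration();
// Re-enable SysTick if needed
SYSTICK_CTRL |= (1 << 0);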

Power Budget Calculation

A sensor node that samples at 1Hz, processes for 5ms, then sleeps in STOP2:

I_avg = (I_active × t_active + I_sleep × t_sleep) / T_period
I_avg = (8mA × 5ms + 1.1µA × 995ms) / 1000ms
I_avg = (40µA·s + 1.095µA·s) / 1s = 41.1 µA

With a CR2032 coin cell (225 mAh):

Battery life = 225mAh / 0.0411mA = 5,474 hours = 228 days

Compare to always-on at 80MHz (8mA): 225mAh / 8mA = 28 hours. Sleeping in STOP2 between samples gives roughly a 195x improvement.
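
The budget above is easy to get wrong by a factor of 1000 when mixing mA and µA, so it can help to wrap it in a small host-side helper. The following is a minimal sketch; the function names are illustrative and not part of any vendor library, and the battery estimate ignores self-discharge and the capacity loss a coin cell shows under pulsed loads.

```c
#include <assert.h>

/* Average current in microamps for one wake/sleep cycle.
 * active_ma: current while processing (mA)
 * active_ms: processing time per period (ms)
 * sleep_ua:  current while asleep (uA)
 * period_ms: full wake-to-wake period (ms) */
double avg_current_ua(double active_ma, double active_ms,
                      double sleep_ua, double period_ms)
{
    assert(period_ms > 0.0 && active_ms >= 0.0 && active_ms <= period_ms);
    double sleep_ms = period_ms - active_ms;
    /* Convert mA to uA, then take the time-weighted mean */
    return (active_ma * 1000.0 * active_ms + sleep_ua * sleep_ms) / period_ms;
}

/* Idealized battery life in days for a given capacity and average draw */
double battery_life_days(double capacity_mah, double avg_ua)
{
    assert(avg_ua > 0.0);
    double hours = capacity_mah * 1000.0 / avg_ua;
    return hours / 24.0;
}
```

With the numbers from this section, `avg_current_ua(8.0, 5.0, 1.1, 1000.0)` comes to about 41.1 µA, and feeding that into `battery_life_days(225.0, ...)` gives about 228 days.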

Power Timeline Visualization

[Interactive simulation: active bursts (high current) separated by sleep periods, with adjustable wake rate and active time. At the default 1.0 Hz wake rate and 5 ms active time: average 41.1 µA, 228 days on a CR2032.]
Your sensor node wakes every 100ms (10Hz) for 2ms of processing. What's the approximate average current if active current is 8mA and STOP2 is 1.1µA?

Chapter 9: Mastery & Connections

You now understand the full stack: from real-time deadlines down to individual register bits. Let's consolidate with reference material, then look at where this knowledge leads.

Timer Configuration Cheat Sheet

Desired Rate | PSC (80MHz clock) | ARR | Timer Tick
1 Hz | 7999 | 9999 | 100µs
10 Hz | 7999 | 999 | 100µs
100 Hz | 799 | 999 | 10µs
1 kHz | 79 | 999 | 1µs
10 kHz | 7 | 999 | 100ns
100 kHz | 0 | 799 | 12.5ns
1 MHz (PWM) | 0 | 79 | 12.5ns
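
Every row of the cheat sheet follows from one relation: update rate = timer clock / (PSC + 1) / (ARR + 1), because both registers count from zero. A minimal sketch for checking a PSC/ARR pair on a host machine before committing it to hardware (the function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Timer update (overflow) rate in Hz for a prescaler/auto-reload pair.
 * Integer division truncates, so non-exact ratios round down. */
uint32_t timer_update_hz(uint32_t clk_hz, uint32_t psc, uint32_t arr)
{
    assert(psc <= 0xFFFFu);                 /* PSC is a 16-bit register */
    uint64_t divisor = ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
    return (uint32_t)(clk_hz / divisor);
}

/* Length of one timer tick in nanoseconds (may be fractional, e.g. 12.5) */
double timer_tick_ns(uint32_t clk_hz, uint32_t psc)
{
    assert(clk_hz > 0u && psc <= 0xFFFFu);
    return 1e9 * (double)(psc + 1u) / (double)clk_hz;
}
```

For example, `timer_update_hz(80000000u, 7u, 999u)` yields 10000, matching the 10 kHz row, and `timer_tick_ns(80000000u, 0u)` yields 12.5.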

Interrupt Latency Reference

Scenario | Cycles | Time @ 80MHz
Normal entry | 12 | 150 ns
Tail-chain | 6 | 75 ns
Late-arriving (during stacking) | 0 extra | Redirects
Return from ISR | 12 | 150 ns
Wake from Sleep mode | 12 + 1 | ~162 ns
Wake from STOP2 | ~264 cycles | 3.3 µs
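
The time column is just cycles divided by the clock frequency, so the same table can be regenerated for any other clock speed. A small sketch of that conversion, plus the entry and exit overhead expressed as a share of an interrupt period (the helper names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds at the given core clock (rounds down) */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t clk_hz)
{
    assert(clk_hz > 0u);
    return (uint64_t)cycles * 1000000000ull / clk_hz;
}

/* Fraction of each interrupt period spent on entry + return overhead,
 * assuming 12 cycles for each as listed in the table above */
double isr_overhead_fraction(uint32_t clk_hz, uint32_t irq_rate_hz)
{
    assert(clk_hz > 0u && irq_rate_hz > 0u);
    double cycles_per_period = (double)clk_hz / (double)irq_rate_hz;
    return (12.0 + 12.0) / cycles_per_period;
}
```

At 80MHz, `cycles_to_ns(12, 80000000u)` gives 150 and `cycles_to_ns(264, 80000000u)` gives 3300. For a 10 kHz interrupt there are 8000 cycles per period, so entry plus return costs 24 / 8000 = 0.3% of the budget before the handler body runs.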

NVIC Priority Assignment Strategy

Rule of thumb for priority assignment (a lower number means a higher priority):
Priority 0-1: Safety-critical (fault handlers, watchdog, emergency stop)
Priority 2-3: Hard real-time (motor control, ADC sampling, communication timeouts)
Priority 4-7: Firm real-time (display update, data logging, LED feedback)
Priority 8-15: Soft real-time (background housekeeping, statistics, debug output)
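
To make the numbering concrete, here is a hedged sketch of how a priority level maps to the byte stored in the NVIC priority register, assuming 4 implemented priority bits (levels 0-15, held in the upper nibble) and no sub-priority grouping. The tier constants and function names are illustrative and only mirror the rule of thumb above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u   /* assumption: 4 implemented priority bits */

/* First level of each tier from the rule of thumb (illustrative names) */
enum {
    PRIO_SAFETY  = 0,
    PRIO_HARD_RT = 2,
    PRIO_FIRM_RT = 4,
    PRIO_SOFT_RT = 8
};

/* Byte written to the 8-bit priority field: the level sits in the
 * top PRIO_BITS bits, and the unimplemented low bits read as zero */
uint8_t nvic_priority_byte(uint32_t level)
{
    assert(level < (1u << PRIO_BITS));
    return (uint8_t)(level << (8u - PRIO_BITS));
}

/* A pending interrupt preempts a running handler only if its level
 * is numerically lower; equal levels never preempt each other */
bool can_preempt(uint32_t pending_level, uint32_t running_level)
{
    return pending_level < running_level;
}
```

Here `nvic_priority_byte(PRIO_HARD_RT)` evaluates to 0x20, and `can_preempt(PRIO_SAFETY, PRIO_HARD_RT)` is true while `can_preempt(PRIO_FIRM_RT, PRIO_FIRM_RT)` is false.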

ARM Assembly Quick Reference

Instruction | Cycles | Effect
MOV Rd, #imm | 1 | Rd = immediate
LDR Rd, [Rn] | 2 | Rd = mem[Rn]
STR Rd, [Rn] | 2 | mem[Rn] = Rd
ADD/SUB | 1 | Arithmetic
MUL | 1 | 32-bit multiply (Cortex-M4)
B/BEQ/BNE | 1-3 | Branch (pipeline flush)
BL | 1+ | Call (saves return addr in LR)
PUSH/POP | 1+N | Stack N registers
WFI | 1+ | Sleep until interrupt

Design Challenge: Real-Time Motor Controller

Design a system with these requirements:

Key decisions: TIM4 @ 10kHz triggers PID ISR (priority 1). TIM1 counts encoder pulses continuously (no ISR needed — just read CNT). TIM2 generates PWM (no ISR needed — just update CCR1). A watchdog timer fires at 2kHz; if PID hasn't cleared its flag, kill the motor. UART telemetry uses DMA with ring buffer at priority 5.
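
The watchdog handshake described above can be sketched as two handler bodies sharing one flag. This is only an illustration under stated assumptions: all names are hypothetical, and the `motor_enabled` variable stands in for whatever hardware action actually stops the motor (for example, forcing the PWM outputs off).

```c
#include <assert.h>
#include <stdbool.h>

/* Set by the watchdog, cleared by the PID loop */
volatile bool pid_check_pending = false;
/* Stand-in for the real motor shutdown mechanism */
volatile bool motor_enabled = true;

/* Body of the 10 kHz PID interrupt handler */
void pid_tick(void)
{
    /* ... read encoder count, compute PID, update PWM compare value ... */
    pid_check_pending = false;   /* prove to the watchdog that the loop ran */
}

/* Body of the 2 kHz watchdog interrupt handler */
void watchdog_tick(void)
{
    if (pid_check_pending) {
        motor_enabled = false;   /* PID never ran during the last period */
    }
    pid_check_pending = true;    /* arm the check for the next period */
}
```

At 10 kHz the PID handler should run about five times per 2 kHz watchdog period, so a flag that is still set means the control loop has stalled. For the watchdog to catch a PID handler that is stuck rather than merely late, its interrupt would need a numerically lower priority value than the PID interrupt so that it can preempt it.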

PID update: PSC=7, ARR=999 → 80MHz / 8 / 1000 = 10kHz ✔
PWM frequency: PSC=0, ARR=3999 → 80MHz / 4000 = 20kHz ✔
PWM duty cycle: CCR1 = PID_output × 3999 / max_output
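
The duty-cycle formula needs one safeguard in practice: a PID output outside its expected range must not produce an out-of-range compare value. A minimal sketch with clamping, assuming a non-negative output range of 0 to max_output (the function name and the float interface are illustrative choices, not something the design above prescribes):

```c
#include <assert.h>
#include <stdint.h>

/* Map a PID output in [0, max_output] to a compare value in [0, arr].
 * Out-of-range inputs are clamped; the result is truncated toward zero. */
uint32_t pwm_ccr_from_pid(float pid_output, float max_output, uint32_t arr)
{
    assert(max_output > 0.0f);
    if (pid_output < 0.0f) {
        pid_output = 0.0f;
    }
    if (pid_output > max_output) {
        pid_output = max_output;
    }
    return (uint32_t)(pid_output * (float)arr / max_output);
}
```

With ARR = 3999, a half-scale output gives `pwm_ccr_from_pid(50.0f, 100.0f, 3999u)` = 1999, and any output at or above max_output saturates at 3999.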

Comparison: Bare Metal vs RTOS

Aspect | Bare Metal (this lesson) | RTOS (FreeRTOS, Zephyr)
Timing control | Exact cycle counts | Tick-based (typically 1ms granularity)
Code size | Minimal (your code only) | +10-50KB kernel
Complexity | Simple for <5 tasks | Better for 10+ tasks
Worst-case latency | Predictable (you control everything) | Kernel overhead adds ~5-20µs
Concurrency model | Interrupts + main loop | Threads + semaphores + queues
When to use | Simple systems, maximum performance | Complex systems, many tasks

Where to Go Next

As Donald Knuth wrote: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." In embedded real-time systems, that critical 3% is the ISRs. Get them right, and the whole system works. Get them wrong, and nothing else matters.
You're designing a system with a 10kHz PID loop, 1kHz data logging, and 10Hz display update. Which timer should have the highest NVIC priority?