The invisible computers running your car, your microwave, your pacemaker — and how to program them from scratch.
Your microwave has a computer inside it. So does your car — over 100 of them. Your thermostat, your washing machine, your electric toothbrush, your elevator. These aren't laptops. They're purpose-built computers with ONE job, running forever, with no operating system to save them if something crashes.
A laptop can run Photoshop, play games, browse the web. An embedded system does one thing: read a temperature sensor and control a heater. Or fire an airbag within 10 milliseconds of impact. Or keep a pacemaker's rhythm at exactly 72 BPM while consuming less power than a watch battery.
What makes embedded different from your PC?
| Constraint | Your Laptop | Embedded System |
|---|---|---|
| RAM | 16 GB | 2 KB – 512 KB |
| Clock | 3+ GHz | 8 – 168 MHz |
| Power | 65 W | 0.001 – 1 W |
| Cost target | $1000+ | $0.50 – $10 |
| Latency | "Fast enough" | Microsecond deadline |
| OS | Windows/macOS/Linux | None (bare metal) or RTOS |
These constraints aren't annoyances; they're design requirements. A pacemaker that needs a wall outlet kills the patient. A $50 microcontroller in every car door sensor makes the car unaffordable. An airbag that fires 200 ms late is useless.
You've heard "microcontroller" and "microprocessor" used interchangeably. They're not the same thing. The difference is like a Swiss Army knife vs a chef's knife — one is self-contained, the other needs a kitchen.
A microprocessor (MPU) is just a CPU. It's powerful, but it needs external RAM, external flash storage, external peripherals — all connected on a circuit board. Think: Intel Core i7, ARM Cortex-A72 (Raspberry Pi). These live in phones, PCs, and servers.
A microcontroller (MCU) puts EVERYTHING on one chip: the CPU, the RAM, the flash memory for your program, the ADC for reading sensors, the UART for serial communication, the timers, the GPIO pins. One chip, one package, done. Think: STM32, ESP32, ATmega328P (Arduino).
| Feature | MCU (e.g., STM32F103) | MPU (e.g., Cortex-A72) |
|---|---|---|
| CPU + Memory | All on-chip | CPU only, external RAM |
| RAM | 20 KB SRAM | 1–8 GB DDR4 |
| Clock | 72 MHz | 1.5 GHz |
| Flash | 64–512 KB on-chip | SD card / eMMC (external) |
| Peripherals | Built-in ADC, UART, SPI, I²C, timers | Needs external chips |
| Cost | $1–$10 | $10–$100+ |
| Power | 10–100 mW | 1–5 W |
| OS | Bare-metal or RTOS | Linux, Android |
| Boot time | <1 ms | 5–30 seconds |
| Use case | Motor control, sensors, IoT | GUI, networking, ML |
The classic comparison: Arduino Uno (ATmega328P: 2 KB RAM, 32 KB Flash, 16 MHz, $3) vs Raspberry Pi 4 (Cortex-A72: 4 GB RAM, 64 GB SD, 1.5 GHz, $55). The Arduino runs one C program forever. The Pi runs Linux with a full desktop.
Here's the single most important concept in embedded programming: everything is a memory address. That LED on pin 13? It's controlled by a single bit at address 0x4001_100C. That timer counting microseconds? It's a 16-bit number at address 0x4001_2C24 (the TIM1 counter). Want to send a byte over UART? Write it to address 0x4001_3804.
A register in embedded programming is not a CPU register (like R0-R15). It's a specific memory address that has hardware attached to it. Writing to that address doesn't just store a number — it makes physical things happen. This is called memory-mapped I/O.
Here's a simplified memory map for an STM32F103 (a popular ARM Cortex-M3 MCU):
| Base Address | What Lives There | Size |
|---|---|---|
| 0x0800_0000 | Flash: your compiled code | 64–512 KB |
| 0x2000_0000 | SRAM: variables, stack | 20 KB |
| 0x4001_0800 | GPIOA registers | 28 bytes |
| 0x4001_0C00 | GPIOB registers | 28 bytes |
| 0x4001_2400 | ADC1 registers | 80 bytes |
| 0x4001_2C00 | TIM1 (timer 1) registers | 84 bytes |
| 0x4001_3800 | USART1 registers | 28 bytes |
| 0xE000_E100 | NVIC (interrupt controller) | varies |
Each peripheral has a base address, and its individual control registers are at fixed offsets from that base. For example, GPIOA's base is 0x4001_0800:
| Offset | Register | Purpose |
|---|---|---|
| +0x00 | CRL | Configure pins 0–7 mode |
| +0x04 | CRH | Configure pins 8–15 mode |
| +0x08 | IDR | Read all 16 pins (input data) |
| +0x0C | ODR | Set all 16 pins (output data) |
| +0x10 | BSRR | Set/reset individual pins atomically |
So to read the state of GPIOA pin 5, you read address 0x4001_0808 (base + IDR offset) and check bit 5. In C:
```c
#include <stdint.h>

// Read pin PA5 (is the button pressed?)
volatile uint32_t *GPIOA_IDR = (volatile uint32_t *)0x40010808;
uint8_t pin5 = (*GPIOA_IDR >> 5) & 1;   // Extract bit 5

// Set pin PA5 HIGH (turn on LED)
volatile uint32_t *GPIOA_ODR = (volatile uint32_t *)0x4001080C;
*GPIOA_ODR |= (1 << 5);                 // Set bit 5 = 1
```
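The base-plus-offset rule can also be written as a struct whose field order mirrors the register offsets, which is how the `GPIOA->ODR` notation used later in this lesson works. The sketch below is a host-side illustration, not the vendor header: `gpio_regs_t` and `reg_address` are hypothetical names, and the two registers after BSRR (BRR and LCKR) are taken from the STM32F103 register map to make up the 28 bytes listed in the memory map.

```c
#include <stddef.h>
#include <stdint.h>

/* Register layout of one GPIO port; field order matches the offset table. */
typedef struct {
    volatile uint32_t CRL;   /* +0x00 */
    volatile uint32_t CRH;   /* +0x04 */
    volatile uint32_t IDR;   /* +0x08 */
    volatile uint32_t ODR;   /* +0x0C */
    volatile uint32_t BSRR;  /* +0x10 */
    volatile uint32_t BRR;   /* +0x14 */
    volatile uint32_t LCKR;  /* +0x18 */
} gpio_regs_t;               /* hypothetical name, for illustration only */

#define GPIOA_BASE 0x40010800u

/* Absolute address of a register = peripheral base + register offset. */
static uint32_t reg_address(uint32_t base, size_t offset)
{
    return base + (uint32_t)offset;
}
```

For example, `reg_address(GPIOA_BASE, offsetof(gpio_regs_t, IDR))` gives 0x40010808, the address read in the snippet above. On the target, casting the base address to `gpio_regs_t *` lets the compiler do this arithmetic for you.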
Why volatile? Without it, the compiler might "optimize away" your read, because it assumes the value can't change between reads. But hardware CAN change a register at any time (a button press, a timer tick). volatile forces the compiler to emit a real memory access every time the code touches that address.
If GPIOB's base is 0x4001_0C00 and ODR is at offset +0x0C, what address do you write to set GPIOB output pins?

General Purpose Input/Output (GPIO) is the simplest peripheral, and the one you'll use most. Each GPIO pin is a physical metal pad on the chip that connects to the outside world. You configure each pin to be one of several modes:
| Mode | What It Does | Example |
|---|---|---|
| Input | Read external voltage (HIGH/LOW) | Button, switch, digital sensor |
| Output | Drive voltage out (3.3V or 0V) | LED, relay, motor driver enable |
| Alternate Function | Pin is controlled by a peripheral | UART TX, SPI MOSI, PWM output |
| Analog | Read continuous voltage (0–3.3V) | Temperature sensor, potentiometer |
Let's walk through the exact register writes to blink an LED on PA5 (Port A, Pin 5) of an STM32F103. This is the "Hello World" of embedded.
```c
// Blink LED on PA5, bare-metal STM32F103
#include "stm32f1xx.h"

// Crude busy-wait; the real delay depends on clock speed and compiler settings
void delay(volatile uint32_t count) {
    while (count--);
}

int main(void) {
    // 1. Enable GPIOA clock (bit 2 of APB2ENR)
    RCC->APB2ENR |= (1 << 2);

    // 2. Configure PA5 as output, push-pull, 50MHz
    //    CRL controls pins 0-7, each pin uses 4 bits
    //    Pin 5: bits [23:20] = 0b0011 (output 50MHz push-pull)
    GPIOA->CRL &= ~(0xF << 20);   // Clear pin 5 config
    GPIOA->CRL |=  (0x3 << 20);   // Set output push-pull 50MHz

    while (1) {
        GPIOA->ODR |=  (1 << 5);  // LED ON
        delay(500000);
        GPIOA->ODR &= ~(1 << 5);  // LED OFF
        delay(500000);
    }
}
```
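The clear-then-set step on CRL generalizes to any pin. Here is a minimal host-side sketch of that bit manipulation; `gpio_set_pin_cfg` is a hypothetical helper, not part of any vendor library, and it works on a plain value rather than the real register so it can be checked on a PC.

```c
#include <stdint.h>

/* Return `cr` with the 4-bit configuration field of `pin` (0-7) replaced.
 * `cfg` holds the CNF bits in bits 3:2 and the MODE bits in bits 1:0. */
static uint32_t gpio_set_pin_cfg(uint32_t cr, unsigned pin, uint32_t cfg)
{
    unsigned shift = (pin & 7u) * 4u;   /* each pin owns 4 bits */
    cr &= ~(0xFu << shift);             /* clear the old field */
    cr |= (cfg & 0xFu) << shift;        /* write the new field */
    return cr;
}
```

Starting from the CRL reset value 0x44444444 (all pins floating inputs), `gpio_set_pin_cfg(0x44444444u, 5, 0x3u)` yields 0x44344444: only the nibble for pin 5 changes.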
For reading a button on PA0 (input with internal pull-up):
```c
// Read button on PA0 (active low: pressed = 0)
GPIOA->CRL &= ~(0xF << 0);   // Clear pin 0 config
GPIOA->CRL |=  (0x8 << 0);   // Input with pull-up/pull-down
GPIOA->ODR |=  (1 << 0);     // Select pull-UP (ODR bit activates pull-up)

// In main loop:
if (!(GPIOA->IDR & (1 << 0))) {
    // Button is pressed (pin reads LOW)
}
```
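The examples above change ODR with a read-modify-write (`|=`, `&=`), which an interrupt can split in the middle. The BSRR register from the offset table avoids this: one write sets or resets pins without reading first. The sketch below models what the hardware does with a BSRR write; `bsrr_apply` is a hypothetical name and the function runs on a PC, not on the chip.

```c
#include <stdint.h>

/* Model of the effect of writing `value` to BSRR on the 16 output bits:
 * bits 31:16 reset pins, bits 15:0 set pins, and set wins if both are given. */
static uint16_t bsrr_apply(uint16_t odr, uint32_t value)
{
    uint16_t set_mask   = (uint16_t)(value & 0xFFFFu);
    uint16_t reset_mask = (uint16_t)(value >> 16);
    odr = (uint16_t)(odr & ~reset_mask);   /* apply resets first */
    odr = (uint16_t)(odr | set_mask);      /* then sets, so set has priority */
    return odr;
}
```

On the target this would correspond to `GPIOA->BSRR = (1u << 5);` to turn PA5 on and `GPIOA->BSRR = (1u << (5 + 16));` to turn it off, leaving every other pin untouched.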
A timer is the most versatile peripheral on any MCU. At its core, it's stupidly simple: a counter register that increments every clock cycle. When it reaches a target value, it overflows (wraps to zero) and can trigger an event. That's it. But from this simple mechanism, you get:
| Application | How It Works |
|---|---|
| Periodic interrupts | Timer overflows every 1ms → ISR runs → sample sensor |
| PWM output | Timer sets pin HIGH at start, LOW at compare → variable duty cycle |
| Input capture | Timer records its count when an external edge arrives → measure pulse width |
| Delay generation | Start timer, wait for overflow flag, stop → precise blocking delay |
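For the PWM row, the duty cycle is set by where the compare value sits relative to the counter's range. A rough sketch, assuming the counter runs from 0 to ARR and the pin is HIGH while the count is below the compare value (`pwm_compare` is a hypothetical helper):

```c
#include <stdint.h>

/* Compare value for a duty cycle given in percent (clamped to 0-100). */
static uint32_t pwm_compare(uint32_t arr, uint32_t duty_percent)
{
    if (duty_percent > 100u) {
        duty_percent = 100u;
    }
    /* The counter period is ARR + 1 ticks long. */
    return (arr + 1u) * duty_percent / 100u;
}
```

With ARR = 999, a 25% duty cycle needs a compare value of 250: the pin is HIGH for counts 0 to 249 and LOW for 250 to 999.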
Let's work through the math for a 1 kHz periodic interrupt (fires every 1 ms). Our STM32F103 runs at 72 MHz. The timer has two controls, PSC and ARR, which together set the overflow frequency:

$$f_{\text{overflow}} = \frac{f_{\text{clock}}}{(\text{PSC} + 1)(\text{ARR} + 1)}$$

Here PSC (prescaler) divides the clock before it reaches the counter, and ARR (auto-reload register) is the value the counter counts up to before overflowing.

Worked example: we want $f_{\text{overflow}} = 1000$ Hz from $f_{\text{clock}} = 72{,}000{,}000$ Hz, so

$$(\text{PSC} + 1)(\text{ARR} + 1) = \frac{72{,}000{,}000}{1000} = 72{,}000$$

Many combinations satisfy this. A common choice: PSC = 71 (divide clock by 72 → 1 MHz tick), ARR = 999 (count 0 to 999 = 1000 ticks). Verify: 72 MHz / 72 / 1000 = 1000 Hz. Done.
```c
// Configure TIM2 for 1kHz overflow interrupt
RCC->APB1ENR |= (1 << 0);      // Enable TIM2 clock
TIM2->PSC = 71;                // Prescaler: 72MHz / (71+1) = 1MHz
TIM2->ARR = 999;               // Auto-reload: count to 999 = 1000 ticks
TIM2->DIER |= (1 << 0);        // Enable update interrupt (UIE bit)
TIM2->CR1 |= (1 << 0);         // Start counter (CEN bit)

// In NVIC: enable TIM2 interrupt (IRQ #28)
NVIC->ISER[0] |= (1 << 28);
```
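The prescaler arithmetic is easy to check on a PC before flashing anything. The helpers below are a sketch with hypothetical names (`timer_overflow_hz`, `timer_arr_for`); they only do the math and touch no hardware. The 16-bit limit on ARR is an assumption that holds for the STM32F103 timers.

```c
#include <stdint.h>

/* Overflow frequency for a given timer clock, prescaler and auto-reload. */
static uint32_t timer_overflow_hz(uint32_t f_clock, uint32_t psc, uint32_t arr)
{
    uint64_t ticks = (uint64_t)(psc + 1u) * (arr + 1u);  /* 64-bit: no overflow */
    return (uint32_t)(f_clock / ticks);
}

/* Find ARR for a chosen PSC and target frequency.
 * Returns 0 on success, -1 if the result does not fit a 16-bit ARR. */
static int timer_arr_for(uint32_t f_clock, uint32_t psc, uint32_t f_target,
                         uint32_t *arr)
{
    if (f_target == 0u) {
        return -1;
    }
    uint32_t ticks = f_clock / (psc + 1u) / f_target;
    if (ticks == 0u || ticks > 65536u) {
        return -1;
    }
    *arr = ticks - 1u;
    return 0;
}
```

With a 72 MHz clock, `timer_overflow_hz(72000000u, 71u, 999u)` gives 1000, and asking for 1 kHz with PSC = 0 fails because 72,000 ticks do not fit in 16 bits, which is why the prescaler is needed at all.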
Without interrupts, your CPU would have to constantly poll every peripheral: "Is the button pressed? No. Is the button pressed? No. Is the button pressed? No. Is UART data ready? No." This wastes 99.99% of CPU time checking things that haven't happened.
Interrupts flip this model. The hardware watches for events. When an event occurs (button press, timer overflow, byte received), it yanks the CPU out of whatever it's doing, forces it to run a specific function (the Interrupt Service Routine, or ISR), then returns to the original code as if nothing happened.
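Here's the sequence when an interrupt fires: (1) the peripheral raises its request and the interrupt controller marks it pending; (2) the CPU stops the main code and the hardware pushes the context (R0–R3, R12, LR, PC, xPSR) onto the stack; (3) the CPU fetches the ISR's address from the vector table and jumps to it; (4) the ISR runs and clears the interrupt flag; (5) on return, the hardware pops the saved context and the main code resumes where it left off.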
The NVIC (Nested Vectored Interrupt Controller) is the hardware that manages all of this. "Nested" means a higher-priority interrupt can interrupt a lower-priority ISR. "Vectored" means each interrupt source has its own entry in a table (no polling to figure out which interrupt fired).
```c
volatile uint32_t milliseconds = 0;    // Shared with main code
volatile uint8_t button_pressed = 0;   // Shared with main code

// Timer 2 ISR, called every 1ms (from our Ch.4 config)
void TIM2_IRQHandler(void) {
    if (TIM2->SR & (1 << 0)) {         // Check update interrupt flag
        TIM2->SR = ~(1U << 0);         // CLEAR the flag (critical!); writing 1 leaves other flags alone
        milliseconds++;                // Increment global counter
    }
}

// External interrupt on PA0 (button press)
void EXTI0_IRQHandler(void) {
    if (EXTI->PR & (1 << 0)) {         // Check pending bit
        EXTI->PR = (1 << 0);           // Clear by writing 1 (a |= would also clear other pending lines)
        button_pressed = 1;            // Set flag for main loop
    }
}
```
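The EXTI pending register is cleared by writing 1, which makes the choice between `=` and `|=` matter. The host-side model below is only a sketch of that write-1-to-clear behaviour (all three function names are hypothetical); it shows what each style of write does when two lines are pending at once.

```c
#include <stdint.h>

/* Model of a write-1-to-clear register: every bit written as 1 is cleared,
 * every bit written as 0 is left alone. */
static uint32_t w1c_write(uint32_t reg, uint32_t written)
{
    return reg & ~written;
}

/* `PR |= mask` means: read PR, OR in the mask, write the result back. */
static uint32_t clear_with_or_assign(uint32_t pr, uint32_t mask)
{
    return w1c_write(pr, pr | mask);
}

/* `PR = mask` writes only the bit we intend to clear. */
static uint32_t clear_with_assign(uint32_t pr, uint32_t mask)
{
    return w1c_write(pr, mask);
}
```

With lines 0 and 2 pending (0x5), the `|=` style wipes both and the line 2 event is lost, while the plain assignment leaves line 2 pending (0x4) so its handler still runs.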
… volatile. (4) Always clear the interrupt flag.
Time to put it all together. We'll build a complete embedded system from scratch: a digital thermometer that samples temperature every 100ms (timer interrupt), displays the reading on a 7-segment display (GPIO output), and responds to a button press (external interrupt) to switch between °C and °F.
This uses every concept from Chapters 0–5:
| Component | Chapter | Mechanism |
|---|---|---|
| Temperature sensor → ADC | Ch 2 (Registers) | Read analog voltage at memory-mapped ADC register |
| 100ms sampling | Ch 4 (Timers) | TIM2 overflow at 10 Hz triggers ISR |
| ISR reads ADC | Ch 5 (Interrupts) | Timer ISR starts ADC conversion, stores result |
| 7-segment display | Ch 3 (GPIO) | 7 output pins drive segment LEDs |
| °C/°F button | Ch 5 (Interrupts) | External interrupt on PA0 toggles unit flag |
```c
// Digital thermometer: complete bare-metal implementation
#include "stm32f1xx.h"

// Setup and display helpers, defined elsewhere
void setup_clocks(void);
void setup_gpio(void);
void setup_adc(void);
void setup_timer(void);
void setup_exti(void);
void update_7seg(int value);

volatile uint16_t adc_value = 0;
volatile uint8_t new_sample = 0;
volatile uint8_t use_fahrenheit = 0;

// TIM2 ISR: fires every 100ms (PSC=7199, ARR=999 at 72MHz)
void TIM2_IRQHandler(void) {
    TIM2->SR = ~(1U << 0);              // Clear update interrupt flag only
    ADC1->CR2 |= (1 << 0);              // Start ADC conversion
    while (!(ADC1->SR & (1 << 1)));     // Wait for EOC (end of conversion)
    adc_value = ADC1->DR;               // Read 12-bit result (0-4095)
    new_sample = 1;                     // Signal main loop
}

// EXTI0 ISR: button press toggles °C/°F
void EXTI0_IRQHandler(void) {
    EXTI->PR = (1 << 0);                // Clear pending bit (write 1 to clear)
    use_fahrenheit ^= 1;                // Toggle unit
}

int main(void) {
    setup_clocks(); setup_gpio(); setup_adc(); setup_timer(); setup_exti();
    while (1) {
        if (new_sample) {
            new_sample = 0;
            // Sensor model: 0.5 V offset, 10 mV per degree C
            float temp_c = (adc_value * 3.3f / 4095.0f - 0.5f) * 100.0f;
            float display_temp = use_fahrenheit ? temp_c * 9.0f / 5.0f + 32.0f : temp_c;
            update_7seg((int)display_temp);
        }
    }
}
```
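The Cortex-M3 has no floating-point unit, so the float math in the main loop is emulated in software. One possible alternative, sketched here with hypothetical helper names, keeps the same sensor model (0.5 V offset, 10 mV per degree, 3.3 V reference) but works in integer tenths of a degree.

```c
#include <stdint.h>

/* 12-bit ADC count (0-4095) to tenths of a degree Celsius. */
static int32_t adc_to_decicelsius(uint16_t adc)
{
    int32_t millivolts = (int32_t)adc * 3300 / 4095;
    return millivolts - 500;   /* 10 mV per degree, so 1 mV = 0.1 degree */
}

/* Tenths of a degree Celsius to tenths of a degree Fahrenheit. */
static int32_t decicelsius_to_decifahrenheit(int32_t dc)
{
    return dc * 9 / 5 + 320;   /* F = C * 9/5 + 32, scaled by 10 */
}
```

An ADC reading of 931 corresponds to 750 mV, which this sketch turns into 250 (25.0 °C) and then 770 (77.0 °F). Integer division truncates, so results can be off by up to a tenth of a degree.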
Your MCU lives on a board with sensors, displays, memory chips, and other MCUs. They need to talk. Three protocols dominate the embedded world:
| Protocol | Wires | Speed | Addressing | Best For |
|---|---|---|---|---|
| UART | 2 (TX, RX) | 9600–921600 baud | Point-to-point | Debug console, GPS, Bluetooth |
| SPI | 4 (CLK, MOSI, MISO, CS) | 1–50+ MHz | Chip select line | Displays, SD cards, fast sensors |
| I²C | 2 (SCL, SDA) | 100–400 kHz | 7-bit address | Sensors, EEPROMs, multi-device |
UART is asynchronous: there is no clock line, so both sides agree on a baud rate beforehand. Data format: 1 start bit (LOW) + 8 data bits (LSB first) + 1 stop bit (HIGH). Total: 10 bits per byte. At 9600 baud: 9600 bits/second ÷ 10 bits/byte = 960 bytes/second.
```c
// Send one byte over USART1
void uart_send(uint8_t byte) {
    while (!(USART1->SR & (1 << 7)));   // Wait until TX empty (TXE bit)
    USART1->DR = byte;                  // Write byte to data register
}

// Receive one byte
uint8_t uart_recv(void) {
    while (!(USART1->SR & (1 << 5)));   // Wait until RX not empty (RXNE bit)
    return (uint8_t)USART1->DR;         // Read received byte
}
```
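The framing and throughput arithmetic can be sketched in a few host-side helpers. The names are hypothetical; `usart_brr` assumes the STM32F1 convention that the BRR value is the peripheral clock divided by the baud rate, rounded to nearest.

```c
#include <stdint.h>

/* Build the 10-bit frame for one byte. Bit 0 is sent first:
 * bit 0 = start (0), bits 1-8 = data LSB first, bit 9 = stop (1). */
static uint16_t uart_frame_8n1(uint8_t byte)
{
    return (uint16_t)((1u << 9) | ((uint32_t)byte << 1));
}

/* Payload throughput: 10 bits on the wire per data byte. */
static uint32_t uart_bytes_per_second(uint32_t baud)
{
    return baud / 10u;
}

/* Baud rate register value for a given peripheral clock (rounded). */
static uint32_t usart_brr(uint32_t pclk_hz, uint32_t baud)
{
    return (pclk_hz + baud / 2u) / baud;
}
```

For the byte 0x42 the frame is 0x284, and with USART1 clocked at 72 MHz a baud rate of 115200 needs a BRR value of 625.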
SPI is synchronous (it has a clock line). The master controls CLK. Data shifts out on MOSI (Master Out Slave In) and simultaneously in on MISO (Master In Slave Out), so the link is full duplex. The CS (Chip Select) line goes LOW to select a specific slave. SPI is very fast (50+ MHz) but requires 4 wires, plus one extra CS line for each additional slave device.
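Full duplex here means the two shift registers swap contents over eight clocks. The simulation below is a rough sketch of that exchange, assuming MSB-first order (a common default, but configurable on real parts); `spi_pair_t` and `spi_exchange` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint8_t master;   /* master shift register */
    uint8_t slave;    /* slave shift register */
} spi_pair_t;

/* One 8-bit transfer: on every clock the master shifts a bit out on MOSI
 * while the slave shifts a bit out on MISO, and each takes in the other's bit. */
static spi_pair_t spi_exchange(uint8_t master_tx, uint8_t slave_tx)
{
    spi_pair_t r = { master_tx, slave_tx };
    for (int clk = 0; clk < 8; clk++) {
        uint8_t mosi = (uint8_t)((r.master >> 7) & 1u);
        uint8_t miso = (uint8_t)((r.slave >> 7) & 1u);
        r.master = (uint8_t)((r.master << 1) | miso);
        r.slave  = (uint8_t)((r.slave << 1) | mosi);
    }
    return r;
}
```

After `spi_exchange(0x42, 0xA5)` the master holds 0xA5 and the slave holds 0x42. This is why reading from an SPI device always involves sending something, even if it is a dummy byte.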
I²C needs only 2 wires (SCL clock + SDA data) and lets many devices share the same bus. Each device has a unique 7-bit address, which gives 128 addresses, 112 of them usable once the reserved ones are excluded. The master sends the address first, and the matching slave responds. It is slower than SPI (100–400 kHz) but uses minimal pins. Perfect for multiple sensors on one bus.
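On the wire, the 7-bit address travels in the upper bits of the first byte, with the lowest bit telling the slave whether the master wants to write or read. A minimal sketch (`i2c_address_byte` and the two macros are hypothetical names, and 0x48 is just an example address):

```c
#include <stdint.h>

#define I2C_WRITE 0u
#define I2C_READ  1u

/* First byte of an I2C transaction: address in bits 7:1, R/W flag in bit 0. */
static uint8_t i2c_address_byte(uint8_t addr7, uint8_t rw)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (rw & 1u));
}
```

A device at address 0x48 is therefore addressed with 0x90 for a write and 0x91 for a read. Datasheets sometimes quote these 8-bit values instead of the 7-bit address, which is a common source of confusion.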
You've mastered the hardware. Now: how do you structure the software that runs on it? There are three dominant patterns in bare-metal embedded, each with clear trade-offs:
The super-loop is the simplest possible architecture. One infinite loop checks everything sequentially:
```c
while (1) {
    read_sensors();     // Poll ADC
    process_data();     // Convert, filter
    update_display();   // Write to 7-seg
    check_buttons();    // Poll GPIO inputs
    handle_comms();     // Poll UART RX
}
```
Pros: Dead simple, easy to debug, deterministic order. Cons: Wastes CPU time polling things that haven't changed. Can't respond to events faster than loop time. If one function blocks (slow sensor read), everything else waits.
In the interrupt-driven pattern, ISRs handle time-critical events immediately and the main loop processes the flags they set:
```c
#include "stm32f1xx.h"

volatile uint8_t flags = 0;
#define FLAG_TIMER   (1<<0)
#define FLAG_BUTTON  (1<<1)
#define FLAG_UART_RX (1<<2)

void TIM2_IRQHandler(void)  { TIM2->SR = 0; flags |= FLAG_TIMER; }
void EXTI0_IRQHandler(void) { EXTI->PR = 1; flags |= FLAG_BUTTON; }

int main(void) {
    while (1) {
        __disable_irq();              // Snapshot and clear flags with no ISR in between
        uint8_t pending = flags;
        flags = 0;
        if (!pending) __WFI();        // Nothing to do: sleep until next interrupt (saves power!)
        __enable_irq();               // A pending ISR runs here and sets its flag

        if (pending & FLAG_TIMER)   sample_sensor();
        if (pending & FLAG_BUTTON)  toggle_unit();
        if (pending & FLAG_UART_RX) process_cmd();
    }
}
```
Pros: Instant response to events, CPU sleeps when idle (low power), no wasted polling. Cons: Race conditions between ISR and main, priority inversion bugs, harder to reason about.
In a state machine, the system is in exactly one state at any time. Events cause transitions to other states. Each state has defined entry/exit/loop actions:
```c
typedef enum { IDLE, HEATING, COOLING, ALARM } State;
typedef enum { EVT_TEMP_HIGH, EVT_TEMP_LOW, EVT_TEMP_OK, EVT_FAULT } Event;

State current = IDLE;

void handle_event(Event e) {
    switch (current) {
    case IDLE:
        if (e == EVT_TEMP_LOW)  { heater_on(); current = HEATING; }
        if (e == EVT_TEMP_HIGH) { cooler_on(); current = COOLING; }
        if (e == EVT_FAULT)     { alarm_on();  current = ALARM; }
        break;
    case HEATING:
        if (e == EVT_TEMP_OK) { heater_off(); current = IDLE; }
        if (e == EVT_FAULT)   { heater_off(); alarm_on(); current = ALARM; }
        break;
    // ... COOLING, ALARM cases
    }
}
```
Pros: Extremely debuggable (print current state), impossible to be in two states at once, easy to extend. Cons: Boilerplate for simple systems, state explosion for complex ones.
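One way to get the debuggability benefit is to separate the transition logic from the hardware actions, so the logic can be unit-tested on a PC. The sketch below reuses the `State` and `Event` enums; `next_state` is a hypothetical function, and the COOLING and ALARM behaviour (cooling mirrors heating, the alarm latches) is an assumption, since those cases are not spelled out in the example.

```c
typedef enum { IDLE, HEATING, COOLING, ALARM } State;
typedef enum { EVT_TEMP_HIGH, EVT_TEMP_LOW, EVT_TEMP_OK, EVT_FAULT } Event;

/* Pure transition function: no hardware access, so it runs anywhere. */
static State next_state(State s, Event e)
{
    if (e == EVT_FAULT) {
        return ALARM;                     /* a fault wins from any state */
    }
    switch (s) {
    case IDLE:
        if (e == EVT_TEMP_LOW)  return HEATING;
        if (e == EVT_TEMP_HIGH) return COOLING;
        return IDLE;
    case HEATING:
    case COOLING:                         /* assumption: mirrors HEATING */
        return (e == EVT_TEMP_OK) ? IDLE : s;
    case ALARM:
        return ALARM;                     /* assumption: alarm latches */
    }
    return s;
}
```

The caller compares the old and new state and runs the matching action (`heater_on()`, `alarm_on()`, and so on) only when they differ, which keeps all hardware access in one place.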
You now understand the fundamental building blocks of embedded systems. Every device — from a $0.50 sensor node to a $50 drone flight controller — is built from these same primitives: GPIO, timers, interrupts, communication protocols, and software patterns.
| Peripheral | Key Registers | What They Do |
|---|---|---|
| GPIO | CRL/CRH, IDR, ODR, BSRR | Configure mode, read inputs, set outputs |
| Timer | CR1, PSC, ARR, CCR1, SR, DIER | Control, prescaler, reload, compare, status, interrupt enable |
| NVIC | ISER, ICER, ISPR, IPR | Enable, disable, set-pending, priority |
| USART | CR1, BRR, SR, DR | Control, baud rate, status, data |
| RCC | APB1ENR, APB2ENR, CFGR | Peripheral clock enable, clock config |
| Priority | Source | Rationale |
|---|---|---|
| 0 (highest) | Safety-critical (fault, motor shutoff) | Must never be delayed |
| 1 | Timing-critical (high-freq PWM, encoder) | Jitter = position error |
| 2 | Periodic sampling (ADC, sensor read) | Late sample = filter error |
| 3 | Communication (UART RX, SPI) | Buffer absorbs short delays |
| 4 (lowest) | UI (button, display update) | Humans can't tell 1ms vs 5ms |
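The priority levels in the table are not written to the NVIC as-is. The STM32F103 implements only the upper 4 bits of each 8-bit priority field, so a level is shifted into the top nibble. A small sketch of that encoding (`nvic_priority_byte` and `NVIC_PRIO_BITS` are hypothetical names chosen for this example):

```c
#include <stdint.h>

#define NVIC_PRIO_BITS 4u   /* implemented priority bits on the STM32F103 */

/* Encode a priority level (0 = highest) into the 8-bit register field. */
static uint8_t nvic_priority_byte(uint8_t level)
{
    uint32_t mask = (1u << NVIC_PRIO_BITS) - 1u;           /* 0x0F */
    return (uint8_t)((level & mask) << (8u - NVIC_PRIO_BITS));
}
```

Level 3 becomes 0x30 and level 4 becomes 0x40. In practice the CMSIS function `NVIC_SetPriority()` performs this shift for you, so the sketch is mainly useful for reading raw register dumps in a debugger.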
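The MCU has a clock tree that distributes timing from an oscillator (8 MHz crystal) through PLLs and dividers to all peripherals. The STM32F103 tree: 8 MHz HSE → PLL ×9 → 72 MHz SYSCLK → AHB /1 → 72 MHz, which then splits into two peripheral buses: APB2 /1 → 72 MHz (GPIO, USART1, TIM1) and APB1 /2 → 36 MHz (TIM2-4, USART2-3, I²C, SPI2). Because the APB1 divider is not 1, the timers on APB1 are clocked at twice the bus frequency, so TIM2-4 still count at 72 MHz.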
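The tree can be sketched as a few divisions, which is handy for checking a configuration before trusting any timer or baud-rate math. `clocks_t` and `clock_tree` are hypothetical names; the timer rule encoded here is the STM32F1 one, where APB1 timers run at the bus clock if the APB1 divider is 1 and at twice the bus clock otherwise.

```c
#include <stdint.h>

typedef struct {
    uint32_t sysclk;     /* PLL output */
    uint32_t hclk;       /* AHB clock */
    uint32_t pclk1;      /* APB1 bus clock */
    uint32_t pclk2;      /* APB2 bus clock */
    uint32_t tim_apb1;   /* clock seen by the timers on APB1 */
} clocks_t;

static clocks_t clock_tree(uint32_t hse_hz, uint32_t pll_mul,
                           uint32_t ahb_div, uint32_t apb1_div,
                           uint32_t apb2_div)
{
    clocks_t c;
    c.sysclk = hse_hz * pll_mul;
    c.hclk   = c.sysclk / ahb_div;
    c.pclk1  = c.hclk / apb1_div;
    c.pclk2  = c.hclk / apb2_div;
    /* APB1 timers: x1 when the divider is 1, otherwise x2 of PCLK1. */
    c.tim_apb1 = (apb1_div == 1u) ? c.pclk1 : 2u * c.pclk1;
    return c;
}
```

For the configuration described above, `clock_tree(8000000u, 9u, 1u, 2u, 1u)` gives a 72 MHz SYSCLK, a 36 MHz APB1 bus, and 72 MHz at the APB1 timers.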
Design a traffic light system with these requirements:
Your design: 4 states (GREEN, YELLOW, RED, EMERGENCY). TIM2 at 1Hz counts seconds. EXTI for pedestrian button (priority 3) and emergency input (priority 0). State machine in main loop. Can you identify all the registers you'd need to configure?
| This Lesson | Next Steps |
|---|---|
| Bare-metal registers | HAL libraries (STM32Cube), Arduino framework |
| Super-loop / interrupt-driven | RTOS (FreeRTOS — tasks, queues, semaphores) |
| GPIO + Timer + UART | DMA (Direct Memory Access — transfers without CPU) |
| Single MCU | Multi-processor systems, CAN bus (automotive) |
| C programming | Rust for embedded (memory safety without runtime cost) |
"What I cannot create, I do not understand." — Richard Feynman