Dynamic memory, bitwise ops, recursion, data structures, state machines — the toolkit that separates embedded engineers from script writers.
You write x = [1, 2, 3] in Python and never think about where those bytes live. Python's runtime allocates memory for you, resizes it when you append, and garbage-collects it when you're done. It's wonderful — until you're programming a microcontroller with 8 KB of RAM.
In embedded systems — pacemakers, antilock brakes, drone flight controllers — there is no operating system. No garbage collector. No virtual memory. You have a fixed chunk of Flash (for code) and a fixed chunk of SRAM (for data). Every byte is accounted for. One dangling pointer (pointing at freed memory) and the device crashes. One stack overflow (stack grows into the heap) and the motor controller sends full throttle.
This is why C exists. It gives you total control over memory layout, allocation timing, and data representation. The tradeoff: you're responsible for everything Python hides from you.
Let's look at a real embedded memory map. A typical STM32F0 microcontroller has:
| Region | Size | Contents |
|---|---|---|
| Flash | 32 KB | Your compiled code + constants |
| SRAM | 8 KB | Stack + heap + static variables |
Inside that 8 KB of SRAM, multiple regions compete for space. The stack grows downward from the top. The heap grows upward from the bottom. If they collide, your program crashes with no warning.
Drag the slider to increase call depth. Watch the stack grow down toward the heap. When they collide — crash.
c // Typical embedded memory layout (linker script) // Address 0x20000000: start of SRAM (8192 bytes) // [static data] [heap --> <-- stack] [top] // 0x20000000 0x20002000 #define SRAM_START 0x20000000 #define SRAM_SIZE 8192 #define STACK_TOP (SRAM_START + SRAM_SIZE) // 0x20002000 static int sensor_buffer[64]; // 256 bytes of static static int motor_state; // 4 bytes of static // Remaining ~7932 bytes split between stack and heap
In embedded C, you rarely work with whole integers. You work with individual bits. A hardware register is a 32-bit value where each bit controls something different: bit 5 turns on an LED, bit 3 enables a timer, bit 12 selects the clock source. You need surgical tools to flip one bit without disturbing the others.
C gives you six bitwise operators that work on each bit independently:
| Operator | Name | What it does | Example (8-bit) |
|---|---|---|---|
& | AND | 1 only if BOTH bits are 1 | 0b11001010 & 0b11110000 = 0b11000000 |
| | OR | 1 if EITHER bit is 1 | 0b11001010 | 0b00000101 = 0b11001111 |
^ | XOR | 1 if bits DIFFER | 0b11001010 ^ 0b11110000 = 0b00111010 |
~ | NOT | Flip all bits | ~0b11001010 = 0b00110101 |
<< | Left shift | Shift bits left, fill with 0 | 0b00000001 << 5 = 0b00100000 |
>> | Right shift | Shift bits right | 0b00100000 >> 5 = 0b00000001 |
reg |= (1 << N) — OR with a mask that has only bit N set.reg &= ~(1 << N) — AND with a mask that has every bit set EXCEPT N.reg ^= (1 << N) — XOR flips just that one bit.Here's a real embedded example. On an STM32, GPIO Port A's Output Data Register (GPIOA->ODR) is at address 0x40020014. Bit 5 controls the onboard LED. To turn it on:
c // Turn on LED (bit 5 of GPIOA->ODR) GPIOA->ODR |= (1 << 5); // Set bit 5: xxxx_xxxx |= 0010_0000 // Turn off LED GPIOA->ODR &= ~(1 << 5); // Clear bit 5: xxxx_xxxx &= 1101_1111 // Toggle LED GPIOA->ODR ^= (1 << 5); // Flip bit 5 only // Check if bit 5 is set if (GPIOA->IDR & (1 << 5)) { // Button is pressed }
Bit fields let you pack multiple flags into a single byte. Instead of wasting 4 bytes (an int) for a boolean flag, you use 1 bit:
c struct SensorFlags { uint8_t ready : 1; // 1 bit: sensor data ready uint8_t overflow : 1; // 1 bit: buffer overflow uint8_t channel : 3; // 3 bits: channel 0-7 uint8_t gain : 2; // 2 bits: gain 0-3 uint8_t reserved : 1; // 1 bit: padding to 8 bits total }; // sizeof = 1 byte! Not 4 ints = 16 bytes.
Click any bit to toggle it. Choose an operation and a mask below to see the result.
You already know a pointer holds a memory address. But pointer arithmetic is where C reveals its true power — and its danger. When you write p + 1, C doesn't add 1 byte. It adds sizeof(*p) bytes. This is how arrays work under the hood: arr[i] is exactly *(arr + i).
int *p = 0x1000 and sizeof(int) = 4, then p + 1 = 0x1004, p + 2 = 0x1008. The compiler does the multiplication for you.Let's trace through memory. Suppose we have an array of 4-byte integers starting at address 0x2000:
c int arr[4] = {10, 20, 30, 40}; int *p = arr; // p = 0x2000 // Pointer arithmetic: *p // value at 0x2000 = 10 *(p + 1) // value at 0x2004 = 20 (NOT 0x2001!) *(p + 2) // value at 0x2008 = 30 *(p + 3) // value at 0x200C = 40 // These are identical: arr[2] // = 30 *(arr + 2) // = 30 *(p + 2) // = 30
Double pointers (char **) hold the address of a pointer. Why? Two common reasons: (1) you need a function to modify a pointer (pass pointer-to-pointer), and (2) arrays of strings (argv in main).
c // Function that allocates memory and returns it via double pointer void create_buffer(int **out, int size) { *out = (int *)malloc(size * sizeof(int)); } int *buf = NULL; create_buffer(&buf, 100); // buf now points to allocated memory // Without **, the function would modify a LOCAL copy of buf
Function pointers store the address of a function. This enables callbacks — you pass behavior as an argument:
c // Declare a function pointer type typedef void (*callback_t)(int event_code); // A function matching that signature void on_button_press(int code) { printf("Button %d pressed\n", code); } // Register the callback callback_t handler = on_button_press; handler(5); // calls on_button_press(5)
An array of structs in memory. Each struct is 12 bytes. Watch how p+1 jumps 12 bytes, not 1.
sizeof(SensorReading) each step — no array indexing, no bounds checking, just raw pointer arithmetic. Fast. Dangerous.int *p = 0x3000 and sizeof(int) = 4, what is the value of p + 3?Static arrays have a problem: you must decide their size at compile time. But what if you don't know how many sensor readings you'll collect? You need memory allocated at runtime. In C, that means asking the heap for space using malloc.
The heap is a region of memory managed by a runtime allocator. Unlike the stack (which is automatic), heap memory persists until you explicitly free it. Forget to free? That's a memory leak. Free it twice? That's undefined behavior (usually a crash).
malloc(size) — allocate size bytes, uninitialized.calloc(n, size) — allocate n * size bytes, zeroed.realloc(ptr, new_size) — resize existing allocation (may move it!).free(ptr) — return memory to the heap.c // Allocate array of 100 ints (400 bytes on 32-bit system) int *data = (int *)malloc(100 * sizeof(int)); if (data == NULL) { // malloc failed! Out of memory. Handle gracefully. return -1; } // Use it data[0] = 42; data[99] = 7; // Need more space? Resize to 200 ints int *bigger = (int *)realloc(data, 200 * sizeof(int)); if (bigger == NULL) { // realloc failed, but data is still valid! free(data); return -1; } data = bigger; // ptr may have moved // Done. Free it. free(data); data = NULL; // ALWAYS null after free to prevent dangling pointer
Fragmentation is the silent killer of embedded systems. After many malloc/free cycles, the heap becomes a patchwork of used and unused blocks. You might have 1 KB free total, but split into 10 non-contiguous 100-byte fragments. A request for 200 bytes fails even though "enough" memory exists.
Click malloc to allocate blocks. Click free to release a random block. Watch fragmentation grow.
Heap: 0 / 1024 bytes used. 0 fragments.
malloc entirely. They pre-allocate all memory as static arrays and use pool allocators (fixed-size blocks). No fragmentation. Deterministic timing. More code, but zero surprises at runtime.free(ptr), what should you immediately do with ptr?A recursive function calls itself. That sounds circular, but it works because each call handles a smaller version of the problem, and there's a base case that stops the chain. Without a base case, you recurse forever and overflow the stack.
Every recursive call creates a new stack frame — a block of memory on the stack holding that call's local variables, parameters, and return address. When the function returns, its frame is popped. This is why deep recursion is dangerous in embedded: each frame costs memory.
Let's trace factorial(4) step by step:
c int factorial(int n) { if (n <= 1) return 1; // Base case return n * factorial(n - 1); // Recursive case } // Trace of factorial(4): // factorial(4): return 4 * factorial(3) [frame 4 pushed] // factorial(3): return 3 * factorial(2) [frame 3 pushed] // factorial(2): return 2 * factorial(1) [frame 2 pushed] // factorial(1): return 1 (BASE CASE) [frame 1 pushed] // factorial(2): return 2 * 1 = 2 [frame 1 popped] // factorial(3): return 3 * 2 = 6 [frame 2 popped] // factorial(4): return 4 * 6 = 24 [frame 3 popped]
Binary search is another natural fit for recursion. Each call halves the search space:
c int binary_search(int *arr, int lo, int hi, int target) { if (lo > hi) return -1; // Base: not found int mid = lo + (hi - lo) / 2; // Avoid overflow vs (lo+hi)/2 if (arr[mid] == target) return mid; if (arr[mid] < target) return binary_search(arr, mid + 1, hi, target); return binary_search(arr, lo, mid - 1, target); }
factorial above is NOT tail recursive (it does n * result after). A tail-recursive version passes an accumulator: fact_tail(n, acc) = fact_tail(n-1, n*acc).Step through factorial(N). Watch stack frames push and pop. Each frame holds its own n value.
Arrays are fast for random access (arr[i] is O(1)) but terrible for insertion in the middle (you must shift everything). Different problems demand different structures. The big three in C: linked lists, binary search trees, and hash tables.
A linked list is a chain of nodes, each containing data and a pointer to the next. Insertion anywhere is O(1) (just rewire pointers), but finding the i-th element is O(n) (walk the chain).
c typedef struct Node { int data; struct Node *next; } Node; // Insert at head: O(1) void push(Node **head, int val) { Node *n = (Node *)malloc(sizeof(Node)); n->data = val; n->next = *head; // new node points to old head *head = n; // head now points to new node } // Free entire list void free_list(Node *head) { while (head) { Node *tmp = head; head = head->next; free(tmp); } }
A BST stores values such that for every node: all left descendants are smaller, all right descendants are larger. Search, insert, and delete are all O(log n) on average — but O(n) worst case if the tree becomes a straight line (unbalanced).
c typedef struct TreeNode { int key; struct TreeNode *left, *right; } TreeNode; TreeNode* insert(TreeNode *root, int key) { if (!root) { TreeNode *n = (TreeNode*)malloc(sizeof(TreeNode)); n->key = key; n->left = n->right = NULL; return n; } if (key < root->key) root->left = insert(root->left, key); else root->right = insert(root->right, key); return root; }
A hash table maps keys to values in O(1) average time. You hash the key to get an array index. Collisions (two keys mapping to the same index) are handled by chaining (linked list at each slot) or open addressing (probe for next empty slot).
| Structure | Search | Insert | Delete | Best for |
|---|---|---|---|---|
| Array | O(n) / O(1) index | O(n) middle | O(n) | Random access by index |
| Linked list | O(n) | O(1) at head | O(1) if have ptr | Frequent insert/delete |
| BST | O(log n) avg | O(log n) avg | O(log n) avg | Sorted data, range queries |
| Hash table | O(1) avg | O(1) avg | O(1) avg | Key-value lookup |
Type a number and insert it. Watch the tree grow. Notice how sorted input creates a degenerate (linear) tree.
Sorting is the most studied problem in computer science. Why? Because searching a sorted array is O(log n) instead of O(n). That's the difference between finding a record in 20 comparisons vs 1,000,000. The key divide: O(n²) algorithms (bubble, insertion) vs O(n log n) algorithms (quicksort, mergesort).
Quicksort picks a pivot, partitions the array so everything smaller is left and everything larger is right, then recurses on each half. Average case: O(n log n). Worst case (already sorted): O(n²). In practice, fastest for random data.
c void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; } int partition(int *arr, int lo, int hi) { int pivot = arr[hi]; // last element as pivot int i = lo - 1; for (int j = lo; j < hi; j++) { if (arr[j] <= pivot) { i++; swap(&arr[i], &arr[j]); } } swap(&arr[i + 1], &arr[hi]); return i + 1; } void quicksort(int *arr, int lo, int hi) { if (lo < hi) { int p = partition(arr, lo, hi); quicksort(arr, lo, p - 1); quicksort(arr, p + 1, hi); } }
Let's trace quicksort on [38, 27, 43, 3, 9, 82]:
Final sorted: [3, 9, 27, 38, 43, 82].
| Algorithm | Best | Average | Worst | Stable? | Space |
|---|---|---|---|---|---|
| Bubble sort | O(n) | O(n²) | O(n²) | Yes | O(1) |
| Insertion sort | O(n) | O(n²) | O(n²) | Yes | O(1) |
| Quicksort | O(n log n) | O(n log n) | O(n²) | No | O(log n) |
| Mergesort | O(n log n) | O(n log n) | O(n log n) | Yes | O(n) |
Watch bars get sorted in real time. Compare algorithm speeds. Red = comparing, green = sorted.
A Finite State Machine (FSM) is the most important design pattern in embedded C. Your microcontroller is always in exactly one state. When an event occurs, it transitions to a new state and may produce an action. That's it. No complex logic trees. No tangled if-else chains. Just: current state + event → action + next state.
Why FSMs dominate embedded? Because they're provably correct. You can draw every possible state and transition on paper. If you can't reach a dangerous state from the diagram, it can't happen in code. Compare that to 500 lines of nested if-statements where one forgotten condition means the motor never stops.
enum.enum.switch(state) with nested switch(event) inside each case.c typedef enum { IDLE, COIN_IN, SELECTING, DISPENSING, CHANGE } State; typedef enum { EVT_COIN, EVT_SELECT, EVT_DONE, EVT_TIMEOUT } Event; State current = IDLE; void fsm_step(Event evt) { switch (current) { case IDLE: if (evt == EVT_COIN) { display("Coin accepted"); current = COIN_IN; } break; case COIN_IN: if (evt == EVT_SELECT) { start_motor(); current = DISPENSING; } else if (evt == EVT_TIMEOUT) { return_coin(); current = IDLE; } break; case DISPENSING: if (evt == EVT_DONE) { give_change(); current = CHANGE; } break; case CHANGE: current = IDLE; // auto-transition back break; } }
The transition table is the formal specification:
| Current State | Event | Action | Next State |
|---|---|---|---|
| IDLE | COIN | Display "accepted" | COIN_IN |
| COIN_IN | SELECT | Start motor | DISPENSING |
| COIN_IN | TIMEOUT | Return coin | IDLE |
| DISPENSING | DONE | Give change | CHANGE |
| CHANGE | (auto) | — | IDLE |
Click events to drive the state machine. Watch the current state highlight, transitions animate, and actions fire.
State: IDLE. Waiting for coin...
When you define a struct in C, the compiler doesn't always pack the fields tightly. It inserts padding bytes to ensure each field starts at an aligned address. Why? Because most CPUs can only read a 4-byte int from an address that's a multiple of 4. Unaligned access is either slow (ARM) or illegal (older ARM, some MIPS).
address_of(field) = base_address + offsetLet's trace a real struct:
c struct Example { char a; // 1 byte, offset 0 // 3 bytes PADDING (next field needs 4-byte alignment) int b; // 4 bytes, offset 4 short c; // 2 bytes, offset 8 // 2 bytes PADDING (struct size must be multiple of largest align) }; // sizeof(struct Example) = 12, NOT 7! // Memory layout (byte by byte): // [a][P][P][P][b][b][b][b][c][c][P][P] // 0 1 2 3 4 5 6 7 8 9 10 11
You can reduce wasted space by reordering fields from largest to smallest:
c struct Packed { int b; // 4 bytes, offset 0 short c; // 2 bytes, offset 4 char a; // 1 byte, offset 6 // 1 byte PADDING (struct align to 4) }; // sizeof(struct Packed) = 8 — saved 4 bytes!
Endianness matters for data sent between machines. Little-endian (x86, ARM default): least significant byte at lowest address. Big-endian (network byte order): most significant byte first. The value 0x12345678:
| Address | Little-endian | Big-endian |
|---|---|---|
| 0x00 | 0x78 (LSB) | 0x12 (MSB) |
| 0x01 | 0x56 | 0x34 |
| 0x02 | 0x34 | 0x56 |
| 0x03 | 0x12 (MSB) | 0x78 (LSB) |
See how char, int, and short lay out in memory with alignment padding. Toggle packed mode to remove padding.
sizeof = 12 bytes (5 wasted on padding)
uint32_t) in a known byte order.char a; int b; char c;. With 4-byte alignment, sizeof is:In C, files are accessed through file pointers (FILE *). You open a file, read or write, then close it. The key insight: C treats files as a stream of bytes. You maintain a file position indicator that advances as you read/write. You can jump to any position with fseek.
c // Write 100 records to a binary file typedef struct { uint32_t id; float value; } Record; FILE *f = fopen("data.bin", "wb"); // "wb" = write binary if (!f) { perror("fopen"); return 1; } Record rec = { .id = 42, .value = 3.14f }; fwrite(&rec, sizeof(Record), 1, f); // write 1 record fclose(f); // Read record N directly (random access) f = fopen("data.bin", "rb"); int record_num = 5; fseek(f, record_num * sizeof(Record), SEEK_SET); fread(&rec, sizeof(Record), 1, f); // rec now contains record #5 fclose(f);
offset = N × sizeof(Record)fseek(file, offset, SEEK_SET)C's primitive types have implementation-defined sizes. int could be 2 bytes on a 16-bit micro or 4 bytes on a PC. If you write raw int values to a file on one machine and read them on another, you'll get garbage. The solution: fixed-width types from <stdint.h>:
| Type | Guaranteed size | Use when |
|---|---|---|
uint8_t | 1 byte, unsigned | Raw bytes, characters |
int16_t | 2 bytes, signed | Audio samples, sensors |
uint32_t | 4 bytes, unsigned | Timestamps, IDs, registers |
int64_t | 8 bytes, signed | Large counters, file sizes |
c // Portable serialization: write uint32 in little-endian void write_le32(FILE *f, uint32_t val) { uint8_t buf[4]; buf[0] = val & 0xFF; buf[1] = (val >> 8) & 0xFF; buf[2] = (val >> 16) & 0xFF; buf[3] = (val >> 24) & 0xFF; fwrite(buf, 4, 1, f); } // Portable deserialization: read uint32 from little-endian uint32_t read_le32(FILE *f) { uint8_t buf[4]; fread(buf, 4, 1, f); return buf[0] | (buf[1] << 8) | (buf[2] << 16) | (buf[3] << 24); }
fwrite(&my_struct, sizeof(my_struct), 1, f) for portable formats. The padding, endianness, and sizes may all differ on the reader's side.uint32_t instead of unsigned int in portable binary file formats?How does printf accept any number of arguments? Through C's variable argument (varargs) mechanism. The function declares a minimum set of fixed parameters, then uses va_list to iterate over the rest. This is how you write your own printf-like logging function in embedded code.
c #include <stdarg.h> // Custom debug logger: logs to UART in embedded, stdout in testing void debug_log(const char *fmt, ...) { va_list args; va_start(args, fmt); // Initialize: 'fmt' is the last fixed param // Extract arguments one by one based on format string // (simplified: real impl would parse fmt) vprintf(fmt, args); // printf that takes a va_list va_end(args); // MUST call when done. Cleans up. } // Usage: debug_log("Sensor %d: value = %f\n", 3, 27.5); debug_log("Error at addr 0x%08X\n", 0xDEADBEEF);
va_start(args, last_fixed) — initialize, naming the last fixed parameter.va_arg(args, type) — extract next argument as type. YOU must know the type.va_end(args) — cleanup. Always call this. Always.float where int is expected? Undefined behavior. This is why printf format strings exist — they encode the expected types.C has no try/catch. But embedded code often has deeply nested function calls that can fail. Unwinding the stack manually (checking return codes at every level) is tedious. setjmp/longjmp provides non-local jumps — you mark a return point, and can jump back to it from any depth.
c #include <setjmp.h> jmp_buf error_recovery; void deep_function() { // ... 5 levels deep in call stack ... if (hardware_fault()) { longjmp(error_recovery, 1); // Jump back to setjmp site! // Code below here NEVER executes } } void main_loop() { if (setjmp(error_recovery) != 0) { // We got here via longjmp. Handle the error. reset_hardware(); printf("Recovered from fault\n"); return; } // Normal execution path: init_sensors(); calibrate(); // calls deep_function somewhere inside run_control_loop(); }
Warning: longjmp unwinds the stack without calling any cleanup code in intermediate functions. Any malloc'd memory in those functions is leaked. Any open file handles are leaked. Any hardware left in a half-configured state stays that way. This is why embedded teams use it ONLY at the top-level error boundary.
va_arg(args, int) do?You now have the complete low-level C toolkit: bit manipulation for hardware registers, pointer arithmetic for efficient traversal, dynamic memory (and why embedded avoids it), recursion, data structures, sorting, state machines for control flow, and storage maps for understanding what the compiler actually does with your structs.
malloc must have a matching free. No exceptions.malloc's return value. It CAN return NULL.free the same pointer twice.| Operation | Array | Linked List | BST (balanced) | Hash Table |
|---|---|---|---|---|
| Access by index | O(1) | O(n) | O(log n) | N/A |
| Search by value | O(n) | O(n) | O(log n) | O(1) |
| Insert at end | O(1) amortized | O(1) w/ tail ptr | O(log n) | O(1) |
| Insert at middle | O(n) | O(1) if have ptr | O(log n) | N/A |
| Delete | O(n) | O(1) if have ptr | O(log n) | O(1) |
c // 1. Define states and events as enums typedef enum { ST_INIT, ST_RUNNING, ST_ERROR, ST_COUNT } State; typedef enum { EV_START, EV_FAULT, EV_RESET, EV_COUNT } Event; // 2. Define action type typedef void (*Action)(void); // 3. Transition table (table-driven FSM) typedef struct { State next; Action action; } Transition; Transition table[ST_COUNT][EV_COUNT] = { /* EV_START EV_FAULT EV_RESET */ /* INIT */ {{ST_RUNNING, go}, {ST_ERROR, log}, {ST_INIT, nop}}, /* RUN */ {{ST_RUNNING,nop}, {ST_ERROR, stop}, {ST_INIT, reset}}, /* ERROR */ {{ST_ERROR, nop}, {ST_ERROR, nop}, {ST_INIT, reset}}, }; // 4. Step function: O(1), no switch-case needed void fsm_dispatch(Event e) { Transition t = table[state][e]; if (t.action) t.action(); state = t.next; }
| Task | Expression | Example |
|---|---|---|
| Set bit N | reg |= (1 << N) | Turn on LED |
| Clear bit N | reg &= ~(1 << N) | Turn off LED |
| Toggle bit N | reg ^= (1 << N) | Blink LED |
| Check bit N | if (reg & (1 << N)) | Read button |
| Set bits 3-5 | reg |= (0x7 << 3) | Set a 3-bit field |
| Clear bits 3-5 | reg &= ~(0x7 << 3) | Clear a 3-bit field |
| Extract bits 3-5 | (reg >> 3) & 0x7 | Read a 3-bit field |
Implement a memory pool allocator for fixed-size blocks. This eliminates fragmentation entirely: all blocks are the same size, so any free block fits any allocation request.
c // Pool allocator: zero fragmentation, O(1) alloc and free #define POOL_SIZE 32 #define BLOCK_SIZE 64 // bytes per block static uint8_t pool_memory[POOL_SIZE * BLOCK_SIZE]; static uint8_t pool_used[POOL_SIZE]; // 0 = free, 1 = used void* pool_alloc(void) { for (int i = 0; i < POOL_SIZE; i++) { if (!pool_used[i]) { pool_used[i] = 1; return &pool_memory[i * BLOCK_SIZE]; } } return NULL; // pool exhausted } void pool_free(void *ptr) { int idx = ((uint8_t*)ptr - pool_memory) / BLOCK_SIZE; pool_used[idx] = 0; }
"C is quirky, flawed, and an enormous success." — Dennis Ritchie