Effective Modern C++, Chapter 7

The Concurrency API

C++11 brought concurrency into the language for the first time. Tasks vs threads, futures vs promises, std::atomic vs volatile — each serves a distinct purpose. Getting them confused leads to bugs that are nightmarish to debug.

Prerequisites: Chapters 1-6. Basic understanding of threads and shared memory.

Chapter 0: Tasks vs Threads

You want to run a function asynchronously. You have two choices:

c++
int doAsyncWork();

// Thread-based: you manage everything
std::thread t(doAsyncWork);

// Task-based: the runtime manages threads for you
auto fut = std::async(doAsyncWork);
Prefer task-based. std::async gives you a future for the return value, propagates exceptions via get(), and lets the runtime handle thread creation, destruction, load balancing, and oversubscription avoidance.

Three kinds of "thread"

Kind | What It Is
Hardware thread | Physical execution unit on a CPU core. Fixed by hardware.
Software thread | OS-managed thread, scheduled onto hardware threads. More can exist than hardware threads.
std::thread | C++ object that acts as a handle to a software thread. Can be null, moved, joined, or detached.

Why thread-based is dangerous

• Thread exhaustion: std::thread t(f) throws std::system_error if the OS is out of threads, even if f is noexcept.
• Oversubscription: more ready threads than hardware threads means expensive context switches, cache pollution, and thrashing.
• No return value: std::thread gives you no direct way to get the function's return value or catch its exception.

std::async shifts all of these problems to the Standard Library implementer. It can defer execution, use thread pools, or steal work — you get the result via fut.get().

When to use raw threads: (1) You need native_handle() for platform-specific APIs (priorities, affinity). (2) You need to optimize thread usage for a known hardware profile. (3) You're implementing threading technology beyond the standard API.

Chapter 1: Launch Policies

std::async doesn't always run your function asynchronously. Its default launch policy is std::launch::async | std::launch::deferred, which means: "run it however you want."

Policy | Behavior
std::launch::async | Must run on a different thread. Guaranteed concurrency.
std::launch::deferred | Runs synchronously when you call get() or wait(). May never run at all.
Default (both OR'd together) | Runtime decides. Might be async, might be deferred. You don't know.
The default policy is dangerous because:
• You can't predict if f runs concurrently.
• You can't predict which thread's thread_local variables are used.
• f might never run (if get/wait is never called).
• Timeout loops with wait_for may run forever: for a deferred task, wait_for always returns future_status::deferred, never ready.

The infinite loop trap

c++
using namespace std::literals;  // C++14 duration suffixes (100ms)

auto fut = std::async(f);  // default policy

while (fut.wait_for(100ms) != std::future_status::ready) {
    ...  // If f is deferred, this loops FOREVER!
         // wait_for always returns future_status::deferred
}

The fix

c++
auto fut = std::async(f);

if (fut.wait_for(0s) == std::future_status::deferred) {
    fut.get();  // call synchronously
} else {
    while (fut.wait_for(100ms) != std::future_status::ready) {
        ...  // safe: task is not deferred
    }
}
Rule of thumb: If asynchronous execution is essential, always specify std::launch::async explicitly: auto fut = std::async(std::launch::async, f);

Chapter 2: Make std::threads Unjoinable on All Paths

Every std::thread is either joinable (corresponds to a running/runnable thread) or unjoinable (default-constructed, moved-from, joined, or detached). If the destructor of a joinable std::thread runs, std::terminate is called and your program ends.

Unjoinable State | How It Got There
Default-constructed | std::thread t; (no function to execute)
Moved-from | auto t2 = std::move(t); (t is now empty)
Joined | t.join(); (underlying thread finished)
Detached | t.detach(); (connection severed)
Why program termination? The standard rejected both alternatives: implicit join (could silently block for minutes) and implicit detach (could cause memory corruption by writing to destroyed stack frames).

ThreadRAII: an RAII wrapper

c++
class ThreadRAII {
public:
    enum class DtorAction { join, detach };

    ThreadRAII(std::thread&& t, DtorAction a)
        : action(a), t(std::move(t)) {}

    ~ThreadRAII() {
        if (t.joinable()) {
            if (action == DtorAction::join) t.join();
            else t.detach();
        }
    }

    ThreadRAII(ThreadRAII&&) = default;
    ThreadRAII& operator=(ThreadRAII&&) = default;

    std::thread& get() { return t; }
private:
    DtorAction action;
    std::thread t;  // declared last: initialized last (after all other members)
};
Design notes: (1) The joinability check is necessary because join/detach on an unjoinable thread is UB. (2) std::thread is declared last so it's initialized after all other members. A thread may start running as soon as it is initialized, so any member it relies on must already be in place. (3) Move operations are explicitly defaulted because declaring a destructor suppresses them.

Why both join and detach were rejected as defaults

Option | Problem
Implicit join | Could silently block the destructor for an unpredictable duration. In the doWork example below, if conditionsAreSatisfied() returns false, you'd still wait for the entire filter computation to finish.
Implicit detach | The thread keeps running after the function returns. It may write to stack memory that's now occupied by another function's local variables. Memory corruption with no visible cause.
Program termination | Harsh, but at least the bug is immediately visible. The standard committee chose this.

Using ThreadRAII

c++
bool doWork(std::function<bool(int)> filter, int maxVal) {
    std::vector<int> goodVals;

    ThreadRAII t(
        std::thread([&filter, maxVal, &goodVals] {
            for (auto i = 0; i <= maxVal; ++i)
                if (filter(i)) goodVals.push_back(i);
        }),
        ThreadRAII::DtorAction::join  // join if exception or early return
    );

    auto nh = t.get().native_handle();
    // ... set priority via platform API ...

    if (conditionsAreSatisfied()) {
        t.get().join();
        performComputation(goodVals);
        return true;
    }
    return false;  // ThreadRAII destructor joins the thread safely
}

Chapter 3: Future Destructor Behavior

Unlike std::thread, futures don't terminate your program when destroyed. But their destructor behavior is surprisingly nuanced.

The shared state

Results from async tasks live in a shared state — a heap-allocated object accessible to both the caller's future and the callee's std::promise:

Figure: Future / Promise / Shared State. The shared state sits between caller and callee.

Destructor rules

Condition | Destructor Behavior
Normal case | Destroys the future's data members. No join, no detach, no block.
Exception: last future for a non-deferred std::async task | Blocks until the task completes (implicit join)
All three conditions must hold for blocking: (1) The shared state was created by std::async. (2) The task's launch policy is std::launch::async, whether you specified it explicitly or the runtime chose it under the default policy. (3) This future is the last one referring to the shared state.

std::packaged_task avoids blocking

c++
// This future's dtor will NOT block
std::packaged_task<int()> pt(calcValue);
auto fut = pt.get_future();  // shared state NOT from std::async
std::thread t(std::move(pt));
// You must join or detach t yourself

Chapter 4: Void Futures for One-Shot Events

You have a detecting task that needs to signal a reacting task. What's the best mechanism?

Mechanism | Trade-offs
Condition variable | Requires a mutex (even if no shared data); a notification sent before the reacting task starts waiting is missed; spurious wakeups
Atomic flag + polling | Wastes CPU: the reacting task spins instead of blocking
Condvar + flag combo | Works but stilted: two redundant signaling channels
std::promise<void> | Clean, one-shot, no mutex, no spurious wakeups, truly blocks
c++
std::promise<void> p;

// Detecting task:
...                       // detect event
p.set_value();              // signal!

// Reacting task:
...                       // prepare
p.get_future().wait();    // block until signaled
...                       // react
Limitations: (1) Heap-allocated shared state (cost). (2) One-shot only: a std::promise can be set only once. For repeated notifications, use condvar+flag.

Suspending threads at creation

c++
std::promise<void> p;

void detect() {
    std::thread t([] {
        p.get_future().wait();   // suspend until signaled
        react();
    });
    ...                         // configure t (priority, affinity, etc.)
    p.set_value();               // unsuspend t
    ...
    t.join();
}

Multiple reacting tasks: use shared_future

c++
std::promise<void> p;
auto sf = p.get_future().share();

std::vector<std::thread> threads;
for (int i = 0; i < n; ++i)
    threads.emplace_back([sf]{ sf.wait(); react(); });

p.set_value();  // wake all threads simultaneously

for (auto& t : threads) t.join();

Chapter 5: std::atomic vs volatile

std::atomic and volatile serve completely different purposes. Confusing them is a common and dangerous mistake.

Feature | std::atomic | volatile
Purpose | Concurrent access | Special memory (MMIO)
Atomicity | Guaranteed (all operations) | None
Reordering prevention | Yes (sequential consistency) | No
Redundant load elimination | Allowed (optimizer can merge) | Forbidden (every read/write preserved)
Dead store elimination | Allowed | Forbidden
Copyable? | No (copy ops deleted) | Yes

Atomic guarantees atomicity and ordering

c++
std::atomic<int> ai(0);
ai = 10;      // atomically set to 10
++ai;         // atomic read-modify-write → 11
--ai;         // atomic RMW → 10
// Other threads see only 0, 10, or 11. Never a torn value.

// Sequential consistency: no reordering past atomic writes
auto val = computeValue();    // (1)
std::atomic<bool> ready(false);
ready = true;                  // (2) guaranteed after (1)

volatile prevents optimization of special memory

c++
volatile int x;

auto y = x;  // read x (preserved — might be sensor data)
y = x;       // read x again (preserved — value might have changed!)

x = 10;      // write x (preserved — might be a hardware command)
x = 20;      // write x (preserved — might be a different command)

// Without volatile, the compiler could reduce this to: auto y = x; x = 20;

Code reordering: the critical difference

c++
// With std::atomic: guaranteed ordering
auto imptValue = computeImportantValue();  // (1)
std::atomic<bool> valAvailable(false);
valAvailable = true;                        // (2) — (1) guaranteed before (2)
// A thread that sees valAvailable == true is guaranteed to also see the write to imptValue

// With volatile: NO ordering guarantee
auto imptValue = computeImportantValue();  // might be reordered!
volatile bool valAvailable = false;
valAvailable = true;                        // might execute BEFORE imptValue!
// Other threads could see valAvailable = true but imptValue not yet computed
Sequential consistency: std::atomic (with default memory order) guarantees that no code preceding an atomic write can appear to other threads to happen after it. This is the foundation of lock-free programming. volatile provides no such guarantee.

Why std::atomic isn't copyable

c++
std::atomic<int> x;
auto y = x;       // ERROR! Copy operations are deleted.
// Reading x and writing y atomically as ONE operation
// is impossible on most hardware.

// Use load() and store() instead:
std::atomic<int> y(x.load());  // atomic read, then init y
y.store(x.load());               // atomic read, then atomic write
// Each operation is atomic, but the pair is NOT atomic together
They can be combined: volatile std::atomic<int> vai; means "operations are atomic AND can't be optimized away." Useful for memory-mapped I/O that's accessed from multiple threads.

Chapter 6: The Thread Model

Understanding the three layers of threads helps you reason about concurrency.

Figure: Hardware → Software → std::thread. How C++ threads map to OS threads and hardware cores.

Layer | Count | Managed By
Hardware threads | Fixed (e.g., 8 cores × 2 SMT = 16) | CPU
Software threads | Hundreds to thousands | OS scheduler
std::thread objects | As many as you create | Your code (or std::async)
Oversubscription occurs when ready-to-run software threads exceed hardware threads. This causes context switches, cache pollution, and performance degradation. std::async with the default policy can avoid this by deferring tasks.

Chapter 7: Event Communication Patterns

Figure: Event Signaling Patterns. Four approaches to inter-thread event communication.


Chapter 8: Atomic vs Volatile Visualized

Figure: Concurrent Increment: atomic vs volatile. Two threads each increment a counter once; std::atomic and volatile lead to different possible outcomes.

Chapter 9: Beyond

Item | Key Takeaway
Item 35 | Prefer task-based (std::async) to thread-based (std::thread). Tasks handle thread management, return values, and exceptions.
Item 36 | The default launch policy may defer. Specify std::launch::async if concurrency is essential.
Item 37 | Make std::threads unjoinable on all paths. Use RAII wrappers. Declare thread members last.
Item 38 | Only futures from std::async (non-deferred, last reference) block in their destructors.
Item 39 | Use std::promise<void> for clean one-shot event communication. Use shared_future for multiple reactors.
Item 40 | std::atomic = concurrency (atomicity + ordering). volatile = special memory (no optimization). Different tools for different jobs.
Next: Chapter 8: Tweaks covers the final two items: pass by value for copyable parameters, and emplacement vs insertion.

"For the first time in C++'s history, programmers can write multithreaded programs with standard behavior across all platforms." — Scott Meyers