Distributed Systems

Service Communication

DNS discovery, REST APIs, HTTP, idempotency — how services find and talk to each other.

Prerequisites: Basic networking (IP addresses, ports). That's it.
10 Chapters · 8+ Simulations · 9 Quizzes

Chapter 0: The Problem

You have an e-commerce site. It started as one monolith — one big application that handles users, products, orders, payments, and shipping. Everything talks to everything through function calls inside the same process. Life is simple.

Then your site grows. The monolith becomes a nightmare: a change in the payment code breaks the product search, deploys take 45 minutes, and the team of 60 engineers is constantly stepping on each other's toes. So you do what every fast-growing company does: you break the monolith into microservices.

Now you have seven separate services: Users, Products, Orders, Payments, Shipping, Inventory, and Notifications. Each one is deployed independently, scales independently, and is owned by a different team. Beautiful.

Except now you have a new problem, and it is one of the hardest in distributed systems.

The fundamental question. How does the Orders service find the Payments service? How does it talk to it? What happens when the Payments service is temporarily down? What happens when the request succeeds but the response gets lost — does the customer get charged twice? These are the problems of service communication, and getting them wrong costs real money.

Before microservices, a function call was instant, reliable, and type-checked by the compiler. Now that same "call" crosses a network: it can be slow (10ms to 500ms), it can fail (packet loss, timeout, crash), it can partially succeed (the payment was charged but the response never arrived), and it has no compiler checking the contract between caller and callee.

This lesson covers the entire stack of service communication: how services find each other (DNS, service discovery), how they talk (HTTP, REST, gRPC), how they agree on contracts (API design, versioning), and how they handle the brutal reality that networks are unreliable (idempotency, retries, circuit breaking).

Watch the Problem

The simulation below shows a monolith being split into microservices. Watch what happens when services need to communicate — requests cross the network, and the network is not your friend.

Monolith to Microservices: The Communication Explosion

Click "Place Order" to trace a request through the system. Notice how many network hops are needed.

A single "Place Order" operation that was one function call in the monolith now requires at least four network round-trips: Orders calls Inventory (is the item in stock?), then Payments (charge the card), then Shipping (schedule delivery), then Notifications (send confirmation). Each of those might make a further call to Users (get the address). Every arrow is a potential point of failure.

Three Problems We Must Solve

| Problem | What Goes Wrong | Solution |
|---|---|---|
| Discovery | Orders doesn't know where Payments lives. IP addresses change when services restart. | DNS, service registries |
| Communication | How do you encode requests and responses? What format? What protocol? | HTTP, REST, gRPC, GraphQL |
| Reliability | The network loses packets. Services crash. Retries cause duplicate charges. | Idempotency, circuit breaking, timeouts |

We will solve each one, from the ground up.

Concept check: In a microservices architecture, the Orders service sends a payment request to the Payments service. The Payments service charges the credit card successfully but the response is lost due to a network error. The Orders service times out and retries. What is the most dangerous consequence?

Chapter 1: DNS & Service Discovery

Before two services can talk, they need to answer one question: where is the other service? In your apartment, you find your roommate by shouting their name. On the internet, you find a service by resolving a name to an IP address. That system is called DNS — the Domain Name System.

Why Not Just Hardcode IP Addresses?

The naive approach: write the Payments service's IP address directly in the Orders service's config file. PAYMENTS_HOST=10.0.3.42. This works for about a week.

Then the Payments service gets redeployed to a different machine and gets a new IP. Or it scales to three instances and now there are three IPs. Or the data center migrates and every IP changes. Hardcoded IPs are the distributed systems equivalent of hardcoded pixel positions in a Canvas — they work until the screen size changes.

The solution is indirection: give services names, and have a system that maps names to current IP addresses. That system is DNS.

How DNS Works: A Hierarchical Phonebook

DNS is a distributed, hierarchical database. When your browser wants to visit api.stripe.com, it doesn't ask one central server. It walks down a tree:

1. Your Machine: Checks local cache. "Do I already know api.stripe.com?" If yes, done. If no, ask the recursive resolver.
2. Recursive Resolver: Your ISP's DNS server (or a public one such as 8.8.8.8). Checks its cache. If miss, starts the hierarchical walk.
3. Root Server (.): "I don't know api.stripe.com, but here's who handles .com domains." Returns the .com TLD servers.
4. TLD Server (.com): "I don't know api.stripe.com, but here's who handles stripe.com." Returns Stripe's authoritative nameservers.
5. Authoritative Server (stripe.com): "api.stripe.com? That's 54.187.174.169. Here you go. Cache this for 300 seconds."

The answer flows back to your machine through the recursive resolver. The resolver and your local machine both cache the result; the root and TLD servers only hand out referrals and do not cache it. The TTL (Time To Live), specified by the authoritative server, controls how long caches hold the answer. A TTL of 300 seconds means "this answer is valid for 5 minutes; after that, re-ask."

The TTL tradeoff. Short TTL (30s): changes propagate fast, but DNS servers get hammered with queries. Long TTL (3600s = 1 hour): less DNS load, but if you change IPs, old clients keep hitting the stale address for up to an hour. Production services typically use 60-300s. During a migration, you lower TTL to 30s first, wait for old caches to expire, then change the IP, then raise TTL back.
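To make the TTL behavior concrete, here is a minimal sketch of a resolver-side cache. The class name `TtlCache`, the `resolve` callback, and the injectable `clock` are all assumptions for illustration; real resolvers are considerably more involved.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Caches name -> address answers until their TTL expires."""

    def __init__(
        self,
        resolve: Callable[[str], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Hypothetical upstream lookup: name -> (address, ttl_seconds).
        self._resolve = resolve
        # Injectable clock so expiry can be exercised without sleeping.
        self._clock = clock
        # name -> (address, expires_at)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: the answer is still fresh
        # Miss or expired entry: ask upstream again and remember the answer.
        address, ttl = self._resolve(name)
        self._entries[name] = (address, now + ttl)
        return address
```

With a 300-second TTL, a client that cached the old address keeps using it until the entry expires, which is why lowering the TTL ahead of a migration shortens the window of stale answers.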

DNS Record Types That Matter

| Type | Maps | Example | Use Case |
|---|---|---|---|
| A | Name → IPv4 | api.stripe.com → 54.187.174.169 | Most common. "Where is this service?" |
| AAAA | Name → IPv6 | api.stripe.com → 2600:1f18:... | IPv6 equivalent of A record |
| CNAME | Name → Name | www.stripe.com → stripe.com | Alias. One extra lookup step. |
| SRV | Name → Host + Port + Priority | _payments._tcp → pay-1:8080 (pri=10) | Service discovery with port and priority |

Service Discovery Beyond DNS

Plain DNS works for external services, but inside a microservices cluster, you need more. You need to know not just where a service is, but which instances are healthy. This is service discovery.

Client-side discovery. Every service queries a service registry (like Consul, etcd, or ZooKeeper) and gets back a list of healthy instances. The client picks one — usually round-robin or random. Netflix's Eureka popularized this pattern.

Pros: no extra network hop. Clients can make smart choices (prefer same-zone).

Cons: every client needs the discovery logic. N languages = N implementations.
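A minimal sketch of client-side discovery, assuming an in-memory `Registry` class as a stand-in for Consul, etcd, or ZooKeeper (it does not use any of their real APIs). `RoundRobinClient` is likewise a hypothetical name; it shows the "client picks one" step.

```python
import itertools
from typing import Dict, List


class Registry:
    """In-memory stand-in for a service registry."""

    def __init__(self) -> None:
        # service name -> {instance address: healthy?}
        self._instances: Dict[str, Dict[str, bool]] = {}

    def register(self, service: str, address: str, healthy: bool = True) -> None:
        self._instances.setdefault(service, {})[address] = healthy

    def set_health(self, service: str, address: str, healthy: bool) -> None:
        self._instances[service][address] = healthy

    def healthy_instances(self, service: str) -> List[str]:
        entries = self._instances.get(service, {})
        return sorted(addr for addr, ok in entries.items() if ok)


class RoundRobinClient:
    """Asks the registry for healthy instances and rotates through them."""

    def __init__(self, registry: Registry, service: str) -> None:
        self._registry = registry
        self._service = service
        self._counter = itertools.count()

    def pick(self) -> str:
        instances = self._registry.healthy_instances(self._service)
        if not instances:
            raise LookupError(f"no healthy instances of {self._service}")
        return instances[next(self._counter) % len(instances)]
```

Every language your services are written in would need its own version of `RoundRobinClient`, which is the cost listed under "Cons" above.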

Server-side discovery. The client sends requests to a load balancer (like AWS ALB, Kubernetes Service, or Envoy). The load balancer queries the registry and forwards the request. The client only needs to know one address: the load balancer's.

Pros: clients are dumb and simple. One discovery implementation.

Cons: extra network hop through the load balancer. LB is a single point of failure (mitigated with redundancy).

In Kubernetes, the dominant pattern today, discovery looks like plain DNS to the client while load balancing happens on the server side. Kubernetes provides built-in DNS (CoreDNS) that maps service names to ClusterIP virtual addresses: payments.default.svc.cluster.local resolves to a virtual IP. Rules programmed into the kernel (iptables, IPVS, or eBPF, depending on the cluster's networking setup) then load-balance connections to that IP across healthy pods. From the client's perspective, it's just a DNS name.

Think of it this way. DNS is a phonebook: "What's the phone number for Payments?" Service discovery is a smarter phonebook that also checks: "Is Payments answering the phone? Which of its three lines is least busy? Is line #2 in the same building as you?"

Health Checks: How the Registry Knows Who's Alive

A service registry is only useful if it knows which instances are actually healthy. A server that's running but stuck in an infinite loop, or connected but returning 500 errors, is worse than a server that's down — at least a down server fails fast.

There are two kinds of health checks:

Liveness probes: "Is the process alive?" A simple TCP connection check or an HTTP GET to /healthz that returns 200. If this fails, the instance is removed from the registry (or restarted in Kubernetes). Think of it as checking if someone has a pulse.

Readiness probes: "Can the process handle requests?" A deeper check: is the database connection alive? Is the cache warm? Are all required downstream services reachable? If this fails, traffic stops routing to this instance, but it isn't killed — it might be starting up or recovering. Think of it as checking if someone is awake and ready to work.

python
# Kubernetes-style health check endpoints (Flask)
from flask import Flask

app = Flask(__name__)
# `db` and `cache` are placeholders for the application's own clients.

@app.route('/healthz')  # Liveness: am I alive?
def liveness():
    return 'ok', 200

@app.route('/readyz')   # Readiness: can I serve traffic?
def readiness():
    if not db.is_connected():
        return 'db not ready', 503
    if not cache.is_warm():
        return 'cache cold', 503
    return 'ok', 200

In Kubernetes, the kubelet runs these probes every N seconds (configurable). Failed liveness probes trigger a pod restart. Failed readiness probes remove the pod from the Service's endpoint list — it stops receiving traffic but keeps running, giving it time to recover.

Kubernetes DNS: The Real-World Pattern

In a Kubernetes cluster, every Service gets a DNS name automatically via CoreDNS:

text
# Full DNS name format:
<service-name>.<namespace>.svc.cluster.local

# Examples:
payments.default.svc.cluster.local     # Payments service in default namespace
orders.production.svc.cluster.local    # Orders service in production namespace

# Within the same namespace, just use the service name:
http://payments:8080/v1/charge          # Kubernetes resolves "payments" automatically

This is why you see PAYMENTS_HOST=payments in Kubernetes config files — not an IP address, just a name. CoreDNS handles the rest, and the Service object handles load balancing across healthy pods. No service registry library needed. No Consul. No Eureka. Just DNS.

Watch DNS Resolution

The simulation below shows a DNS query walking down the hierarchy. Watch the caching at each level and notice how a second query for the same domain is instant.

DNS Resolution: Walking the Hierarchy

Click "Resolve" to query a domain. Click again to see the cache hit. Try different domains.

Concept check: You're migrating the Payments service from IP 10.0.3.42 to 10.0.5.99. The DNS TTL for payments.internal is currently 3600 seconds (1 hour). What should you do BEFORE changing the DNS record?

Chapter 2: HTTP Deep Dive

Now that services can find each other, they need a language to talk. The dominant language of the web — and of most microservices — is HTTP (HyperText Transfer Protocol). It's a simple request-response protocol: the client sends a request, the server sends back a response. That's it. No magic.

Anatomy of an HTTP Request

Every HTTP request has exactly four parts:

http
POST /v1/payments HTTP/1.1           # 1. Method + Path + Version
Host: payments.internal:8080          # 2. Headers (key-value metadata)
Content-Type: application/json
Authorization: Bearer eyJhbGciOiJS...
Idempotency-Key: ord_abc123_pay_1
                                         # 3. Blank line (separates headers from body)
{                                        # 4. Body (optional, the actual data)
  "amount": 4999,
  "currency": "usd",
  "customer": "cus_abc123"
}

Method tells the server what you want to do. Path identifies the resource. Headers carry metadata — who you are, what format the body is in, caching directives. Body carries the payload, if any.

HTTP Methods: The Verbs

HTTP defines a small set of methods (also called verbs). Each has a specific semantic meaning:

| Method | Purpose | Has Body? | Idempotent? | Safe? |
|---|---|---|---|---|
| GET | Retrieve a resource | No | Yes | Yes |
| POST | Create a new resource / trigger action | Yes | No | No |
| PUT | Replace a resource entirely | Yes | Yes | No |
| PATCH | Partially update a resource | Yes | No* | No |
| DELETE | Remove a resource | Optional | Yes | No |
| HEAD | GET but headers only (no body) | No | Yes | Yes |
| OPTIONS | What methods does this endpoint support? | No | Yes | Yes |

Safe means it doesn't modify anything — calling it 100 times has the same effect as calling it 0 times. Idempotent means calling it N times has the same effect as calling it once. We'll dig deep into idempotency in Chapter 4.

*PATCH can be idempotent (e.g., "set name to Alice") but isn't guaranteed to be (e.g., "increment counter by 1").

HTTP Status Codes: The Response Signal

The server responds with a status code — a three-digit number that tells the client what happened. The first digit is the category:

| Range | Category | Key Codes |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirect | 301 Moved Permanently, 304 Not Modified |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 429 Too Many Requests |
| 5xx | Server Error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout |

The retry rule. Should you retry a failed request? 5xx: usually yes, because the server had a temporary problem. 429: yes, but respect the Retry-After header. 4xx (except 429): no. You sent a bad request; sending it again won't fix it. Network timeout: maybe, but only if the request is idempotent (Chapter 4). Retrying a non-idempotent POST can cause double-charges.
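The retry rule can be written down as a small decision function. This is a sketch of the rule as stated here, not a complete policy; the name `should_retry` and the convention of passing `None` for "no response arrived" are assumptions.

```python
from typing import Optional


def should_retry(status: Optional[int], idempotent: bool) -> bool:
    """Decide whether a failed request is worth retrying.

    `status` is the HTTP status code, or None when no response arrived
    at all (network timeout or dropped connection).
    """
    if status is None:
        # Outcome unknown: the server may have done the work already,
        # so only repeat requests that are safe to repeat.
        return idempotent
    if status == 429:
        # Rate limited: retry, but only after honoring Retry-After.
        return True
    if 500 <= status <= 599:
        # Server-side trouble is usually temporary.
        return True
    # 2xx/3xx are not failures; other 4xx mean the request itself is bad.
    return False
```

A caller would combine this with a delay between attempts rather than retrying immediately.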

HTTP/1.1 vs HTTP/2: The Evolution

HTTP/1.1 (1997) has one painful limitation: head-of-line blocking. Each TCP connection handles one request at a time. If you need 6 resources, you either wait for each one sequentially, or open 6 TCP connections (expensive: each needs a new TLS handshake).

HTTP/2 (2015) fixes this with multiplexing: a single TCP connection carries many requests simultaneously using streams. Each request/response pair gets a stream ID. Frames from different streams are interleaved on the wire and reassembled at the other end. One connection, many concurrent requests, no head-of-line blocking at the HTTP level.

HTTP/2 also adds header compression (HPACK) — headers are often repetitive across requests, so compressing them saves significant bandwidth in chatty microservice architectures.
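The idea of multiplexing can be illustrated with a toy model. This is not the real HTTP/2 binary framing format; the `Frame` tuple and the function names are assumptions chosen to show how chunks tagged with a stream ID can share one connection and still be reassembled.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

Frame = Tuple[int, bytes]  # (stream_id, chunk of that stream's payload)


def to_frames(stream_id: int, payload: bytes, frame_size: int) -> List[Frame]:
    """Split one message into frames tagged with its stream ID."""
    return [
        (stream_id, payload[i:i + frame_size])
        for i in range(0, len(payload), frame_size)
    ]


def interleave(*streams: List[Frame]) -> List[Frame]:
    """Take one frame from each stream in turn, as if sharing one connection."""
    wire: List[Frame] = []
    for group in zip_longest(*streams):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire: List[Frame]) -> Dict[int, bytes]:
    """Rebuild each message by concatenating frames with the same stream ID."""
    messages: Dict[int, bytes] = {}
    for stream_id, chunk in wire:
        messages[stream_id] = messages.get(stream_id, b"") + chunk
    return messages
```

Because every frame carries its stream ID, a slow response on one stream does not hold up frames belonging to another.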

Watch an HTTP Request Lifecycle

HTTP Request Lifecycle: From Client to Server and Back

Watch a request traverse DNS, TCP, TLS, and HTTP layers. Toggle HTTP/2 to see multiplexing.

Keep-alive. In HTTP/1.0, the Connection: keep-alive header told the server "don't close the TCP connection after this response; I'll send more requests." This avoids the overhead of re-establishing TCP + TLS for every request. In HTTP/1.1, persistent connections are the default, so the header is no longer needed. HTTP/2 connections are always persistent, and the Connection header is not used there at all.

Headers You Must Know

HTTP headers are key-value pairs that carry metadata about the request or response. Most are boring. These are the ones that matter for service communication:

| Header | Direction | Purpose | Example |
|---|---|---|---|
| Content-Type | Both | Format of the body | application/json, application/protobuf |
| Authorization | Request | Who is calling? | Bearer eyJhbGciOiJS... |
| Accept | Request | What format do I want back? | application/json |
| Idempotency-Key | Request | Deduplicate this request | pay_ord42_attempt1 |
| Retry-After | Response | When to retry (with 429/503) | 30 (seconds) or a date |
| X-Request-ID | Both | Trace a request across services | req_abc123def456 |
| Cache-Control | Response | How long can this be cached? | max-age=300, no-cache |
| ETag | Response | Version fingerprint for caching | "33a64df551425fcc55e" |

X-Request-ID is your best friend in production. When a request touches 6 services, and one of them returns a 500 error, how do you find the failing service's logs? If every service propagates the same X-Request-ID through all downstream calls, you can search your logging system for that one ID and see the entire request trace. This is the poor man's distributed tracing. Production systems like Uber's Jaeger and Google's Dapper formalize this into structured trace spans.
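Propagation is simple enough to sketch in a few lines. The helper name `outgoing_headers` and the `req_` prefix are assumptions for illustration; the point is that a service reuses the ID it received and only mints a new one at the entry point.

```python
import uuid
from typing import Dict, Mapping

REQUEST_ID_HEADER = "X-Request-ID"


def outgoing_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Build headers for a downstream call, reusing the caller's request ID."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in incoming.items():
        if name.lower() == REQUEST_ID_HEADER.lower() and value:
            return {REQUEST_ID_HEADER: value}
    # No ID on the incoming request: this service is the entry point,
    # so it generates a fresh one for the rest of the chain.
    return {REQUEST_ID_HEADER: f"req_{uuid.uuid4().hex}"}
```

Each service would also include the same ID in every log line it writes while handling the request.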

Exponential Backoff with Jitter

When you retry a failed request, you should not retry immediately. If the server is overloaded, 10,000 clients retrying simultaneously will make it worse. The standard retry strategy is exponential backoff with jitter:

python
import time
import random

def retry_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Last attempt: give up

            # Base delay: 1s, 2s, 4s, 8s (the fifth failure re-raises above)
            base_delay = 2 ** attempt

            # Add jitter: random between 0 and base_delay
            jitter = random.uniform(0, base_delay)
            delay = base_delay + jitter

            # Cap at 30 seconds
            delay = min(delay, 30)

            print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f}s...")
            time.sleep(delay)

Why jitter? Without it, all clients that failed at time T=0 retry at T=1, fail again, retry together at T=3, then at T=7, and so on. They move in lockstep, creating periodic traffic spikes (the "thundering herd" problem). Random jitter spreads the retries across time, smoothing the load.

Concept check: Your service receives an HTTP 503 (Service Unavailable) from a downstream API. Should you retry the request?

Chapter 3: REST API Design

HTTP gives you the transport. But how should you design the API on top of it? The dominant style for web APIs is REST (Representational State Transfer) — a set of constraints that, when followed, produce APIs that are predictable, cacheable, and evolvable.

The Core Idea: Resources, Not Actions

REST says: think about resources (nouns), not actions (verbs). A resource is anything that can be named: a user, an order, a payment, a list of products. Each resource has a URI (Uniform Resource Identifier) — its address. HTTP methods provide the verbs.

rest
# BAD: action-oriented (RPC-style)
POST /createUser
POST /getUser
POST /deleteUser
POST /listUsers

# GOOD: resource-oriented (REST-style)
POST   /users          # create a user
GET    /users/42       # get user 42
DELETE /users/42       # delete user 42
GET    /users          # list all users

The REST version is uniform: once you know the pattern for one resource, you know it for all of them. A developer who has never seen your API can guess that GET /products/17 returns product 17 and DELETE /orders/99 deletes order 99.

CRUD Mapping

REST maps the four basic database operations (Create, Read, Update, Delete) to HTTP methods:

| Operation | HTTP Method | URI Pattern | Example | Response |
|---|---|---|---|---|
| Create | POST | /resources | POST /orders | 201 Created + Location header |
| Read one | GET | /resources/:id | GET /orders/42 | 200 OK + JSON body |
| Read many | GET | /resources | GET /orders?status=pending | 200 OK + JSON array |
| Replace | PUT | /resources/:id | PUT /orders/42 | 200 OK |
| Partial update | PATCH | /resources/:id | PATCH /orders/42 | 200 OK |
| Delete | DELETE | /resources/:id | DELETE /orders/42 | 204 No Content |

Nested Resources

When one resource belongs to another, nest the URI:

rest
GET  /users/42/orders           # all orders for user 42
GET  /users/42/orders/7         # order 7 for user 42
POST /users/42/orders           # create an order for user 42

But don't nest too deep. /users/42/orders/7/items/3/reviews is hard to read and hard to cache. Two levels is the sweet spot.

Pagination

Any endpoint that returns a list needs pagination. Without it, GET /products returns your entire catalog in one response — 500MB of JSON. There are two main approaches:

Offset-based: ?offset=20&limit=10. Simple. But if items are inserted while you page, you'll skip or duplicate items. Also, ?offset=1000000 forces the database to scan and skip a million rows.

Cursor-based: ?cursor=eyJpZCI6NDJ9&limit=10. The cursor is an opaque token (often a base64-encoded ID). The server uses it to resume from exactly where it left off. No skipping, no duplication, efficient at any depth. Stripe, Slack, and Twitter all use cursor pagination.
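A minimal sketch of cursor pagination over an in-memory list, assuming items carry an ascending integer `id`. The function names and the response shape (`data`, `next_cursor`) are illustrative assumptions, not any particular vendor's API.

```python
import base64
import json
from typing import Any, Dict, List, Optional


def encode_cursor(last_id: int) -> str:
    """Pack the last-seen id into an opaque, URL-safe token."""
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]


def page(
    items: List[Dict[str, Any]],
    cursor: Optional[str] = None,
    limit: int = 10,
) -> Dict[str, Any]:
    """Return the page of `items` (sorted by ascending id) after the cursor."""
    last_id = decode_cursor(cursor) if cursor else None
    # With a database this would be: WHERE id > :last_id ORDER BY id LIMIT :limit + 1
    remaining = [i for i in items if last_id is None or i["id"] > last_id]
    data = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(data[-1]["id"]) if has_more else None
    return {"data": data, "next_cursor": next_cursor}
```

The token `eyJpZCI6NDJ9` used as the example cursor is exactly what `encode_cursor(42)` produces: the base64 encoding of `{"id":42}`. Because the server resumes from an id rather than a row count, inserts that happen between page requests do not shift the results.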

A Complete REST API Example

python
from flask import Flask, request, jsonify
from uuid import uuid4

app = Flask(__name__)
orders = {}  # in-memory store (use a real DB in production)

# CREATE — POST /orders
@app.route('/orders', methods=['POST'])
def create_order():
    data = request.json
    order_id = str(uuid4())
    order = {
        'id': order_id,
        'product': data['product'],
        'quantity': data['quantity'],
        'status': 'pending'
    }
    orders[order_id] = order
    return jsonify(order), 201, {'Location': f'/orders/{order_id}'}

# READ ONE — GET /orders/:id
@app.route('/orders/<order_id>', methods=['GET'])
def get_order(order_id):
    order = orders.get(order_id)
    if not order:
        return jsonify({'error': 'not found'}), 404
    return jsonify(order)

# READ MANY — GET /orders?status=pending&limit=10
@app.route('/orders', methods=['GET'])
def list_orders():
    status = request.args.get('status')
    limit = int(request.args.get('limit', 20))
    result = list(orders.values())
    if status:
        result = [o for o in result if o['status'] == status]
    return jsonify({'data': result[:limit], 'total': len(result)})

# REPLACE (PUT /orders/:id)
@app.route('/orders/<order_id>', methods=['PUT'])
def replace_order(order_id):
    if order_id not in orders:
        return jsonify({'error': 'not found'}), 404
    # PUT replaces the whole resource; only the id is preserved
    orders[order_id] = {**request.json, 'id': order_id}
    return jsonify(orders[order_id])

# DELETE — DELETE /orders/:id
@app.route('/orders/<order_id>', methods=['DELETE'])
def delete_order(order_id):
    orders.pop(order_id, None)
    return '', 204

REST is not a standard; it's a style. There is no REST RFC that says "thou shalt use POST for create." REST was described by Roy Fielding in his 2000 PhD dissertation as a set of architectural constraints. What most people call "REST APIs" are really "HTTP APIs that use JSON and follow resource-oriented URL conventions." And that's fine: the conventions are useful regardless of how strictly you follow Fielding's original constraints.

Filtering, Sorting, and Sparse Fieldsets

Real APIs need more than just CRUD. Clients need to filter (show only pending orders), sort (newest first), and select fields (give me only name and email, not the full 50-field profile).

rest
# Filtering — use query parameters
GET /orders?status=pending&min_total=100

# Sorting — prefix with - for descending
GET /orders?sort=-created_at          # newest first
GET /orders?sort=total,-created_at     # cheapest first, then newest

# Sparse fieldsets — request only the fields you need
GET /users/42?fields=name,email         # don't send avatar, preferences, etc.

This reduces payload size (less bandwidth) and database load (fewer columns to fetch). The fields parameter is especially valuable for mobile clients on slow networks.
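On the server side, these query parameters need parsing. The sketch below shows one way to handle `sort` and `fields` over in-memory dictionaries; the function names are assumptions, and a real service would typically translate the same parameters into ORDER BY and column lists instead.

```python
from typing import Any, Dict, List


def apply_sort(items: List[Dict[str, Any]], sort: str) -> List[Dict[str, Any]]:
    """Sort by a comma-separated key list; a leading '-' means descending."""
    result = list(items)
    # Apply keys right to left. Python's sort is stable, so the leftmost
    # key ends up as the primary ordering.
    for key in reversed(sort.split(",")):
        descending = key.startswith("-")
        field = key.lstrip("-")
        result.sort(key=lambda item: item[field], reverse=descending)
    return result


def select_fields(item: Dict[str, Any], fields: str) -> Dict[str, Any]:
    """Keep only the requested fields, silently skipping unknown names."""
    wanted = fields.split(",")
    return {name: item[name] for name in wanted if name in item}
```

For example, `select_fields(user, "name,email")` returns only those two keys, however many fields the stored profile has.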

Good API Design Patterns

| Pattern | Example | Why |
|---|---|---|
| Plural nouns | /users, /orders | Consistent. /user/42 vs /users/42: pick one (plural wins) |
| Kebab-case | /order-items (not /orderItems) | URI paths are case-sensitive; all-lowercase with hyphens avoids mismatches |
| Filter via query params | /orders?status=shipped | Keeps the base URL clean |
| Return created resource | POST returns full object + id | Saves a follow-up GET |
| Consistent error format | {"error": "not_found", "message": "..."} | Clients can parse errors programmatically |

Concept check: You need an endpoint to cancel an order. Which REST-style approach is most appropriate?

Chapter 4: Idempotency

Here is the single most important concept in service communication reliability. More important than load balancing. More important than circuit breaking. It's the concept that prevents customers from being charged twice, orders from being created twice, and emails from being sent twice.

Idempotency means: performing an operation multiple times has the same effect as performing it once.

Why It Matters: The Retry Problem

Recall from Chapter 0: the Orders service sends a payment request, the Payments service charges the card, but the response gets lost due to a network timeout. The Orders service doesn't know if the payment succeeded or failed. It must retry — the customer is waiting.

If the payment endpoint is not idempotent, the retry creates a second charge. The customer pays $49.99 twice. Your support team gets an angry email. Your company gets a chargeback.

If the payment endpoint is idempotent, the retry detects "I already processed this payment" and returns the original result without charging again. The customer pays once. Everyone is happy.

The iron rule of distributed systems. The network WILL fail. Requests WILL need to be retried. Therefore, any operation that has side effects (charges money, creates records, sends emails) MUST be idempotent. This is not optional. This is not "nice to have." It's a hard requirement for any production system.

Which HTTP Methods Are Idempotent?

| Method | Idempotent? | Why |
|---|---|---|
| GET | Yes | Reading data doesn't change it. GET /orders/42 returns the same order every time. |
| PUT | Yes | "Set X to Y." Setting a name to "Alice" 10 times still results in "Alice." |
| DELETE | Yes | "Remove X." Removing something that's already gone is a no-op. (Return 204 or 404.) |
| POST | No! | "Create X." Calling POST /orders twice creates TWO orders. This is the danger zone. |
| PATCH | Depends | "Set name to Alice" is idempotent. "Increment balance by $10" is NOT. |
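The difference between the two kinds of PATCH is easy to see by applying each operation more than once. This is a small illustration on plain dictionaries; `set_name`, `add_funds`, and `apply_n_times` are hypothetical helpers, not part of any API.

```python
from typing import Any, Callable, Dict

Account = Dict[str, Any]


def set_name(account: Account, name: str) -> Account:
    """'Set name to X': repeating it leaves the same final state."""
    return {**account, "name": name}


def add_funds(account: Account, amount: int) -> Account:
    """'Increment balance by X': every repetition changes the state again."""
    return {**account, "balance": account["balance"] + amount}


def apply_n_times(op: Callable[[Account], Account], account: Account, n: int) -> Account:
    """Simulate a request being delivered n times because of retries."""
    for _ in range(n):
        account = op(account)
    return account
```

Delivering `set_name` three times yields the same account as delivering it once; delivering `add_funds` three times triples the credit.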

Idempotency Keys: Making POST Idempotent

Since POST is inherently non-idempotent, we need to make it idempotent. The standard technique: idempotency keys.

The client generates a unique key for each logical operation and sends it as a header:

http
POST /v1/payments HTTP/1.1
Idempotency-Key: pay_ord42_attempt1
Content-Type: application/json

{"amount": 4999, "currency": "usd"}

The server's logic:

1. Receive Request: Extract idempotency key from header.
2. Check Key Store: Look up the key in a persistent store (Redis, database). Has this key been seen before?
3. If the key exists, Return Cached Response: Return the exact same response (status code + body) from the first time. Do NOT execute the operation again.
4. If the key is new, Execute Operation: Process the payment. Store the key + response in the key store. Return the response.

Here is the implementation:

python
import redis
import json

r = redis.Redis()

def process_payment(request):
    idem_key = request.headers.get('Idempotency-Key')
    if not idem_key:
        return {'error': 'Idempotency-Key header required'}, 400

    # Check if we've seen this key before
    cached = r.get(f'idem:{idem_key}')
    if cached:
        # Already processed — return the cached response
        return json.loads(cached)

    # First time seeing this key — process the payment
    result = charge_credit_card(
        amount=request.json['amount'],
        currency=request.json['currency']
    )

    # Cache the result with a TTL (24 hours)
    r.setex(
        f'idem:{idem_key}',
        86400,  # 24 hours in seconds
        json.dumps(result)
    )
    return result

The race condition. What if two identical requests arrive at the same microsecond, BEFORE either has stored the result? Both check Redis, both get a miss, both charge the card. The fix: use Redis SET NX (set-if-not-exists) to acquire a lock on the idempotency key before processing. The second request sees the lock and waits (or returns 409 Conflict).
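The claim-before-processing idea can be sketched without Redis. Below, an in-memory `IdempotencyStore` stands in for the key store, and its `claim` method plays the role of SET NX; the class, the `handle` function, and the response tuples are all assumptions for illustration.

```python
import threading
from typing import Any, Callable, Dict, Tuple

Response = Tuple[Dict[str, Any], int]  # (body, HTTP status)


class IdempotencyStore:
    """In-memory stand-in for the key store; `claim` mimics SET NX."""

    IN_FLIGHT = object()  # marker stored while the first request is running

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[str, Any] = {}

    def claim(self, key: str) -> bool:
        """Atomically reserve the key. Returns True only for the first caller."""
        with self._lock:
            if key in self._entries:
                return False
            self._entries[key] = self.IN_FLIGHT
            return True

    def complete(self, key: str, response: Response) -> None:
        with self._lock:
            self._entries[key] = response

    def release(self, key: str) -> None:
        with self._lock:
            self._entries.pop(key, None)

    def get(self, key: str) -> Any:
        with self._lock:
            return self._entries.get(key)


def handle(store: IdempotencyStore, key: str, operation: Callable[[], Dict[str, Any]]) -> Response:
    if store.claim(key):
        try:
            response: Response = (operation(), 201)
        except Exception:
            store.release(key)  # let a later retry run the operation again
            raise
        store.complete(key, response)
        return response
    cached = store.get(key)
    if cached is None or cached is IdempotencyStore.IN_FLIGHT:
        return ({"error": "request in progress"}, 409)
    return cached  # replay the original response without re-running
```

Because the check and the reservation happen in one atomic step, two simultaneous requests can no longer both see a miss. A production version would also put an expiry on the in-flight marker so a crashed worker does not block the key forever.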

Watch Idempotency in Action

The simulation below shows two scenarios side by side. On the left: a payment endpoint WITHOUT idempotency — retries cause double charges. On the right: the same endpoint WITH an idempotency key — retries are safe.

Retries: Without vs With Idempotency

Click "Send Payment" then "Retry" to see the difference. The left side has no idempotency protection.

Real-World Idempotency: Stripe's Pattern

Stripe's payment API is the gold standard for idempotency. Here's how they do it:

| Behavior | Implementation |
|---|---|
| Key generation | Client generates a UUID. Stripe recommends including the resource context: ord_42_pay_abc123 |
| Key storage | Stored in database with the request params hash + response. Keys expire after 24 hours. |
| Param mismatch | If you reuse a key with DIFFERENT params, Stripe returns 400: "idempotency key used with different request parameters" |
| In-flight detection | If a request with the same key is still processing, Stripe returns 409 Conflict |
| Response replay | On duplicate key, returns the EXACT same HTTP status + body as the original request |

The param-hash check is crucial. Without it, a bug in the client could reuse an idempotency key from a $10 charge for a $10,000 charge, and the server would happily return "already processed" with the $10 response. Always hash the request body alongside the idempotency key.
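A sketch of the param-hash check, assuming stored records have the shape `{"fingerprint": ..., "response": ...}`. The helper names are hypothetical and this is not Stripe's implementation; it only illustrates comparing a hash of the request body before replaying a cached response.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple

Response = Tuple[Dict[str, Any], int]  # (body, HTTP status)


def fingerprint(body: Dict[str, Any]) -> str:
    """Hash a canonical form of the body so key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def check_replay(
    stored: Dict[str, Dict[str, Any]],
    key: str,
    body: Dict[str, Any],
) -> Optional[Response]:
    """Return a response to send immediately, or None if the key is new."""
    record = stored.get(key)
    if record is None:
        return None  # new key: the caller should execute the operation
    if record["fingerprint"] != fingerprint(body):
        # Same key, different request: refuse instead of replaying.
        error = {"error": "idempotency key used with different request parameters"}
        return (error, 400)
    return record["response"]
```

Sorting keys before hashing means `{"amount": 1000, "currency": "usd"}` and `{"currency": "usd", "amount": 1000}` count as the same request.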

Database-Level Idempotency: The UPSERT Pattern

For simple cases, you don't even need Redis. The database itself can enforce idempotency using unique constraints and upserts:

sql
-- Create table with unique constraint on the idempotency key
CREATE TABLE payments (
    id          UUID PRIMARY KEY,
    idem_key    VARCHAR(255) UNIQUE NOT NULL,
    amount      INTEGER NOT NULL,
    status      VARCHAR(20) NOT NULL,
    created_at  TIMESTAMP DEFAULT NOW()
);

-- Insert-or-ignore: if idem_key already exists, do nothing
INSERT INTO payments (id, idem_key, amount, status)
VALUES ('uuid-here', 'pay_ord42_1', 4999, 'pending')
ON CONFLICT (idem_key) DO NOTHING;

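The database's unique constraint guarantees that two concurrent inserts with the same idem_key will not both succeed: one will hit the conflict and be silently dropped. The application still has to check whether its own insert actually created the row (for example, via the affected-row count) and charge the card only if it did. No Redis, no distributed locks, no race conditions. The database IS the lock.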
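The same pattern can be tried locally with Python's built-in sqlite3 module. This sketch uses SQLite's `INSERT OR IGNORE`, which behaves like the conflict clause in the SQL above for this purpose; the function names and the simplified column types are assumptions.

```python
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a unique idempotency key column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE payments (
            id        TEXT PRIMARY KEY,
            idem_key  TEXT UNIQUE NOT NULL,
            amount    INTEGER NOT NULL,
            status    TEXT NOT NULL
        )
        """
    )
    return conn


def record_payment(conn: sqlite3.Connection, idem_key: str, amount: int) -> bool:
    """Insert the payment; return True only if this call created the row."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO payments (id, idem_key, amount, status) "
        "VALUES (?, ?, ?, 'pending')",
        (str(uuid.uuid4()), idem_key, amount),
    )
    conn.commit()
    # rowcount is 1 for a fresh insert and 0 when the key already existed.
    return cur.rowcount == 1
```

A caller would charge the card only when `record_payment` returns True, and otherwise look up the existing row and return its result.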

Concept check: Your API has an endpoint PATCH /accounts/42 that accepts {"action": "add_funds", "amount": 100}. A client sends this request, gets a timeout, and retries with the same body. What happens?

Chapter 5: API Evolution

Your API is live. Clients depend on it. Now you need to change it. Add a field. Remove an endpoint. Change a response format. How do you evolve your API without breaking every client that already integrated with it?

Backward vs Forward Compatibility

These concepts, familiar from data encoding, apply identically to APIs:

Backward compatible: new servers can handle requests from old clients. An old mobile app (v2.1) calls your new server (v3.0) and everything works. This is the minimum bar — you MUST maintain this.

Forward compatible: old servers can handle requests from new clients. Less common because you control your servers. But relevant during rolling deployments when old and new server versions coexist.

Safe Changes (Backward Compatible)

| Change | Why It's Safe |
|---|---|
| Add a new optional field to request | Old clients don't send it; server uses default |
| Add a new field to response | Old clients ignore unknown fields (if they're well-written) |
| Add a new endpoint | Old clients don't call it |
| Add a new enum value to response | Safe IF clients have a default/fallthrough case |
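Two of these rows depend on how the client is written: it must ignore unknown fields and have a fallthrough for unknown enum values. A minimal sketch of such a "tolerant" client follows; `OrderStatus`, `Order`, and `parse_order` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    UNKNOWN = "unknown"  # fallthrough for values added after this client shipped


@dataclass
class Order:
    id: str
    status: OrderStatus


def parse_order(payload: Dict[str, Any]) -> Order:
    """Read only the fields this client knows; ignore everything else."""
    try:
        status = OrderStatus(payload["status"])
    except ValueError:
        # The server introduced a status this client has never heard of.
        status = OrderStatus.UNKNOWN
    return Order(id=payload["id"], status=status)
```

A client written this way keeps working when the server later adds a `gift_wrap` field or a `refunded` status; a client that rejects unknown keys or enum values would break on the same response.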

Breaking Changes

| Change | Why It Breaks |
|---|---|
| Remove a field from response | Old clients that read this field crash or show blank |
| Rename a field | Same as remove + add: old clients see the old name disappear |
| Change a field's type (string to int) | Old clients' parsers fail |
| Add a new required field to request | Old clients don't send it; their requests now fail with 400 |
| Remove an endpoint | Old clients get 404 |

Versioning Strategies

When you must make a breaking change, you version the API. There are three main approaches:

Strategy | Example | Pros | Cons
URL path | /v1/orders, /v2/orders | Obvious, easy to route, easy to test | Duplicates routes; clients must update URLs
Header | Accept: application/vnd.myapi.v2+json | Clean URLs; version is metadata | Harder to test (can't paste in browser). Easy to forget.
Query param | /orders?version=2 | Easy to test; doesn't change URL structure | Looks messy. Caching is harder (different versions share URL path).

The industry winner: URL path versioning. Stripe uses /v1/. Twilio uses /2010-04-01/ (date-based). GitHub Enterprise Server exposes its REST API under /api/v3/. REST purists consider the header approach more correct, but URL versioning wins in practice because it's visible, testable, and unambiguous. Use /v1/ and don't overthink it.
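
To make "easy to route" concrete, here is a minimal sketch of URL path version dispatch. The handlers, the response shapes, and the `route` helper are hypothetical; a real service would lean on its web framework's router instead.

```python
import re


def list_orders_v1():
    # v1 contract: total is a decimal string
    return {"orders": [{"id": 7, "total": "49.99"}]}


def list_orders_v2():
    # v2 contract (breaking change): total becomes integer cents
    return {"orders": [{"id": 7, "total_cents": 4999}]}


# One handler per (version, resource) pair; old versions stay registered
HANDLERS = {
    ("v1", "orders"): list_orders_v1,
    ("v2", "orders"): list_orders_v2,
}


def route(path: str):
    """Return (status_code, body) for paths shaped like /v1/orders."""
    match = re.fullmatch(r"/(v\d+)/(\w+)", path)
    if match is None or match.groups() not in HANDLERS:
        return 404, {"error": "not found"}
    return 200, HANDLERS[match.groups()]()
```

Both versions are served side by side, so old clients keep calling /v1/orders until they choose to migrate.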

OpenAPI / Swagger

How do you document your API so clients know what to send and what to expect? OpenAPI (formerly Swagger) is a machine-readable specification format — a YAML or JSON file that describes every endpoint, parameter, request body, and response schema.

yaml
openapi: "3.0.0"
info:
  title: Orders API   # Placeholder title; OpenAPI requires info.title and info.version
  version: "1.0.0"
paths:
  /orders:
    post:
      summary: Create a new order
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [product, quantity]
              properties:
                product:
                  type: string
                quantity:
                  type: integer
                  minimum: 1
      responses:
        '201':
          description: Order created

From this spec, you can auto-generate client libraries (in Python, Go, TypeScript, etc.), documentation websites, and even mock servers for testing. The spec becomes the single source of truth for the API contract between teams.
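
To show what "the spec is the contract" means at runtime, here is a hand-rolled sketch that enforces the same constraints as the POST /orders schema above (required fields, types, `minimum`). The `ORDER_SCHEMA` dict and `validate` function are illustrative stand-ins; in practice you would likely use a validator generated from the spec.

```python
# Python-side mirror of the request schema: required fields, types, and a minimum
ORDER_SCHEMA = {
    "required": ["product", "quantity"],
    "properties": {
        "product": {"type": str},
        "quantity": {"type": int, "minimum": 1},
    },
}


def validate(body: dict, schema: dict) -> list:
    """Return a list of human-readable errors; an empty list means the body is valid."""
    errors = []
    for name in schema["required"]:
        if name not in body:
            errors.append(f"missing required field: {name}")
    for name, rules in schema["properties"].items():
        if name not in body:
            continue
        value = body[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, rules["type"]):
            errors.append(f"{name}: expected {rules['type'].__name__}")
        elif "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: must be >= {rules['minimum']}")
    return errors
```

A server would return 400 with these messages when the list is non-empty, which is how a missing required field turns into the failure described under Breaking Changes.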

Watch Schema Evolution

API Schema Evolution: Safe vs Breaking Changes

Apply changes to the API schema and see which ones break old clients. Green = safe, red = breaking.

Concept check: Your v1 API returns {"name": "Alice", "email": "alice@ex.com"}. In v2, you want to split "name" into "first_name" and "last_name". What is the backward-compatible way to do this?

Chapter 6: gRPC & Alternatives

REST over HTTP with JSON is the default. But it's not the only option, and for high-performance microservice communication, it's often not the best option. Let's understand three alternatives and when to use each.

gRPC: The Performance Play

gRPC is an open-source framework for service-to-service communication that uses Protocol Buffers (Protobuf) for serialization and HTTP/2 for transport. (The name is officially a recursive acronym: "gRPC Remote Procedure Calls".) Google released it in 2015 as the open-source successor to Stubby, the internal RPC framework that connects its own services.

Why is gRPC faster than REST+JSON?

Aspect | REST + JSON | gRPC + Protobuf
Serialization | JSON: text-based, e.g. ~600 bytes for a user object | Protobuf: binary, e.g. ~120 bytes for the same object (roughly 5x smaller; the exact ratio depends on the data)
Schema | Optional (OpenAPI). Client must guess or read docs. | Required (.proto file). Code-generated clients with type safety.
Transport | Usually HTTP/1.1 (one in-flight request per connection at a time) | Always HTTP/2 (multiplexed streams)
Streaming | Not native (workarounds: SSE, chunked transfer) | Built-in: unary, server-streaming, client-streaming, bidirectional
Browser support | Native (fetch, XMLHttpRequest) | Requires grpc-web proxy (no native browser support)

A Protobuf schema defines the service contract:

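protobuf
syntax = "proto3";

service PaymentService {
  rpc ChargeCard(ChargeRequest) returns (ChargeResponse);
  rpc StreamTransactions(AccountId) returns (stream Transaction);
}

message ChargeRequest {
  string customer_id = 1;
  int64  amount_cents = 2;
  string currency     = 3;
  string idempotency_key = 4;
}

message ChargeResponse {
  string charge_id = 1;
  string status    = 2;  // "succeeded" | "failed"
}

// Illustrative definitions so the file compiles; the field choices are assumptions
message AccountId {
  string account_id = 1;
}

message Transaction {
  string transaction_id = 1;
  int64  amount_cents   = 2;
}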

From this .proto file, the protoc compiler (with the gRPC plugin) generates client and server stubs in many languages, including Python, Go, Java, and C++. The client calls payment_stub.ChargeCard(request) as if it were a local function. The framework handles serialization, HTTP/2 framing, and error codes.
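
To see why the binary format is compact, the sketch below hand-encodes a ChargeRequest using the Protobuf wire format (a tag byte per field, varints for integers, length-prefixed bytes for strings) and lets you compare it with JSON. It is a teaching aid under simplifying assumptions: only non-negative integers and small field numbers are handled, and real code would use the classes generated by protoc.

```python
import json


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # More bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    tag = (field_number << 3) | 0  # Wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2  # Wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data


def encode_charge_request(customer_id, amount_cents, currency, idempotency_key) -> bytes:
    # Field numbers match the ChargeRequest message: 1, 2, 3, 4
    return (
        encode_string_field(1, customer_id)
        + encode_int_field(2, amount_cents)
        + encode_string_field(3, currency)
        + encode_string_field(4, idempotency_key)
    )


# Sample values are made up for the comparison
request = {
    "customer_id": "cus_42",
    "amount_cents": 1999,
    "currency": "USD",
    "idempotency_key": "abc-123",
}
binary = encode_charge_request(**request)
text = json.dumps(request).encode("utf-8")
```

Comparing `len(binary)` with `len(text)` shows where the savings come from: field names are replaced by one-byte tags, and numbers are not spelled out as digits.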

GraphQL: The Flexible Query

GraphQL (Facebook, 2015) solves a different problem: over-fetching and under-fetching. With REST, the server decides what fields to return. GET /users/42 returns everything — name, email, address, preferences, avatar URL — even if the client only needs the name. That's over-fetching. Conversely, to get a user's orders, you need a second request to GET /users/42/orders. That's under-fetching.

GraphQL lets the client specify exactly what it wants:

graphql
# Client sends this query
{
  user(id: 42) {
    name
    orders {
      id
      total
      status
    }
  }
}

# Server returns exactly this (nothing more, nothing less)
{
  "data": {
    "user": {
      "name": "Alice",
      "orders": [
        {"id": 7, "total": 49.99, "status": "shipped"},
        {"id": 12, "total": 29.99, "status": "delivered"}
      ]
    }
  }
}

One request. No over-fetching. No under-fetching. Especially powerful for mobile clients where bandwidth is limited and round trips are expensive.
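
The core idea, returning only the fields the client selected, can be sketched in a few lines. This is not a GraphQL implementation: the selection is a plain nested dict rather than a parsed query, and `USER`, `QUERY`, and `select` are made-up names for illustration.

```python
def select(data, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to None (leaf) or to a nested selection.
    Lists are handled by applying the selection to every element.
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is None else select(value, sub)
    return result


# Full server-side record: more fields than any single client needs
USER = {
    "id": 42,
    "name": "Alice",
    "email": "alice@ex.com",
    "orders": [
        {"id": 7, "total": 49.99, "status": "shipped", "warehouse": "east-1"},
        {"id": 12, "total": 29.99, "status": "delivered", "warehouse": "west-2"},
    ],
}

# Equivalent of the query above: name, plus id/total/status of each order
QUERY = {"name": None, "orders": {"id": None, "total": None, "status": None}}

response = {"data": {"user": select(USER, QUERY)}}
```

The email and warehouse fields never leave the server, and the orders arrive in the same response as the user.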

gRPC Streaming Modes

One of gRPC's most powerful features is streaming. HTTP/2's multiplexed streams make this natural — no hacks, no polling:

Mode | Direction | Use Case
Unary | Client sends 1 request, server sends 1 response | Standard request-response (same as REST)
Server streaming | Client sends 1 request, server sends N responses | Real-time price feeds, log tailing, search results
Client streaming | Client sends N requests, server sends 1 response | File uploads, sensor data collection, batch processing
Bidirectional streaming | Both sides send N messages concurrently | Chat, multiplayer games, collaborative editing

protobuf
service StockService {
  // Unary: one price check
  rpc GetPrice(Symbol) returns (Price);

  // Server streaming: subscribe to live price updates
  rpc StreamPrices(WatchList) returns (stream Price);

  // Client streaming: upload historical tick data
  rpc UploadTicks(stream Tick) returns (UploadResult);

  // Bidirectional: trading algorithm sends orders, receives fills
  rpc Trade(stream Order) returns (stream Fill);
}

WebSockets: The Bidirectional Channel

WebSockets solve yet another problem: real-time bidirectional communication. HTTP is request-response: the client asks, the server answers. But what if the server needs to push updates to the client — live chat messages, stock price updates, multiplayer game state?

WebSockets start as an HTTP request (the "upgrade" handshake), then switch to a persistent full-duplex TCP connection. Both sides can send messages at any time, with no request-response overhead.

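python
# WebSocket chat server using the third-party websockets library
import asyncio
import json
import time

import websockets

connected = set()  # All currently open client connections

async def chat_handler(websocket):
    connected.add(websocket)
    try:
        async for message in websocket:
            data = json.loads(message)
            response = json.dumps({
                "user": data["user"],
                "text": data["text"],
                "timestamp": time.time()
            })
            # Broadcast to all connected clients (including the sender)
            await asyncio.gather(
                *(client.send(response) for client in connected),
                return_exceptions=True,  # One dead connection must not stop the rest
            )
    finally:
        connected.discard(websocket)

async def main():
    # Start server; clients connect via ws://localhost:8765
    async with websockets.serve(chat_handler, "localhost", 8765):
        await asyncio.Future()  # Run until cancelled

asyncio.run(main())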

When to Use What

Protocol | Best For | Avoid When
REST + JSON | Public APIs, CRUD operations, browser-to-server, simple integrations | High-throughput internal services, real-time streams
gRPC + Protobuf | Internal microservice communication, high throughput, polyglot teams | Public APIs (browser support is poor), simple webhooks
GraphQL | Mobile apps, dashboards with many entities, reducing round trips | Simple CRUD, real-time updates, file uploads
WebSockets | Chat, live dashboards, multiplayer games, notifications | Standard request-response, stateless operations

Showcase: Three Protocols, Same Data

The simulation below shows the same operation — "get user 42 and their recent orders" — performed via REST, gRPC, and GraphQL. Watch the data volume, number of round trips, and serialization format differ across protocols.

REST vs gRPC vs GraphQL: Same Query, Three Protocols

Click each protocol to see the request flow, data format, and bytes on the wire.

Concept check: Your team is building an internal service mesh with 40 microservices, all written in Go and Java. Latency is critical (p99 < 5ms). Which communication protocol should you choose for service-to-service calls?

Chapter 7: API Gateway

You have 40 microservices. External clients — mobile apps, web browsers, third-party integrations — need to talk to them. Do you expose all 40 services directly to the internet? Absolutely not. You put a single entry point in front of them: the API Gateway.

What Does an API Gateway Do?

An API gateway is a reverse proxy that sits between external clients and your internal services. Every external request goes through it. Think of it as the receptionist at a large office building — all visitors check in at the front desk, which directs them to the right department.

The gateway handles cross-cutting concerns — responsibilities that every service needs but shouldn't implement individually:

Concern | What It Does | Without Gateway
Authentication | Validates JWT tokens, API keys, OAuth tokens. Rejects unauthorized requests before they reach any service. | Every service implements its own auth. Inconsistencies and security holes.
Rate limiting | Limits requests per client (e.g., 100 req/sec). Protects services from abuse and DDoS. | Each service rate-limits independently. Attacker can overload one service by hammering it directly.
Routing | Routes /users/* to User Service, /orders/* to Order Service. One external URL, many internal services. | Clients must know the address of every service. Internal topology leaks to the outside.
Request transformation | Translates external API format to internal format. Adds headers, rewrites paths, aggregates responses. | Internal API changes force external clients to change.
Circuit breaking | If a downstream service is failing, stop sending it traffic. Return a cached response or error instead. | Cascading failures: one slow service brings down everything upstream.
Logging & metrics | Central place to log every request, measure latency, count errors, trace requests across services. | Logs scattered across 40 services. Impossible to correlate.

Circuit Breaking: The Safety Valve

The circuit breaker pattern is critical for preventing cascading failures. It works like an electrical circuit breaker in your house — when too much current flows (too many errors), the breaker trips and stops the flow.

CLOSED (Normal)
Requests flow through. Track error rate. If error rate > threshold (e.g., 50% of last 20 requests)...
↓ threshold exceeded
OPEN (Tripped)
All requests fail immediately with 503. Don't even try calling the downstream service. After a timeout (e.g., 30 seconds)...
↓ timeout expires
HALF-OPEN (Testing)
Allow ONE request through. If it succeeds, go back to CLOSED. If it fails, go back to OPEN.
↻ success → CLOSED, failure → OPEN
Why not just retry forever? If the Payments service is down and the gateway keeps retrying, it queues up thousands of requests. When Payments comes back, it gets slammed with the backlog and crashes again. The circuit breaker gives the failing service time to recover by shedding load. It's the difference between trying to restart a flooded engine (retry) and waiting for the water to drain first (circuit break).
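
The three states above can be sketched as a small state machine that trips on the error rate over the most recent calls. The class name, the injectable `clock`, and the choice to wait for a full window before evaluating the rate are assumptions made for this sketch, not a reference implementation.

```python
import time
from collections import deque


class RateBasedBreaker:
    def __init__(self, window=20, error_rate=0.5, open_timeout=30.0, clock=time.monotonic):
        self.results = deque(maxlen=window)  # True = failure; keeps only the last `window` calls
        self.error_rate = error_rate
        self.open_timeout = open_timeout
        self.clock = clock  # Injectable so tests can control time
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "CLOSED":
            return True
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.open_timeout:
            self.state = "HALF_OPEN"  # Exactly one trial request gets through
            return True
        return False  # OPEN before the timeout, or HALF_OPEN with a trial in flight

    def record(self, failed: bool) -> None:
        if self.state == "OPEN":
            return  # Ignore late results from calls that started before the trip
        if self.state == "HALF_OPEN":
            if failed:
                self._trip()  # Trial failed: back to OPEN, timer restarts
            else:
                self.state = "CLOSED"
                self.results.clear()  # Start with a clean window
            return
        self.results.append(failed)
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) > self.error_rate:
            self._trip()

    def _trip(self) -> None:
        self.state = "OPEN"
        self.opened_at = self.clock()
```

A caller checks `allow_request()` before each downstream call, returns 503 immediately when it is False, and otherwise reports the outcome with `record(failed=...)`.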

Popular API Gateways

Gateway | Type | Key Features
Kong | Open source | Plugin ecosystem, Lua-based, runs on Nginx
AWS API Gateway | Managed (cloud) | Tight AWS integration, Lambda triggers, usage plans
Envoy | Open source (L7 proxy) | Service mesh sidecar, gRPC-native, observability
Nginx | Open source | Battle-tested reverse proxy, simple config, fast
Traefik | Open source | Auto-discovery (Kubernetes, Docker), Let's Encrypt

API Gateway in Practice: Nginx Config

What does a real API gateway configuration look like? Here's a simplified Nginx config that routes, rate-limits, and adds security headers:

nginx
# Rate limiting: 10 requests/second per client IP
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

upstream users_service {
    server users-1:8080;
    server users-2:8080;  # Load balance across instances
}

upstream orders_service {
    server orders-1:8080;
    server orders-2:8080;
}

server {
    listen 443 ssl;
    server_name api.myapp.com;

    # TLS termination at the gateway
    ssl_certificate     /etc/ssl/api.crt;
    ssl_certificate_key /etc/ssl/api.key;

    # Security headers
    add_header X-Content-Type-Options nosniff;
    add_header X-Frame-Options DENY;

    # Route /v1/users and everything under it to the Users service
    location /v1/users {
        limit_req zone=api burst=20;
        proxy_pass http://users_service;
        proxy_set_header X-Request-ID $request_id;
    }

    # Route /v1/orders and everything under it to the Orders service
    location /v1/orders {
        limit_req zone=api burst=20;
        proxy_pass http://orders_service;
        proxy_set_header X-Request-ID $request_id;
        proxy_read_timeout 10s;  # Fail fast if service hangs
    }
}

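Notice several important patterns. TLS terminates at the gateway, so in this config traffic to the upstream services travels as plain HTTP; that is only acceptable on a trusted internal network. The rate-limit zone lives in shared memory, so all worker processes of one Nginx instance share the same counters, but separate gateway instances each count independently. Each request gets a unique X-Request-ID for tracing. And upstream blocks define the actual service instances; by default Nginx round-robins between them.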

The BFF Pattern (Backend for Frontend). Some teams create separate API gateways for different clients. A mobile app needs different data than a web dashboard: different fields, different pagination, different rate limits. Instead of one gateway, you have a "Mobile BFF" and a "Web BFF," each tailored to its client. Both call the same internal microservices, but they shape the responses differently. The pattern was popularized by engineers at SoundCloud, and Netflix is known for a similar per-device API approach.

Watch Requests Flow Through an API Gateway

The simulation shows external requests entering through the API gateway. Watch authentication, rate limiting, routing, and circuit breaking in action. Try sending many requests quickly to trigger the rate limiter, or "break" a service to see the circuit breaker trip.

API Gateway: Authentication, Routing, Circuit Breaking

Send requests and watch them flow through the gateway's pipeline. Break a service to trigger the circuit breaker.

Concept check: The Payments service has been returning 500 errors for 15 seconds. Your circuit breaker threshold is 50% errors over the last 20 requests, and the open-state timeout is 30 seconds. A new payment request arrives. What happens?

Chapter 8: Interview Arsenal

Service communication is one of the most frequently tested topics in system design interviews. Here is your cheat sheet, followed by coding drills and design patterns you should be able to whiteboard in under 5 minutes.

Concept Cheat Sheet

Topic | One-Liner | Key Numbers
DNS resolution | Hierarchical name-to-IP lookup with caching at every level | TTL: 60-300s typical. 13 root server identities (A-M), each served by many anycast instances worldwide.
HTTP methods | GET (read), POST (create), PUT (replace), PATCH (update), DELETE (remove) | GET/PUT/DELETE are idempotent. POST is not.
Status codes | 2xx success, 3xx redirect, 4xx client error, 5xx server error | Retry on 5xx and 429. Never retry 4xx (except 429).
REST design | Resources (nouns) + HTTP methods (verbs) + JSON | Plural nouns, 2 nesting levels max, cursor pagination.
Idempotency | Same request N times = same effect as once. Use idempotency keys for POST. | Store key in Redis with 24h TTL. Use SET NX for locking.
API versioning | URL path (/v1/, /v2/) is the most widely used approach | Always backward compatible. Add optional fields, never remove.
gRPC | Protobuf (binary, often several times smaller than JSON) + HTTP/2 (multiplexed). Great for internal services. | Serialization is typically several times faster than JSON; the exact factor varies by language and payload. No native browser support.
GraphQL | Client specifies exact fields needed. One endpoint, no over-fetching. | N+1 query problem. Caching is hard (no URL-based cache keys).
Circuit breaker | CLOSED → OPEN (on errors) → HALF-OPEN (test one) → CLOSED | Threshold: 50% errors over sliding window. Timeout: 10-60s.
API Gateway | Single entry point: auth, rate limiting, routing, circuit breaking | Kong, Envoy, AWS API GW, Nginx, Traefik.

Design Pattern: "Design a REST API for X"

This is one of the most common interview questions. Here's the framework:

1. Identify Resources
What are the nouns? Users, orders, products, reviews? List them. Each becomes a top-level URI.
2. Define Relationships
Which resources belong to others? Users have orders. Orders have items. Use nesting: /users/:id/orders.
3. Map CRUD Operations
For each resource: POST (create), GET (read one/many), PUT/PATCH (update), DELETE. Which does the API need?
4. Handle Edge Cases
Pagination (cursor-based). Filtering (?status=active). Sorting (?sort=-created_at). Partial responses (?fields=name,email).
5. Error Handling & Idempotency
Consistent error format. Idempotency keys for mutations. Rate limiting. Versioning (/v1/).
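
Step 4 mentions cursor-based pagination and filtering; a minimal sketch of both follows. The in-memory `ORDERS` list, the cursor layout (base64-encoded JSON holding the last seen id), and the function names are assumptions for illustration.

```python
import base64
import json

# Stand-in for a database table, ordered by id
ORDERS = [{"id": i, "status": "active" if i % 2 else "cancelled"} for i in range(1, 11)]


def encode_cursor(last_id: int) -> str:
    """Opaque cursor: clients pass it back unchanged and never parse it."""
    raw = json.dumps({"after_id": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["after_id"]


def list_orders(limit=3, cursor=None, status=None):
    """Handle GET /orders?limit=...&cursor=...&status=... against the in-memory list."""
    after_id = decode_cursor(cursor) if cursor else 0
    matching = [
        o for o in ORDERS
        if o["id"] > after_id and (status is None or o["status"] == status)
    ]
    page = matching[:limit]
    has_more = len(matching) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}
```

Because the cursor encodes a position ("after id 5") rather than an offset, rows inserted or deleted between requests do not cause items to be skipped or repeated.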

Coding Drill: Implement a Rate Limiter

This is a classic coding question. Here's the sliding window log approach using Redis:

python
import time
import uuid

import redis

r = redis.Redis()

def is_rate_limited(client_id, max_requests=100, window_sec=60):
    """Sliding window log rate limiter.
    Returns True if the client already made max_requests requests in the last window_sec seconds.
    """
    now = time.time()
    key = f"rate:{client_id}"
    # Unique member per request, so two requests with the same timestamp don't collide
    member = f"{now}:{uuid.uuid4().hex}"

    pipe = r.pipeline()  # Transactional by default (MULTI/EXEC)
    # Remove entries older than the window
    pipe.zremrangebyscore(key, 0, now - window_sec)
    # Count remaining entries (before adding the current request)
    pipe.zcard(key)
    # Add current request; rejected requests are logged too, so hammering keeps a client limited
    pipe.zadd(key, {member: now})
    # Set expiry on the key itself
    pipe.expire(key, window_sec)

    results = pipe.execute()
    request_count = results[1]  # zcard result

    return request_count >= max_requests

How it works: each request is stored as a member in a Redis sorted set, with the timestamp as the score and a unique ID in the member so that simultaneous requests don't overwrite each other. Before counting, we remove all entries older than the window. If the remaining count has already reached the limit, the client is rate-limited. The EXPIRE call makes the sorted set disappear if the client stops sending requests.
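
For practicing the same algorithm without a Redis server, here is an in-memory sketch of a sliding window log. It is a single-process stand-in with one deliberate difference from the Redis version: rejected requests are not added to the log. The class name and the injectable `clock` are assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, max_requests=100, window_sec=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_sec = window_sec
        self.clock = clock  # Injectable so tests can control time
        self.log = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id) -> bool:
        now = self.clock()
        timestamps = self.log[client_id]
        # Drop entries that have slid out of the window (oldest first)
        while timestamps and timestamps[0] <= now - self.window_sec:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # Limit reached: reject without logging
        timestamps.append(now)
        return True
```

Memory grows with the number of accepted requests per window, which is the usual trade-off of the log approach compared with fixed-window counters.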

Coding Drill: Implement a Circuit Breaker

python
import time
from enum import Enum

import requests

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""

class CircuitBreaker:
    # Simplification: counts consecutive failures instead of an error rate over a sliding window
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.state = State.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure_time = 0

    def call(self, func, *args, **kwargs):
        if self.state == State.OPEN:
            if time.time() - self.last_failure_time > self.reset_timeout:
                self.state = State.HALF_OPEN  # Let one trial request through
            else:
                raise CircuitOpenError("Circuit is OPEN: failing fast")

        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        # A failed trial in HALF_OPEN re-opens the circuit immediately
        if self.state == State.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = State.OPEN

# Usage
cb = CircuitBreaker(failure_threshold=5, reset_timeout=30)

def post_charge(data):
    resp = requests.post("http://payments/charge", json=data, timeout=2)
    if resp.status_code >= 500:
        resp.raise_for_status()  # requests does not raise on 5xx by itself
    return resp

def charge(data):
    try:
        return cb.call(post_charge, data).json(), 200
    except Exception:
        # Either the call failed or the circuit is open
        return {"error": "payment service unavailable"}, 503

Coding Drill: Retry with Timeout Budget

In production, you don't just retry N times — you retry within a time budget. If the overall operation must complete in 5 seconds, your retry loop must respect that deadline:

python
import time
import random
import requests

def call_with_budget(url, data, budget_sec=5.0, idem_key=None):
    """Call a service with a retry budget. Retries with backoff until
    the budget is exhausted. Uses idempotency key for safe retries."""
    # Monotonic clock: unaffected by wall-clock adjustments
    deadline = time.monotonic() + budget_sec
    attempt = 0
    headers = {'Content-Type': 'application/json'}
    if idem_key:
        headers['Idempotency-Key'] = idem_key

    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break

        try:
            # Per-request timeout: the remaining budget, capped at 2s per attempt
            resp = requests.post(
                url, json=data, headers=headers,
                timeout=min(remaining, 2.0)
            )
            if resp.status_code < 500 and resp.status_code != 429:
                return resp  # Success or non-retryable client error
        except requests.Timeout:
            pass  # Will retry
        except requests.ConnectionError:
            pass  # Will retry

        attempt += 1
        # Recompute the budget: the request itself consumed part of it
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Exponential backoff with jitter, never sleeping past the deadline
        delay = min(2 ** (attempt - 1) + random.random(), remaining)
        time.sleep(delay)

    raise TimeoutError(f"Budget of {budget_sec}s exhausted after {attempt} attempts")

Key insight: each individual request has a timeout (min(remaining, 2.0)), AND the overall retry loop has a budget. The budget prevents a cascade where a slow service causes its callers to accumulate waiting threads, which causes their callers to accumulate, and so on until the entire system grinds to a halt.

Quick-Fire Interview Questions

Q: "What's the difference between PUT and PATCH?"
PUT replaces the entire resource. If you PUT a user with only {name: "Alice"}, all other fields (email, address) are erased. PATCH modifies only the specified fields — the rest stay unchanged. PUT is idempotent by definition. PATCH may or may not be.
Q: "How would you make a POST endpoint idempotent?"
Require an Idempotency-Key header. On first request, process and store key + response in Redis (SET NX to prevent races, 24h TTL). On duplicate key, return the stored response without re-processing. Stripe, Square, and most payment APIs use this exact pattern.
Q: "REST vs gRPC — when would you choose each?"
REST for public APIs (browser-friendly, human-readable, widely understood). gRPC for internal microservice communication (binary = faster, generated clients = type-safe, HTTP/2 = multiplexed, streaming = built-in). Many companies use REST externally and gRPC internally.
Q: "What happens when DNS returns a stale IP?"
Traffic goes to the old server. If it's still running, you get the old version's responses. If it's shut down, connections fail. Mitigation: lower the TTL well before the migration, use health checks to detect stale routing, and shut down gracefully by keeping the old server answering (or forwarding to the new address) for a drain period at least as long as the old TTL.
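
The idempotency-key answer above can be sketched without Redis. In this in-memory stand-in, a lock plays the role of SET NX and a dict plays the role of the stored responses; there is no TTL, and the class and method names are assumptions for illustration.

```python
import threading


class IdempotencyStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._responses = {}  # idempotency key -> stored response

    def run_once(self, key, operation):
        """Run `operation` only for the first request carrying `key`.

        Returns (response, replayed). Duplicates get the stored response
        and the operation is not executed again.
        """
        # Holding the lock while the operation runs is the simplest way to stop
        # two concurrent duplicates from both executing; it also serializes all
        # requests, which a production design would avoid.
        with self._lock:
            if key in self._responses:
                return self._responses[key], True
            response = operation()
            self._responses[key] = response
            return response, False
```

A handler would read the Idempotency-Key header, wrap the charge in `run_once`, and return the same body whether the request is the original or a retry.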

Interactive: Design a REST API

Design Drill: REST API for a URL Shortener

Watch the API design process step by step. Each step reveals resources, endpoints, and edge cases.

Concept check: You're designing a REST API for a social media platform. A user wants to "like" a post. Which approach is most RESTful and most idempotent-safe?

Chapter 9: Connections

Service communication is the nervous system of distributed systems. Every other topic in this series depends on it. Here's how this lesson connects to everything else.

What We Covered

Chapter | Core Concept | Why It Matters
0 | The communication explosion | Microservices trade simple function calls for complex network communication
1 | DNS & service discovery | Services must find each other by name, not hardcoded IP
2 | HTTP request lifecycle | The universal protocol: methods, status codes, headers, keep-alive
3 | REST API design | Resources + HTTP verbs = predictable, cacheable, evolvable APIs
4 | Idempotency | The network will fail. Retries must be safe. Idempotency keys are mandatory.
5 | API evolution | Add optional fields, never remove. Version with /v1/. OpenAPI for contracts.
6 | gRPC, GraphQL, WebSockets | Different tools for different problems: performance, flexibility, real-time
7 | API gateway & circuit breaking | Single entry point for auth, rate limiting, routing. Circuit breakers prevent cascades.
8 | Interview patterns | Rate limiters, circuit breakers, REST design drills

Where to Go Next

Topic | Connection
Encoding & Evolution | Chapter 5 (API evolution) is the API-level version of schema evolution. Encoding formats (Protobuf, Avro) determine how data survives version changes on the wire.
Replication | When you replicate data across nodes, the replication protocol IS a form of service communication, with all the same problems (ordering, idempotency, failure detection).
Consistency & Consensus | Distributed consensus protocols (Raft, Paxos) are just very careful, very specific forms of service communication where the "API contract" is mathematically proven.
Load Balancing | The API gateway's routing is one form of load balancing. Understanding L4 vs L7 load balancing, consistent hashing, and health checks deepens everything in Chapter 7.
Message Queues | When synchronous HTTP communication isn't enough (fire-and-forget, event-driven architectures), you move to asynchronous messaging. Same idempotency problems, different transport.

Limitations of What We Covered

Limitation | What We Didn't Cover | When It Matters
Synchronous only | Asynchronous messaging (Kafka, RabbitMQ, SQS) | Event-driven architectures, decoupled services, eventual consistency
Request-response | Event sourcing, CQRS | High-write systems, audit logs, temporal queries
Point-to-point | Service mesh (Istio, Linkerd) | Managing communication policies (mTLS, retries, observability) across 100+ services
Single-region | Cross-region communication, geo-routing | Global services with users in multiple continents
Trusted network | mTLS (mutual TLS), zero-trust networking | Security-sensitive environments where internal traffic must also be encrypted

The one thing to remember. The network is unreliable. Every design decision in service communication — DNS TTLs, idempotency keys, circuit breakers, API versioning — is a consequence of this single, brutal fact. Design for failure, and your system will handle success just fine.

The Bigger Picture

Service communication is layer 1 of the distributed systems stack. Everything above it — replication, consensus, transactions, consistency models — is built on top of the primitives we learned here. A replicated database is just services sending carefully ordered messages to each other. A distributed lock is just an API call with strong idempotency guarantees. A message queue is just an HTTP POST with persistence.

When you understand DNS, HTTP, REST, idempotency, and circuit breaking at the level of this lesson, you have the vocabulary to reason about any distributed system. The rest is just applying these primitives in more sophisticated combinations.

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." — Leslie Lamport