Service Communication — From Absolute Zero to Mastery

Chapter 0: The Problem

You have an e-commerce site. It started as one monolith — one big application that handles users, products, orders, payments, and shipping. Everything talks to everything through function calls inside the same process. Life is simple.

Then your site grows. The monolith becomes a nightmare: a change in the payment code breaks the product search, deploys take 45 minutes, and the team of 60 engineers is constantly stepping on each other's toes. So you do what every fast-growing company does: you break the monolith into microservices.

Now you have seven separate services: Users, Products, Orders, Payments, Shipping, Inventory, and Notifications. Each one is deployed independently, scales independently, and is owned by a different team. Beautiful.

Except now you have a new problem — the hardest problem in distributed systems.

The fundamental question. How does the Orders service find the Payments service? How does it talk to it? What happens when the Payments service is temporarily down? What happens when the request succeeds but the response gets lost — does the customer get charged twice? These are the problems of service communication, and getting them wrong costs real money.

Before microservices, a function call was instant, reliable, and type-checked by the compiler. Now that same "call" crosses a network: it can be slow (10ms to 500ms), it can fail (packet loss, timeout, crash), it can partially succeed (the payment was charged but the response never arrived), and it has no compiler checking the contract between caller and callee.

This lesson covers the entire stack of service communication: how services find each other (DNS, service discovery), how they talk (HTTP, REST, gRPC), how they agree on contracts (API design, versioning), and how they handle the brutal reality that networks are unreliable (idempotency, retries, circuit breaking).

Watch the Problem

The simulation below shows a monolith being split into microservices. Watch what happens when services need to communicate — requests cross the network, and the network is not your friend.

Monolith to Microservices: The Communication Explosion


A single "Place Order" operation that was one function call in the monolith now requires six network round-trips: Orders calls Inventory (is the item in stock?), then Payments (charge the card), then Shipping (schedule delivery), then Notifications (send confirmation), and each of those might call Users (get the address). Every arrow is a potential point of failure.

Three Problems We Must Solve

Problem	What Goes Wrong	Solution
Discovery	Orders doesn't know where Payments lives. IP addresses change when services restart.	DNS, service registries
Communication	How do you encode requests and responses? What format? What protocol?	HTTP, REST, gRPC, GraphQL
Reliability	The network loses packets. Services crash. Retries cause duplicate charges.	Idempotency, circuit breaking, timeouts

We will solve each one, from the ground up.

Concept check: In a microservices architecture, the Orders service sends a payment request to the Payments service. The Payments service charges the credit card successfully but the response is lost due to a network error. The Orders service times out and retries. What is the most dangerous consequence?

The order fails and the customer sees an error page
The payment service crashes from too many retries
The customer gets charged twice because the retry creates a second payment

Chapter 1: DNS & Service Discovery

Before two services can talk, they need to answer one question: where is the other service? In your apartment, you find your roommate by shouting their name. On the internet, you find a service by resolving a name to an IP address. That system is called DNS — the Domain Name System.

Why Not Just Hardcode IP Addresses?

The naive approach: write the Payments service's IP address directly in the Orders service's config file. PAYMENTS_HOST=10.0.3.42. This works for about a week.

Then the Payments service gets redeployed to a different machine and gets a new IP. Or it scales to three instances and now there are three IPs. Or the data center migrates and every IP changes. Hardcoded IPs are the distributed systems equivalent of hardcoded pixel positions in a Canvas — they work until the screen size changes.

The solution is indirection: give services names, and have a system that maps names to current IP addresses. That system is DNS.

How DNS Works: A Hierarchical Phonebook

DNS is a distributed, hierarchical database. When your browser wants to visit api.stripe.com, it doesn't ask one central server. It walks down a tree:

Your Machine

Checks local cache. "Do I already know api.stripe.com?" If yes, done. If no, ask the recursive resolver.

↓

Recursive Resolver

Your ISP's DNS server (or 8.8.8.8). Checks its cache. If miss, starts the hierarchical walk.

↓

Root Server (.)

"I don't know api.stripe.com, but here's who handles .com domains." Returns the .com TLD servers.

↓

TLD Server (.com)

"I don't know api.stripe.com, but here's who handles stripe.com." Returns Stripe's authoritative nameservers.

↓

Authoritative Server (stripe.com)

"api.stripe.com? That's 54.187.174.169. Here you go. Cache this for 300 seconds."

The answer flows back up the chain. The recursive resolver and your machine's local cache both keep the result; the root and TLD servers only hand out referrals and store nothing about your query. The TTL (Time To Live), specified by the authoritative server, controls how long caches hold the answer. A TTL of 300 seconds means "this answer is valid for 5 minutes; after that, re-ask."

The TTL tradeoff. Short TTL (30s): changes propagate fast, but DNS servers get hammered with queries. Long TTL (3600s = 1 hour): less DNS load, but if you change IPs, old clients keep hitting the stale address for up to an hour. Production services typically use 60-300s. During a migration, you lower TTL to 30s first, wait for old caches to expire, then change the IP, then raise TTL back.
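To make the TTL behavior concrete, here is a minimal sketch of a caching resolver. It is an illustration under stated assumptions, not how any real resolver is implemented: TtlDnsCache, the records dict standing in for the authoritative server, and the injectable clock are all hypothetical names introduced for this example.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Caches name -> IP answers until their TTL runs out (illustrative only)."""

    def __init__(self, lookup: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup    # hypothetical upstream: returns (ip, ttl_seconds)
        self._clock = clock      # injectable so the example can fake time
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]                  # cache hit: answer is still fresh
        ip, ttl = self._lookup(name)         # miss or expired: re-ask upstream
        self.upstream_queries += 1
        self._entries[name] = (ip, now + ttl)
        return ip


# A dict stands in for the authoritative server (assumption for the demo).
records = {"payments.internal": ("10.0.3.42", 300)}
fake_now = [0.0]
cache = TtlDnsCache(lambda name: records[name], clock=lambda: fake_now[0])

first = cache.resolve("payments.internal")           # miss: asks upstream
records["payments.internal"] = ("10.0.5.99", 300)    # the record changes
fake_now[0] = 299.0
stale = cache.resolve("payments.internal")           # still cached: old IP
fake_now[0] = 300.0
fresh = cache.resolve("payments.internal")           # TTL expired: new IP
```

The stale lookup is the whole tradeoff in miniature: until the TTL runs out, the client keeps using the old address even though the record has already changed.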

DNS Record Types That Matter

Type	Maps	Example	Use Case
A	Name → IPv4	api.stripe.com → 54.187.174.169	Most common. "Where is this service?"
AAAA	Name → IPv6	api.stripe.com → 2600:1f18:...	IPv6 equivalent of A record
CNAME	Name → Name	www.stripe.com → stripe.com	Alias. One extra lookup step.
SRV	Name → Host + Port + Priority	_payments._tcp → pay-1:8080 (pri=10)	Service discovery with port and priority

Service Discovery Beyond DNS

Plain DNS works for external services, but inside a microservices cluster, you need more. You need to know not just where a service is, but which instances are healthy. This is service discovery.

Client-side discovery. Every service queries a service registry (like Consul, etcd, or ZooKeeper) and gets back a list of healthy instances. The client picks one — usually round-robin or random. Netflix's Eureka popularized this pattern.

Pros: no extra network hop. Clients can make smart choices (prefer same-zone).

Cons: every client needs the discovery logic. N languages = N implementations.
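A rough sketch of client-side discovery with round-robin selection follows. InMemoryRegistry is a hypothetical stand-in for a real registry such as Consul or etcd (it does not use their APIs), and the addresses are made up for illustration.

```python
import itertools
from typing import Dict, List


class InMemoryRegistry:
    """Hypothetical registry: service name -> {address: healthy?}."""

    def __init__(self) -> None:
        self._instances: Dict[str, Dict[str, bool]] = {}

    def register(self, service: str, address: str, healthy: bool = True) -> None:
        self._instances.setdefault(service, {})[address] = healthy

    def healthy_instances(self, service: str) -> List[str]:
        return [addr for addr, ok in self._instances.get(service, {}).items() if ok]


class RoundRobinClient:
    """The discovery logic that every client has to carry in this pattern."""

    def __init__(self, registry: InMemoryRegistry, service: str) -> None:
        self._registry = registry
        self._service = service
        self._counter = itertools.count()

    def pick(self) -> str:
        instances = self._registry.healthy_instances(self._service)
        if not instances:
            raise RuntimeError(f"no healthy instances of {self._service}")
        # Rotate through whatever is healthy right now.
        return instances[next(self._counter) % len(instances)]


registry = InMemoryRegistry()
registry.register("payments", "10.0.3.42:8080")
registry.register("payments", "10.0.3.43:8080")
registry.register("payments", "10.0.3.44:8080", healthy=False)

client = RoundRobinClient(registry, "payments")
picks = [client.pick() for _ in range(3)]
```

The unhealthy instance never shows up in picks, and the client needs no load balancer in between. The cost is that this selection logic has to be reimplemented in every language your services use.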

Server-side discovery. The client sends requests to a load balancer (like AWS ALB, Kubernetes Service, or Envoy). The load balancer queries the registry and forwards the request. The client only needs to know one address: the load balancer's.

Pros: clients are dumb and simple. One discovery implementation.

Cons: extra network hop through the load balancer. LB is a single point of failure (mitigated with redundancy).

In Kubernetes, the dominant platform today, DNS and server-side load balancing work together. Kubernetes provides built-in DNS (CoreDNS) that maps service names to ClusterIP virtual addresses: payments.default.svc.cluster.local resolves to a virtual IP. Kernel-level rules (iptables, IPVS, or eBPF, depending on the cluster's networking setup) then load-balance connections to that virtual IP across the pods that pass their readiness checks. From the client's perspective, it's just a DNS name.

Think of it this way. DNS is a phonebook: "What's the phone number for Payments?" Service discovery is a smarter phonebook that also checks: "Is Payments answering the phone? Which of its three lines is least busy? Is line #2 in the same building as you?"

Health Checks: How the Registry Knows Who's Alive

A service registry is only useful if it knows which instances are actually healthy. A server that's running but stuck in an infinite loop, or connected but returning 500 errors, is worse than a server that's down — at least a down server fails fast.

There are two kinds of health checks:

Liveness probes: "Is the process alive?" A simple TCP connection check or an HTTP GET to /healthz that returns 200. If this fails, the instance is removed from the registry (or restarted in Kubernetes). Think of it as checking if someone has a pulse.

Readiness probes: "Can the process handle requests?" A deeper check: is the database connection alive? Is the cache warm? Are all required downstream services reachable? If this fails, traffic stops routing to this instance, but it isn't killed — it might be starting up or recovering. Think of it as checking if someone is awake and ready to work.

python
# Kubernetes-style health check endpoints (Flask).
# `db` and `cache` are placeholders for your own client objects.
from flask import Flask

app = Flask(__name__)

@app.route('/healthz')  # Liveness: am I alive?
def liveness():
    return 'ok', 200

@app.route('/readyz')   # Readiness: can I serve traffic?
def readiness():
    if not db.is_connected():
        return 'db not ready', 503
    if not cache.is_warm():
        return 'cache cold', 503
    return 'ok', 200

In Kubernetes, the kubelet runs these probes every N seconds (configurable). Failed liveness probes trigger a pod restart. Failed readiness probes remove the pod from the Service's endpoint list — it stops receiving traffic but keeps running, giving it time to recover.

Kubernetes DNS: The Real-World Pattern

In a Kubernetes cluster, every Service gets a DNS name automatically via CoreDNS:

text
# Full DNS name format:
<service-name>.<namespace>.svc.cluster.local

# Examples:
payments.default.svc.cluster.local     # Payments service in default namespace
orders.production.svc.cluster.local    # Orders service in production namespace

# Within the same namespace, just use the service name:
http://payments:8080/v1/charge          # Kubernetes resolves "payments" automatically

This is why you see PAYMENTS_HOST=payments in Kubernetes config files — not an IP address, just a name. CoreDNS handles the rest, and the Service object handles load balancing across healthy pods. No service registry library needed. No Consul. No Eureka. Just DNS.

Watch DNS Resolution

The simulation below shows a DNS query walking down the hierarchy. Watch the caching at each level and notice how a second query for the same domain is instant.

DNS Resolution: Walking the Hierarchy


Concept check: You're migrating the Payments service from IP 10.0.3.42 to 10.0.5.99. The DNS TTL for payments.internal is currently 3600 seconds (1 hour). What should you do BEFORE changing the DNS record?

Just change the A record; DNS propagation handles the rest
Lower the TTL to 30 seconds, wait at least 1 hour for old caches to expire, THEN change the IP
Delete the old DNS record and create a new one with the new IP

Chapter 2: HTTP Deep Dive

Now that services can find each other, they need a language to talk. The dominant language of the web — and of most microservices — is HTTP (HyperText Transfer Protocol). It's a simple request-response protocol: the client sends a request, the server sends back a response. That's it. No magic.

Anatomy of an HTTP Request

Every HTTP request has exactly four parts:

http
POST /v1/payments HTTP/1.1           # 1. Method + Path + Version
Host: payments.internal:8080          # 2. Headers (key-value metadata)
Content-Type: application/json
Authorization: Bearer eyJhbGciOiJS...
Idempotency-Key: ord_abc123_pay_1
                                         # 3. Blank line (separates headers from body)
{                                        # 4. Body (optional, the actual data)
  "amount": 4999,
  "currency": "usd",
  "customer": "cus_abc123"
}

Method tells the server what you want to do. Path identifies the resource. Headers carry metadata — who you are, what format the body is in, caching directives. Body carries the payload, if any.
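To see the four parts as data, here is a small sketch that splits a raw HTTP/1.1 request into them. It is deliberately simplified: it ignores repeated headers, chunked bodies, and malformed input. parse_http_request and ParsedRequest are names invented for this example.

```python
from typing import Dict, NamedTuple


class ParsedRequest(NamedTuple):
    method: str
    path: str
    version: str
    headers: Dict[str, str]
    body: str


def parse_http_request(raw: str) -> ParsedRequest:
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return ParsedRequest(method, path, version, headers, body)


raw = (
    "POST /v1/payments HTTP/1.1\r\n"
    "Host: payments.internal:8080\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"amount": 4999, "currency": "usd"}'
)
req = parse_http_request(raw)
```

On the wire, lines end with CRLF and there are no inline comments; the # annotations in the listing above are only for explanation.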

HTTP Methods: The Verbs

HTTP defines a small set of methods (also called verbs). Each has a specific semantic meaning:

Method	Purpose	Has Body?	Idempotent?	Safe?
GET	Retrieve a resource	No	Yes	Yes
POST	Create a new resource / trigger action	Yes	No	No
PUT	Replace a resource entirely	Yes	Yes	No
PATCH	Partially update a resource	Yes	No*	No
DELETE	Remove a resource	Optional	Yes	No
HEAD	GET but headers only (no body)	No	Yes	Yes
OPTIONS	What methods does this endpoint support?	No	Yes	Yes

Safe means it doesn't modify anything — calling it 100 times has the same effect as calling it 0 times. Idempotent means calling it N times has the same effect as calling it once. We'll dig deep into idempotency in Chapter 4.

*PATCH can be idempotent (e.g., "set name to Alice") but isn't guaranteed to be (e.g., "increment counter by 1").

HTTP Status Codes: The Response Signal

The server responds with a status code — a three-digit number that tells the client what happened. The first digit is the category:

Range	Category	Key Codes
2xx	Success	200 OK, 201 Created, 204 No Content
3xx	Redirect	301 Moved Permanently, 304 Not Modified
4xx	Client Error	400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 429 Too Many Requests
5xx	Server Error	500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout

The retry rule. Should you retry a failed request? 5xx: usually yes — the server had a temporary problem. 429: yes, but respect the Retry-After header. 4xx (except 429): no — you sent a bad request; sending it again won't fix it. Network timeout: maybe — but only if the request is idempotent (Chapter 4). Retrying a non-idempotent POST can cause double-charges.
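The retry rule can be written down as a small decision function. This is a sketch of the rule exactly as stated above, not a complete policy; should_retry and its parameters are hypothetical names, and status=None is this example's convention for "no response arrived".

```python
from typing import Optional


def should_retry(status: Optional[int], idempotent: bool) -> bool:
    """Decide whether a failed call is worth retrying.

    status is None when no response arrived at all (timeout, connection reset).
    """
    if status is None:
        # Outcome unknown: the server may or may not have acted.
        # Only repeat the call if repeating it is harmless.
        return idempotent
    if status == 429:
        return True      # the caller should wait for Retry-After first
    if 500 <= status <= 599:
        return True      # usually a temporary server-side problem
    return False         # 2xx/3xx need no retry; other 4xx won't improve


decisions = {
    "503 on POST": should_retry(503, idempotent=False),
    "404 on GET": should_retry(404, idempotent=True),
    "timeout on POST": should_retry(None, idempotent=False),
    "timeout on GET": should_retry(None, idempotent=True),
}
```

The timeout cases are the ones that matter most: the same failure is safe to retry for a GET and dangerous for a bare POST.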

HTTP/1.1 vs HTTP/2: The Evolution

HTTP/1.1 (1997) has one painful limitation: head-of-line blocking. Each TCP connection handles one request at a time. If you need 6 resources, you either wait for each one sequentially, or open 6 TCP connections (expensive: each needs a new TLS handshake).

HTTP/2 (2015) fixes this with multiplexing: a single TCP connection carries many requests simultaneously using streams. Each request/response pair gets a stream ID. Frames from different streams are interleaved on the wire and reassembled at the other end. One connection, many concurrent requests, no head-of-line blocking at the HTTP level.

HTTP/2 also adds header compression (HPACK) — headers are often repetitive across requests, so compressing them saves significant bandwidth in chatty microservice architectures.

Watch an HTTP Request Lifecycle

HTTP Request Lifecycle: From Client to Server and Back

Watch a request traverse DNS, TCP, TLS, and HTTP layers. Toggle HTTP/2 to see multiplexing.


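Keep-alive. Re-establishing TCP + TLS for every request is expensive, so HTTP reuses connections. In HTTP/1.0, the client had to ask for this with the Connection: keep-alive header ("don't close the TCP connection after this response; I'll send more requests"). In HTTP/1.1, persistent connections are the default, and a connection closes only when one side sends Connection: close. HTTP/2 connections are always persistent, and the Connection header is not allowed there at all.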

Headers You Must Know

HTTP headers are key-value pairs that carry metadata about the request or response. Most are boring. These are the ones that matter for service communication:

Header	Direction	Purpose	Example
Content-Type	Both	Format of the body	application/json, application/protobuf
Authorization	Request	Who is calling?	Bearer eyJhbGciOiJS...
Accept	Request	What format do I want back?	application/json
Idempotency-Key	Request	Deduplicate this request	pay_ord42_attempt1
Retry-After	Response	When to retry (with 429/503)	30 (seconds) or a date
X-Request-ID	Both	Trace a request across services	req_abc123def456
Cache-Control	Response	How long can this be cached?	max-age=300, no-cache
ETag	Response	Version fingerprint for caching	"33a64df551425fcc55e"

X-Request-ID is your best friend in production. When a request touches 6 services, and one of them returns a 500 error, how do you find the failing service's logs? If every service propagates the same X-Request-ID through all downstream calls, you can search your logging system for that one ID and see the entire request trace. This is the poor man's distributed tracing. Production systems like Uber's Jaeger and Google's Dapper formalize this into structured trace spans.
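Propagation is the part that is easy to forget, so here is a minimal sketch of it. downstream_headers is a hypothetical helper: it reuses the caller's ID when one arrived and mints a new one only at the edge. The req_ prefix mirrors the example value in the table and is not a standard.

```python
import uuid
from typing import Dict, Mapping

REQUEST_ID_HEADER = "X-Request-ID"


def downstream_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Build the headers to attach to every outgoing call for this request."""
    # Header names are case-insensitive, so compare them lowercased.
    lowered = {name.lower(): value for name, value in incoming.items()}
    request_id = lowered.get(REQUEST_ID_HEADER.lower())
    if not request_id:
        # No ID arrived: this service is the entry point, so mint one.
        request_id = f"req_{uuid.uuid4().hex}"
    return {REQUEST_ID_HEADER: request_id}


propagated = downstream_headers({"x-request-id": "req_abc123def456"})
minted = downstream_headers({"Content-Type": "application/json"})
```

Each service would also include the same ID in its log lines, so that one search finds every hop.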

Exponential Backoff with Jitter

When you retry a failed request, you should not retry immediately. If the server is overloaded, 10,000 clients retrying simultaneously will make it worse. The standard retry strategy is exponential backoff with jitter:

python
import time
import random

def retry_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Last attempt: give up

            # Base delay doubles per attempt: 1s, 2s, 4s, 8s
            # (with max_retries=5 there are at most 4 sleeps)
            base_delay = 2 ** attempt

            # Add jitter: random between 0 and base_delay
            jitter = random.uniform(0, base_delay)
            delay = base_delay + jitter

            # Cap at 30 seconds
            delay = min(delay, 30)

            print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f}s...")
            time.sleep(delay)

Why jitter? Without it, all clients that failed at time T=0 retry at T=1, fail again, retry together at T=3, then at T=7, and so on. They move in lockstep, creating periodic traffic spikes (the "thundering herd" problem). Random jitter spreads the retries across time, smoothing the load.

Concept check: Your service receives an HTTP 503 (Service Unavailable) from a downstream API. Should you retry the request?

No: 5xx errors mean the server is permanently broken
Yes, with exponential backoff: 503 means the server is temporarily overloaded and may recover
Only if the original request was a GET; never retry POST requests

Chapter 3: REST API Design

HTTP gives you the transport. But how should you design the API on top of it? The dominant style for web APIs is REST (Representational State Transfer) — a set of constraints that, when followed, produce APIs that are predictable, cacheable, and evolvable.

The Core Idea: Resources, Not Actions

REST says: think about resources (nouns), not actions (verbs). A resource is anything that can be named: a user, an order, a payment, a list of products. Each resource has a URI (Uniform Resource Identifier) — its address. HTTP methods provide the verbs.

rest
# BAD: action-oriented (RPC-style)
POST /createUser
POST /getUser
POST /deleteUser
POST /listUsers

# GOOD: resource-oriented (REST-style)
POST   /users          # create a user
GET    /users/42       # get user 42
DELETE /users/42       # delete user 42
GET    /users          # list all users

The REST version is uniform: once you know the pattern for one resource, you know it for all of them. A developer who has never seen your API can guess that GET /products/17 returns product 17 and DELETE /orders/99 deletes order 99.

CRUD Mapping

REST maps the four basic database operations (Create, Read, Update, Delete) to HTTP methods:

Operation	HTTP Method	URI Pattern	Example	Response
Create	POST	/resources	POST /orders	201 Created + Location header
Read one	GET	/resources/:id	GET /orders/42	200 OK + JSON body
Read many	GET	/resources	GET /orders?status=pending	200 OK + JSON array
Replace	PUT	/resources/:id	PUT /orders/42	200 OK
Partial update	PATCH	/resources/:id	PATCH /orders/42	200 OK
Delete	DELETE	/resources/:id	DELETE /orders/42	204 No Content

Nested Resources

When one resource belongs to another, nest the URI:

rest
GET  /users/42/orders           # all orders for user 42
GET  /users/42/orders/7         # order 7 for user 42
POST /users/42/orders           # create an order for user 42

But don't nest too deep. /users/42/orders/7/items/3/reviews is hard to read and hard to cache. Two levels is the sweet spot.

Pagination

Any endpoint that returns a list needs pagination. Without it, GET /products returns your entire catalog in one response — 500MB of JSON. There are two main approaches:

Offset-based: ?offset=20&limit=10. Simple. But if items are inserted while you page, you'll skip or duplicate items. Also, ?offset=1000000 forces the database to scan and skip a million rows.

Cursor-based: ?cursor=eyJpZCI6NDJ9&limit=10. The cursor is an opaque token (often a base64-encoded ID). The server uses it to resume from exactly where it left off. No skipping, no duplication, efficient at any depth. Stripe, Slack, and Twitter all use cursor pagination.
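Here is a minimal sketch of cursor pagination over an in-memory list, assuming items are sorted by ascending integer id. encode_cursor, decode_cursor, and list_page are illustrative names; in a real service, the filter would become a WHERE id > ? ORDER BY id LIMIT ? query.

```python
import base64
import json
from typing import Dict, List, Optional


def encode_cursor(last_id: int) -> str:
    # Compact JSON, then URL-safe base64: opaque to the client.
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]


def list_page(items: List[Dict], cursor: Optional[str], limit: int) -> Dict:
    """Return up to `limit` items after the cursor. Items must be sorted by id."""
    after_id = decode_cursor(cursor) if cursor else 0
    remaining = [item for item in items if item["id"] > after_id]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}


products = [{"id": i} for i in range(1, 6)]
page1 = list_page(products, None, 2)
page2 = list_page(products, page1["next_cursor"], 2)
page3 = list_page(products, page2["next_cursor"], 2)
```

The cursor in the example URL above, eyJpZCI6NDJ9, is exactly what encode_cursor(42) produces: base64 for {"id":42}. Because each page resumes from "id greater than the last one seen", rows inserted or deleted earlier in the list do not shift later pages.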

A Complete REST API Example

python
from flask import Flask, request, jsonify
from uuid import uuid4

app = Flask(__name__)
orders = {}  # in-memory store (use a real DB in production)

# CREATE — POST /orders
@app.route('/orders', methods=['POST'])
def create_order():
    data = request.json
    order_id = str(uuid4())
    order = {
        'id': order_id,
        'product': data['product'],
        'quantity': data['quantity'],
        'status': 'pending'
    }
    orders[order_id] = order
    return jsonify(order), 201, {'Location': f'/orders/{order_id}'}

# READ ONE — GET /orders/:id
@app.route('/orders/<order_id>', methods=['GET'])
def get_order(order_id):
    order = orders.get(order_id)
    if not order:
        return jsonify({'error': 'not found'}), 404
    return jsonify(order)

# READ MANY — GET /orders?status=pending&limit=10
@app.route('/orders', methods=['GET'])
def list_orders():
    status = request.args.get('status')
    limit = int(request.args.get('limit', 20))
    result = list(orders.values())
    if status:
        result = [o for o in result if o['status'] == status]
    return jsonify({'data': result[:limit], 'total': len(result)})

# REPLACE - PUT /orders/:id
@app.route('/orders/<order_id>', methods=['PUT'])
def update_order(order_id):
    if order_id not in orders:
        return jsonify({'error': 'not found'}), 404
    # PUT replaces the whole resource; only the server-assigned id is kept
    orders[order_id] = {**request.json, 'id': order_id}
    return jsonify(orders[order_id])

# DELETE — DELETE /orders/:id
@app.route('/orders/<order_id>', methods=['DELETE'])
def delete_order(order_id):
    orders.pop(order_id, None)
    return '', 204

REST is not a standard — it's a style. There is no REST RFC that says "thou shalt use POST for create." REST was described by Roy Fielding in his 2000 PhD dissertation as a set of architectural constraints. What most people call "REST APIs" are really "HTTP APIs that use JSON and follow resource-oriented URL conventions." And that's fine — the conventions are useful regardless of how strictly you follow Fielding's original constraints.

Filtering, Sorting, and Sparse Fieldsets

Real APIs need more than just CRUD. Clients need to filter (show only pending orders), sort (newest first), and select fields (give me only name and email, not the full 50-field profile).

rest
# Filtering — use query parameters
GET /orders?status=pending&min_total=100

# Sorting — prefix with - for descending
GET /orders?sort=-created_at          # newest first
GET /orders?sort=total,-created_at     # cheapest first, then newest

# Sparse fieldsets — request only the fields you need
GET /users/42?fields=name,email         # don't send avatar, preferences, etc.

This reduces payload size (less bandwidth) and database load (fewer columns to fetch). The fields parameter is especially valuable for mobile clients on slow networks.
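A sketch of how a server might interpret the sort and fields parameters, applied to in-memory rows. parse_sort, apply_sort, and select_fields are hypothetical helpers; a real service would translate these into ORDER BY and a column list, and would validate field names against an allow-list first.

```python
from typing import Dict, List, Tuple


def parse_sort(sort_param: str) -> List[Tuple[str, bool]]:
    """'total,-created_at' -> [('total', False), ('created_at', True)]."""
    keys = []
    for part in sort_param.split(","):
        part = part.strip()
        if part:
            # A leading "-" means descending order for that field.
            keys.append((part.lstrip("-"), part.startswith("-")))
    return keys


def apply_sort(rows: List[Dict], sort_param: str) -> List[Dict]:
    result = list(rows)
    # Sort by the least significant key first; Python's sort is stable,
    # so earlier passes survive as tie-breakers.
    for field, descending in reversed(parse_sort(sort_param)):
        result.sort(key=lambda row: row[field], reverse=descending)
    return result


def select_fields(row: Dict, fields_param: str) -> Dict:
    wanted = [f.strip() for f in fields_param.split(",") if f.strip()]
    return {name: row[name] for name in wanted if name in row}


orders = [
    {"id": 1, "total": 50, "created_at": "2024-01-01"},
    {"id": 2, "total": 20, "created_at": "2024-01-02"},
    {"id": 3, "total": 20, "created_at": "2024-01-03"},
]
sorted_ids = [o["id"] for o in apply_sort(orders, "total,-created_at")]
slim = select_fields(
    {"name": "Alice", "email": "alice@example.com", "avatar": "..."},
    "name,email",
)
```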

Good API Design Patterns

Pattern	Example	Why
Plural nouns	/users, /orders	Consistent. /user/42 vs /users/42 — pick one (plural wins)
Kebab-case	/order-items (not /orderItems)	URI paths are case-sensitive, so all-lowercase names avoid /orderItems vs /orderitems mismatches
Filter via query params	/orders?status=shipped	Keeps the base URL clean
Return created resource	POST returns full object + id	Saves a follow-up GET
Consistent error format	{"error": "not_found", "message": "..."}	Clients can parse errors programmatically

Concept check: You need an endpoint to cancel an order. Which REST-style approach is most appropriate?

POST /cancelOrder?id=42
PATCH /orders/42 with body {"status": "cancelled"}
DELETE /orders/42

Chapter 4: Idempotency

Here is the single most important concept in service communication reliability. More important than load balancing. More important than circuit breaking. It's the concept that prevents customers from being charged twice, orders from being created twice, and emails from being sent twice.

Idempotency means: performing an operation multiple times has the same effect as performing it once.

Why It Matters: The Retry Problem

Recall from Chapter 0: the Orders service sends a payment request, the Payments service charges the card, but the response gets lost due to a network timeout. The Orders service doesn't know if the payment succeeded or failed. It must retry — the customer is waiting.

If the payment endpoint is not idempotent, the retry creates a second charge. The customer pays $49.99 twice. Your support team gets an angry email. Your company gets a chargeback.

If the payment endpoint is idempotent, the retry detects "I already processed this payment" and returns the original result without charging again. The customer pays once. Everyone is happy.

The iron rule of distributed systems. The network WILL fail. Requests WILL need to be retried. Therefore, any operation that has side effects (charges money, creates records, sends emails) MUST be idempotent. This is not optional. This is not "nice to have." It's a hard requirement for any production system.

Which HTTP Methods Are Idempotent?

Method	Idempotent?	Why
GET	Yes	Reading data doesn't change it. GET /orders/42 returns the same order every time.
PUT	Yes	"Set X to Y." Setting a name to "Alice" 10 times still results in "Alice."
DELETE	Yes	"Remove X." Removing something that's already gone is a no-op. (Return 204 or 404.)
POST	No!	"Create X." Calling POST /orders twice creates TWO orders. This is the danger zone.
PATCH	Depends	"Set name to Alice" is idempotent. "Increment balance by $10" is NOT.

Idempotency Keys: Making POST Idempotent

Since POST is inherently non-idempotent, we need to make it idempotent. The standard technique: idempotency keys.

The client generates a unique key for each logical operation, reuses that same key on every retry of that operation, and sends it as a header:

http
POST /v1/payments HTTP/1.1
Idempotency-Key: pay_ord42_attempt1
Content-Type: application/json

{"amount": 4999, "currency": "usd"}

The server's logic:

Receive Request

Extract idempotency key from header.

↓

Check Key Store

Look up the key in a persistent store (Redis, database). Has this key been seen before?

↓ if the key exists

Return Cached Response

Return the exact same response (status code + body) from the first time. Do NOT execute the operation again.

↓ if the key is new

Execute Operation

Process the payment. Store the key + response in the key store. Return the response.

Here is the implementation:

python
import redis
import json

r = redis.Redis()

def process_payment(request):
    idem_key = request.headers.get('Idempotency-Key')
    if not idem_key:
        return {'error': 'Idempotency-Key header required'}, 400

    # Check if we've seen this key before
    cached = r.get(f'idem:{idem_key}')
    if cached:
        # Already processed — return the cached response
        return json.loads(cached)

    # First time seeing this key — process the payment
    result = charge_credit_card(
        amount=request.json['amount'],
        currency=request.json['currency']
    )

    # Cache the result with a TTL (24 hours)
    r.setex(
        f'idem:{idem_key}',
        86400,  # 24 hours in seconds
        json.dumps(result)
    )
    return result

The race condition. What if two identical requests arrive at the same microsecond, BEFORE either has stored the result? Both check Redis, both get a miss, both charge the card. The fix: use Redis SET NX (set-if-not-exists) to acquire a lock on the idempotency key before processing. The second request sees the lock and waits (or returns 409 Conflict).
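One way to close that race is to claim the key atomically before doing any work. The sketch below uses an in-memory store guarded by a threading.Lock as a stand-in for Redis; its claim() method plays the role of SET key value NX. IdempotencyStore, handle_payment, and the charge callback are hypothetical names, and releasing the claim on failure assumes the charge failed without side effects.

```python
import threading
from typing import Any, Callable, Dict, Tuple

IN_FLIGHT = object()   # marker stored while the first request is still running


class IdempotencyStore:
    """In-memory stand-in for Redis (illustrative, single process only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, Any] = {}

    def claim(self, key: str) -> bool:
        # Check-and-set in one atomic step, like SET key value NX.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = IN_FLIGHT
            return True

    def get(self, key: str) -> Any:
        with self._lock:
            return self._data.get(key)

    def save(self, key: str, response: Dict) -> None:
        with self._lock:
            self._data[key] = response

    def release(self, key: str) -> None:
        with self._lock:
            self._data.pop(key, None)


def handle_payment(store: IdempotencyStore, idem_key: str,
                   charge: Callable[[], Dict]) -> Tuple[Dict, int]:
    if store.claim(idem_key):
        try:
            result = charge()          # only the claim winner gets here
        except Exception:
            store.release(idem_key)    # assumption: nothing was charged
            raise
        store.save(idem_key, result)
        return result, 200
    existing = store.get(idem_key)
    if existing is None or existing is IN_FLIGHT:
        return {"error": "request with this key is still processing"}, 409
    return existing, 200               # replay the stored response


store = IdempotencyStore()
charges = []

def charge_card() -> Dict:
    charges.append(4999)
    return {"status": "succeeded", "amount": 4999}

first = handle_payment(store, "pay_ord42", charge_card)
retry = handle_payment(store, "pay_ord42", charge_card)
```

Because the check and the write happen in a single step, two simultaneous requests can never both see "key is new". With real Redis you would also set an expiry on the claim so that a crashed worker does not hold the key forever.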

Watch Idempotency in Action

The simulation below shows two scenarios side by side. On the left: a payment endpoint WITHOUT idempotency — retries cause double charges. On the right: the same endpoint WITH an idempotency key — retries are safe.

Retries: Without vs With Idempotency


Real-World Idempotency: Stripe's Pattern

Stripe's payment API is the gold standard for idempotency. Here's how they do it:

Behavior	Implementation
Key generation	Client generates a unique key per operation. Stripe suggests V4 UUIDs or another random string with enough entropy to avoid collisions.
Key storage	Stored in database with the request params hash + response. Keys expire after 24 hours.
Param mismatch	If you reuse a key with DIFFERENT params, Stripe returns 400: "idempotency key used with different request parameters"
In-flight detection	If a request with the same key is still processing, Stripe returns 409 Conflict
Response replay	On duplicate key, returns the EXACT same HTTP status + body as the original request

The param-hash check is crucial. Without it, a bug in the client could reuse an idempotency key from a $10 charge for a $10,000 charge, and the server would happily return "already processed" with the $10 response. Always hash the request body alongside the idempotency key.
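A minimal sketch of the param-hash check: store a fingerprint of the request body next to the response, and reject a reused key whose body does not match. ReplayCache and fingerprint are names invented for this illustration; they are not Stripe's implementation, and the 400 status simply follows the table above.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

Response = Tuple[Dict, int]   # (body, HTTP status)


def fingerprint(body: Dict) -> str:
    # Canonical JSON (sorted keys) so that key order does not change the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class ReplayCache:
    def __init__(self) -> None:
        self._seen: Dict[str, Tuple[str, Response]] = {}

    def record(self, idem_key: str, body: Dict, response: Response) -> None:
        self._seen[idem_key] = (fingerprint(body), response)

    def lookup(self, idem_key: str, body: Dict) -> Optional[Response]:
        """None means 'new key'; otherwise the response to send back."""
        entry = self._seen.get(idem_key)
        if entry is None:
            return None
        stored_fingerprint, response = entry
        if stored_fingerprint != fingerprint(body):
            return ({"error": "idempotency key reused with different "
                              "request parameters"}, 400)
        return response


cache = ReplayCache()
cache.record("pay_ord42", {"amount": 1000, "currency": "usd"},
             ({"status": "succeeded", "amount": 1000}, 200))

same = cache.lookup("pay_ord42", {"currency": "usd", "amount": 1000})
different = cache.lookup("pay_ord42", {"amount": 1000000, "currency": "usd"})
unknown = cache.lookup("pay_ord43", {"amount": 1000, "currency": "usd"})
```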

Database-Level Idempotency: The UPSERT Pattern

For simple cases, you don't even need Redis. The database itself can enforce idempotency using unique constraints and upserts:

sql
-- Create table with unique constraint on the idempotency key
CREATE TABLE payments (
    id          UUID PRIMARY KEY,
    idem_key    VARCHAR(255) UNIQUE NOT NULL,
    amount      INTEGER NOT NULL,
    status      VARCHAR(20) NOT NULL,
    created_at  TIMESTAMP DEFAULT NOW()
);

-- Insert-or-ignore: if idem_key already exists, do nothing
INSERT INTO payments (id, idem_key, amount, status)
VALUES ('uuid-here', 'pay_ord42_1', 4999, 'pending')
ON CONFLICT (idem_key) DO NOTHING;

The database's unique constraint guarantees that two concurrent inserts with the same idem_key will not both succeed: one will hit the conflict and be silently dropped. The application then checks the affected-row count (1 means this request owns the payment; 0 means it is a duplicate). No Redis, no distributed locks, no race conditions. The database IS the lock.
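The same idea can be tried locally with Python's built-in sqlite3 module. This is a sketch under stated assumptions: it uses SQLite's INSERT OR IGNORE in place of the PostgreSQL-style ON CONFLICT ... DO NOTHING shown above, a simplified table, and a hypothetical record_payment helper that reports whether this call won the insert.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payments (
        id       TEXT PRIMARY KEY,
        idem_key TEXT UNIQUE NOT NULL,
        amount   INTEGER NOT NULL,
        status   TEXT NOT NULL
    )
""")


def record_payment(conn: sqlite3.Connection, idem_key: str, amount: int) -> bool:
    """Return True if this call inserted the row, False if it was a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO payments (id, idem_key, amount, status) "
        "VALUES (?, ?, ?, 'pending')",
        (str(uuid.uuid4()), idem_key, amount),
    )
    conn.commit()
    # rowcount is 1 when the row went in and 0 when the unique
    # constraint on idem_key caused the insert to be skipped.
    return cur.rowcount == 1


first_attempt = record_payment(conn, "pay_ord42_1", 4999)
retry_attempt = record_payment(conn, "pay_ord42_1", 4999)
row_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

Only the caller that gets True should go on to charge the card; a caller that gets False should look up the existing row and return its result instead.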

Concept check: Your API has an endpoint PATCH /accounts/42 that accepts {"action": "add_funds", "amount": 100}. A client sends this request, gets a timeout, and retries with the same body. What happens?

The account gets $200 added (double the intended amount) because "add $100" is not idempotent: it's a relative operation, and without an idempotency key, each retry adds another $100
The account gets exactly $100 because PATCH is always idempotent
The server returns 409 Conflict on the retry

Chapter 5: API Evolution

Your API is live. Clients depend on it. Now you need to change it. Add a field. Remove an endpoint. Change a response format. How do you evolve your API without breaking every client that already integrated with it?

Backward vs Forward Compatibility

These concepts come from data encoding, and they apply identically to APIs:

Backward compatible: new servers can handle requests from old clients. An old mobile app (v2.1) calls your new server (v3.0) and everything works. This is the minimum bar — you MUST maintain this.

Forward compatible: old servers can handle requests from new clients. Less common because you control your servers. But relevant during rolling deployments when old and new server versions coexist.

Safe Changes (Backward Compatible)

Change	Why It's Safe
Add a new optional field to request	Old clients don't send it; server uses default
Add a new field to response	Old clients ignore unknown fields (if they're well-written)
Add a new endpoint	Old clients don't call it
Add a new enum value to response	Safe IF clients have a default/fallthrough case
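The "if they're well-written" caveat in the table can be made concrete. Below is a small sketch of a tolerant client-side parser, assuming a hypothetical order payload; the field names (gift_note, loyalty_points) and the status values are invented for illustration.

```python
from dataclasses import dataclass

KNOWN_STATUSES = {"pending", "shipped", "delivered"}

@dataclass
class Order:
    id: int
    status: str
    gift_note: str = ""  # optional field added later; old payloads omit it

def parse_order(payload: dict) -> Order:
    """Tolerant reader: ignore unknown fields, default missing optional ones,
    and map unrecognized enum values to a fallthrough value."""
    status = payload.get("status", "unknown")
    if status not in KNOWN_STATUSES:
        status = "unknown"  # a new server-side enum value does not crash the client
    return Order(
        id=payload["id"],
        status=status,
        gift_note=payload.get("gift_note", ""),
    )

# A newer server sends an extra field and a status this client has never seen
order = parse_order({"id": 7, "status": "returned", "loyalty_points": 12})
```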

Breaking Changes

Change	Why It Breaks
Remove a field from response	Old clients that read this field crash or show blank
Rename a field	Same as remove + add — old clients see the old name disappear
Change a field's type (string to int)	Old clients' parsers fail
Add a new required field to request	Old clients don't send it; their requests now fail with 400
Remove an endpoint	Old clients get 404
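The response-side rules in the two tables above can be checked mechanically. Here is a rough sketch, assuming each response schema is given as a plain {field_name: type_name} dict; the function name breaking_changes is made up for this example, and real tools compare full OpenAPI documents.

```python
def breaking_changes(old_fields: dict, new_fields: dict) -> list:
    """Compare two response schemas and list the changes that break old clients.
    Added fields are not reported because old clients can ignore them."""
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            problems.append(f"type changed: {name} ({old_type} -> {new_fields[name]})")
    return problems

v1 = {"name": "string", "email": "string"}
v2_breaking = {"first_name": "string", "last_name": "string", "email": "string"}
v2_safe = {"name": "string", "first_name": "string", "last_name": "string", "email": "string"}
```

A check like this could run in CI to block a release that removes or retypes a field.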

Versioning Strategies

When you must make a breaking change, you version the API. There are three main approaches:

Strategy	Example	Pros	Cons
URL path	/v1/orders, /v2/orders	Obvious, easy to route, easy to test	Duplicates routes; clients must update URLs
Header	Accept: application/vnd.myapi.v2+json	Clean URLs; version is metadata	Harder to test (can't paste in browser). Easy to forget.
Query param	/orders?version=2	Easy to test; doesn't change URL structure	Looks messy. Caching is harder (different versions share URL path).

The industry winner: URL path versioning. Stripe uses /v1/. Twilio uses /2010-04-01/ (date-based). GitHub Enterprise Server uses /api/v3/ (github.com itself selects API versions through request headers). REST purists consider the header approach "more correct," but URL versioning wins in practice because it's visible, testable, and unambiguous. Use /v1/ and don't overthink it.
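To show how a server might support the first two strategies side by side, here is a small sketch. It assumes the vnd.myapi media type from the table; the function name resolve_version and the default of version 1 are illustrative choices.

```python
import re

PATH_VERSION = re.compile(r"^/v(\d+)(/.*)$")
HEADER_VERSION = re.compile(r"application/vnd\.myapi\.v(\d+)\+json")

def resolve_version(path, headers=None, default=1):
    """Return (version, path_without_version_prefix).
    The URL path wins; the Accept header is the fallback."""
    match = PATH_VERSION.match(path)
    if match:
        return int(match.group(1)), match.group(2)
    accept = (headers or {}).get("Accept", "")
    match = HEADER_VERSION.search(accept)
    if match:
        return int(match.group(1)), path
    return default, path
```

A router can then dispatch on the returned version, so /v1/orders and /v2/orders share the same path-matching code.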

OpenAPI / Swagger

How do you document your API so clients know what to send and what to expect? OpenAPI (formerly Swagger) is a machine-readable specification format — a YAML or JSON file that describes every endpoint, parameter, request body, and response schema.

yaml
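openapi: "3.0.0"
info:
  title: Orders API   # placeholder; info.title and info.version are required by OpenAPI 3.0
  version: "1.0.0"
paths: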
  /orders:
    post:
      summary: Create a new order
      requestBody:
        content:
          application/json:
            schema:
              type: object
              required: [product, quantity]
              properties:
                product:
                  type: string
                quantity:
                  type: integer
                  minimum: 1
      responses:
        '201':
          description: Order created

From this spec, you can auto-generate client libraries (in Python, Go, TypeScript, etc.), documentation websites, and even mock servers for testing. The spec becomes the single source of truth for the API contract between teams.
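To make the contract concrete, the sketch below hand-writes the checks that the requestBody schema above implies for POST /orders. In practice a generated validator would do this from the spec; the function name validate_order_request and the error strings are assumptions for illustration.

```python
def validate_order_request(body):
    """Return a list of violations of the POST /orders request schema."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    errors = []
    for field in ("product", "quantity"):  # required: [product, quantity]
        if field not in body:
            errors.append(f"missing required field: {field}")
    if "product" in body and not isinstance(body["product"], str):
        errors.append("product must be a string")
    if "quantity" in body:
        quantity = body["quantity"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(quantity, int) or isinstance(quantity, bool):
            errors.append("quantity must be an integer")
        elif quantity < 1:  # minimum: 1
            errors.append("quantity must be >= 1")
    return errors
```

An empty list means the body is acceptable; anything else would map to a 400 response.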

Watch Schema Evolution

API Schema Evolution: Safe vs Breaking Changes

Apply changes to the API schema and see which ones break old clients. Green = safe, red = breaking.

Concept check: Your v1 API returns {"name": "Alice", "email": "alice@ex.com"}. In v2, you want to split "name" into "first_name" and "last_name". What is the backward-compatible way to do this?

Remove "name" and add "first_name" + "last_name".
Rename "name" to "full_name" and add the new fields.
Keep "name" AND add "first_name" + "last_name": old clients read "name", new clients read the split fields; deprecate "name" later.

Chapter 6: gRPC & Alternatives

REST over HTTP with JSON is the default. But it's not the only option, and for high-performance microservice communication, it's often not the best option. Let's understand three alternatives and when to use each.

gRPC: The Performance Play

gRPC (officially a recursive acronym for "gRPC Remote Procedure Calls") is a framework for service-to-service communication that uses Protocol Buffers (Protobuf) for serialization and HTTP/2 for transport. Google created it as the open-source successor to Stubby, the internal RPC system that connects its own services.

Why is gRPC faster than REST+JSON?

Aspect	REST + JSON	gRPC + Protobuf
Serialization	JSON: text-based, ~600 bytes for a user object	Protobuf: binary, ~120 bytes for the same object (5x smaller)
Schema	Optional (OpenAPI). Client must guess or read docs.	Required (.proto file). Code-generated clients with type safety.
Transport	Usually HTTP/1.1 (one in-flight request per connection)	Always HTTP/2 (multiplexed streams)
Streaming	Not native (workarounds: SSE, chunked transfer)	Built-in: unary, server-streaming, client-streaming, bidirectional
Browser support	Native (fetch, XMLHttpRequest)	Requires grpc-web proxy (no native browser support)

A Protobuf schema defines the service contract:

protobuf
syntax = "proto3";

service PaymentService {
  rpc ChargeCard(ChargeRequest) returns (ChargeResponse);
  rpc StreamTransactions(AccountId) returns (stream Transaction);
}

message ChargeRequest {
  string customer_id = 1;
  int64  amount_cents = 2;
  string currency     = 3;
  string idempotency_key = 4;
}

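message ChargeResponse {
  string charge_id = 1;
  string status    = 2;  // "succeeded" | "failed"
}

// Illustrative definitions so StreamTransactions compiles; the fields are placeholders.
message AccountId {
  string account_id = 1;
}

message Transaction {
  string transaction_id = 1;
  int64  amount_cents   = 2;
}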

From this .proto file, the protoc compiler (with the matching language plugin) generates client and server stubs for many languages, including Python, Go, Java, and C++. The client calls payment_stub.ChargeCard(request) as if it were a local function. The framework handles serialization, HTTP/2 framing, and error codes.
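To see where the size savings come from, the sketch below hand-encodes a ChargeRequest in the Protobuf wire format and compares it with the JSON equivalent. This only illustrates the encoding rules (field tags, varints, length-delimited strings); real code would use the classes generated by protoc. It assumes non-negative integers and the sample values shown.

```python
import json

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    # Tag = (field_number << 3) | wire_type; wire type 0 is varint
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

def encode_string_field(field_number: int, text: str) -> bytes:
    # Wire type 2 is length-delimited: tag, byte length, then the bytes
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data

def encode_charge_request(customer_id, amount_cents, currency, idempotency_key):
    # Field numbers match the ChargeRequest message above
    return (
        encode_string_field(1, customer_id)
        + encode_int_field(2, amount_cents)
        + encode_string_field(3, currency)
        + encode_string_field(4, idempotency_key)
    )

fields = {
    "customer_id": "cus_42",
    "amount_cents": 4999,
    "currency": "usd",
    "idempotency_key": "pay_ord42_1",
}
binary = encode_charge_request(**fields)
as_json = json.dumps(fields).encode("utf-8")
print(len(binary), len(as_json))
```

Field names never appear on the wire, only small numeric tags, which is why the binary form is shorter and why field numbers must never be reused.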

GraphQL: The Flexible Query

GraphQL (Facebook, 2015) solves a different problem: over-fetching and under-fetching. With REST, the server decides what fields to return. GET /users/42 returns everything — name, email, address, preferences, avatar URL — even if the client only needs the name. That's over-fetching. Conversely, to get a user's orders, you need a second request to GET /users/42/orders. That's under-fetching.

GraphQL lets the client specify exactly what it wants:

graphql
# Client sends this query
{
  user(id: 42) {
    name
    orders {
      id
      total
      status
    }
  }
}

# Server returns exactly this (nothing more, nothing less)
{
  "data": {
    "user": {
      "name": "Alice",
      "orders": [
        {"id": 7, "total": 49.99, "status": "shipped"},
        {"id": 12, "total": 29.99, "status": "delivered"}
      ]
    }
  }
}

One request. No over-fetching. No under-fetching. Especially powerful for mobile clients where bandwidth is limited and round trips are expensive.
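The core idea, that the client's selection decides the shape of the response, can be sketched without any GraphQL library. Below is a toy projection function; the selection format (a dict with None for leaf fields) and the sample record are assumptions for illustration, and a real GraphQL server adds parsing, a typed schema, and resolvers.

```python
def select_fields(data, selection):
    """Project data onto a selection: {field: None} for leaves,
    {field: {...}} for nested objects. Lists are projected item by item."""
    if isinstance(data, list):
        return [select_fields(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        result[field] = value if sub_selection is None else select_fields(value, sub_selection)
    return result

# The full record the server holds (more than the client asked for)
USER_42 = {
    "name": "Alice",
    "email": "alice@ex.com",
    "address": "1 Main St",
    "orders": [
        {"id": 7, "total": 49.99, "status": "shipped", "warehouse": "east"},
        {"id": 12, "total": 29.99, "status": "delivered", "warehouse": "west"},
    ],
}

# Equivalent of the query above: user { name orders { id total status } }
query = {"name": None, "orders": {"id": None, "total": None, "status": None}}
response = {"data": {"user": select_fields(USER_42, query)}}
```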

gRPC Streaming Modes

One of gRPC's most powerful features is streaming. HTTP/2's multiplexed streams make this natural — no hacks, no polling:

Mode	Direction	Use Case
Unary	Client sends 1 request, server sends 1 response	Standard request-response (same as REST)
Server streaming	Client sends 1 request, server sends N responses	Real-time price feeds, log tailing, search results
Client streaming	Client sends N requests, server sends 1 response	File uploads, sensor data collection, batch processing
Bidirectional streaming	Both sides send N messages concurrently	Chat, multiplayer games, collaborative editing

protobuf
service StockService {
  // Unary: one price check
  rpc GetPrice(Symbol) returns (Price);

  // Server streaming: subscribe to live price updates
  rpc StreamPrices(WatchList) returns (stream Price);

  // Client streaming: upload historical tick data
  rpc UploadTicks(stream Tick) returns (UploadResult);

  // Bidirectional: trading algorithm sends orders, receives fills
  rpc Trade(stream Order) returns (stream Fill);
}
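As an in-process analogy only (this is not the gRPC API), the four call shapes map onto plain Python functions and generators: a stream is something you iterate over, in either direction. The names and sample prices below are invented for illustration.

```python
from typing import Iterable, Iterator

PRICES = {"ACME": [10.0, 10.5, 10.25]}  # made-up sample data

def get_price(symbol: str) -> float:
    """Unary: one request in, one response out."""
    return PRICES[symbol][-1]

def stream_prices(symbol: str) -> Iterator[float]:
    """Server streaming: one request in, many responses out."""
    for price in PRICES[symbol]:
        yield price

def upload_ticks(ticks: Iterable[float]) -> int:
    """Client streaming: many requests in, one summary response out."""
    return sum(1 for _ in ticks)

def trade(orders: Iterable[int]) -> Iterator[str]:
    """Bidirectional: responses are produced while requests are still arriving."""
    for quantity in orders:
        yield f"filled {quantity}"
```

Generated gRPC stubs in Python expose streaming calls in a similar iterator style, which is why the analogy is useful when reading a .proto file.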

WebSockets: The Bidirectional Channel

WebSockets solve yet another problem: real-time bidirectional communication. HTTP is request-response: the client asks, the server answers. But what if the server needs to push updates to the client — live chat messages, stock price updates, multiplayer game state?

WebSockets start as an HTTP request (the "upgrade" handshake), then switch to a persistent full-duplex TCP connection. Both sides can send messages at any time, with no request-response overhead.

python
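# WebSocket chat server with Python's websockets library (version 10.1 or newer)
import asyncio
import json
import time

import websockets

connected = set()  # All currently connected clients

async def chat_handler(websocket):
    connected.add(websocket)
    try:
        async for message in websocket:
            data = json.loads(message)
            response = json.dumps({
                "user": data["user"],
                "text": data["text"],
                "timestamp": time.time()
            })
            # Broadcast to all connected clients
            websockets.broadcast(connected, response)
    finally:
        connected.discard(websocket)

async def main():
    # Start server; clients connect via ws://localhost:8765
    async with websockets.serve(chat_handler, "localhost", 8765):
        await asyncio.Future()  # Run until cancelled

asyncio.run(main())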
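The upgrade handshake mentioned above has one piece that is easy to show in isolation: the server proves it understood the upgrade request by hashing the client's Sec-WebSocket-Key with a fixed GUID defined in RFC 6455. A minimal sketch using only the standard library (the function name websocket_accept is illustrative; libraries such as websockets do this for you):

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, identical for every WebSocket server
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample key taken from the handshake example in RFC 6455
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

The server returns this value in its 101 Switching Protocols response, after which both sides stop speaking HTTP on that connection.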

When to Use What

Protocol	Best For	Avoid When
REST + JSON	Public APIs, CRUD operations, browser-to-server, simple integrations	High-throughput internal services, real-time streams
gRPC + Protobuf	Internal microservice communication, high throughput, polyglot teams	Public APIs (browser support is poor), simple webhooks
GraphQL	Mobile apps, dashboards with many entities, reducing round trips	Simple CRUD, real-time updates, file uploads
WebSockets	Chat, live dashboards, multiplayer games, notifications	Standard request-response, stateless operations

Showcase: Three Protocols, Same Data

The simulation below shows the same operation — "get user 42 and their recent orders" — performed via REST, gRPC, and GraphQL. Watch the data volume, number of round trips, and serialization format differ across protocols.

REST vs gRPC vs GraphQL: Same Query, Three Protocols

Click each protocol to see the request flow, data format, and bytes on the wire.

Concept check: Your team is building an internal service mesh with 40 microservices, all written in Go and Java. Latency is critical (p99 < 5ms). Which communication protocol should you choose for service-to-service calls?

REST + JSON: it's the industry standard.
gRPC + Protobuf: binary serialization is 5-10x faster than JSON, HTTP/2 multiplexing reduces connection overhead, and generated clients ensure type safety across Go and Java.
GraphQL: it reduces over-fetching.

Chapter 7: API Gateway

You have 40 microservices. External clients — mobile apps, web browsers, third-party integrations — need to talk to them. Do you expose all 40 services directly to the internet? Absolutely not. You put a single entry point in front of them: the API Gateway.

What Does an API Gateway Do?

An API gateway is a reverse proxy that sits between external clients and your internal services. Every external request goes through it. Think of it as the receptionist at a large office building — all visitors check in at the front desk, which directs them to the right department.

The gateway handles cross-cutting concerns — responsibilities that every service needs but shouldn't implement individually:

Concern	What It Does	Without Gateway
Authentication	Validates JWT tokens, API keys, OAuth tokens. Rejects unauthorized requests before they reach any service.	Every service implements its own auth. Inconsistencies and security holes.
Rate limiting	Limits requests per client (e.g., 100 req/sec). Protects services from abuse and DDoS.	Each service rate-limits independently. Attacker can overload one service by hammering it directly.
Routing	Routes /users/* to User Service, /orders/* to Order Service. One external URL, many internal services.	Clients must know the address of every service. Internal topology leaks to the outside.
Request transformation	Translates external API format to internal format. Adds headers, rewrites paths, aggregates responses.	Internal API changes force external clients to change.
Circuit breaking	If a downstream service is failing, stop sending it traffic. Return a cached response or error instead.	Cascading failures: one slow service brings down everything upstream.
Logging & metrics	Central place to log every request, measure latency, count errors, trace requests across services.	Logs scattered across 40 services. Impossible to correlate.
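A toy version of the first few rows of this table, to show the order in which a gateway might apply them. The API key, service names, and function name handle are all hypothetical; a real gateway would forward the request to the upstream instead of returning its name.

```python
import uuid

API_KEYS = {"key-mobile": "mobile-app"}  # hypothetical credential store
ROUTES = {"/users": "users-service", "/orders": "orders-service"}  # hypothetical upstreams

def handle(path, headers):
    """Run the gateway checks in order: authenticate, then route.
    Returns (status_code, info)."""
    client = API_KEYS.get(headers.get("X-API-Key", ""))
    if client is None:
        # Rejected before any internal service sees the request
        return 401, {"error": "unauthorized"}
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return 200, {
                "upstream": service,
                "client": client,
                "request_id": str(uuid.uuid4()),  # attached for tracing
            }
    return 404, {"error": "no route"}
```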

Circuit Breaking: The Safety Valve

The circuit breaker pattern is critical for preventing cascading failures. It works like an electrical circuit breaker in your house — when too much current flows (too many errors), the breaker trips and stops the flow.

CLOSED (Normal)

Requests flow through. Track error rate. If error rate > threshold (e.g., 50% of last 20 requests)...

↓ threshold exceeded

OPEN (Tripped)

All requests fail immediately with 503. Don't even try calling the downstream service. After a timeout (e.g., 30 seconds)...

↓ timeout expires

HALF-OPEN (Testing)

Allow ONE request through. If it succeeds, go back to CLOSED. If it fails, go back to OPEN.

↻ success → CLOSED, failure → OPEN

Why not just retry forever? If the Payments service is down and the gateway keeps retrying, it queues up thousands of requests. When Payments comes back, it gets slammed with the backlog and crashes again. The circuit breaker gives the failing service time to recover by shedding load. It's the difference between trying to restart a flooded engine (retry) and waiting for the water to drain first (circuit break).

Popular API Gateways

Gateway	Type	Key Features
Kong	Open source	Plugin ecosystem, Lua-based, runs on Nginx
AWS API Gateway	Managed (cloud)	Tight AWS integration, Lambda triggers, usage plans
Envoy	Open source (L7 proxy)	Service mesh sidecar, gRPC-native, observability
Nginx	Open source	Battle-tested reverse proxy, simple config, fast
Traefik	Open source	Auto-discovery (Kubernetes, Docker), Let's Encrypt

API Gateway in Practice: Nginx Config

What does a real API gateway configuration look like? Here's a simplified Nginx config that routes, rate-limits, and adds security headers:

nginx
# Rate limiting: 10 requests/second per client IP
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

upstream users_service {
    server users-1:8080;
    server users-2:8080;  # Load balance across instances
}

upstream orders_service {
    server orders-1:8080;
    server orders-2:8080;
}

server {
    listen 443 ssl;
    server_name api.myapp.com;

    # TLS termination at the gateway
    ssl_certificate     /etc/ssl/api.crt;
    ssl_certificate_key /etc/ssl/api.key;

    # Security headers
    add_header X-Content-Type-Options nosniff;
    add_header X-Frame-Options DENY;

    # Route /v1/users/* to the Users service
    location /v1/users {
        limit_req zone=api burst=20;
        proxy_pass http://users_service;
        proxy_set_header X-Request-ID $request_id;
    }

    # Route /v1/orders/* to the Orders service
    location /v1/orders {
        limit_req zone=api burst=20;
        proxy_pass http://orders_service;
        proxy_set_header X-Request-ID $request_id;
        proxy_read_timeout 10s;  # Fail fast if service hangs
    }
}

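Notice several important patterns. TLS terminates at the gateway, so traffic to the upstreams is plain HTTP (cheaper, but only acceptable on a trusted internal network). The rate-limit zone is shared memory across the worker processes of one Nginx instance; it is not shared between separate gateway instances, so a fleet of N gateways allows up to N times the configured rate unless you centralize the counters. Each request gets a unique X-Request-ID for tracing. And the upstream blocks define the actual service instances, which Nginx round-robins between by default.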

The BFF Pattern (Backend for Frontend). Some teams create separate API gateways for different clients. A mobile app needs different data than a web dashboard: different fields, different pagination, different rate limits. Instead of one gateway, you have a "Mobile BFF" and a "Web BFF," each tailored to its client. Both call the same internal microservices, but they shape the responses differently. SoundCloud and Netflix were prominent early adopters of this pattern.

Watch Requests Flow Through an API Gateway

The simulation shows external requests entering through the API gateway. Watch authentication, rate limiting, routing, and circuit breaking in action. Try sending many requests quickly to trigger the rate limiter, or "break" a service to see the circuit breaker trip.

API Gateway: Authentication, Routing, Circuit Breaking

Send requests and watch them flow through the gateway's pipeline. Break a service to trigger the circuit breaker.

Concept check: The Payments service has been returning 500 errors for 15 seconds. Your circuit breaker threshold is 50% errors over the last 20 requests, and the open-state timeout is 30 seconds. A new payment request arrives. What happens?

The circuit breaker is OPEN, so the gateway immediately returns 503 to the client without forwarding the request to the Payments service, giving Payments time to recover.
The gateway retries the request 3 times with exponential backoff.
The gateway routes the request to a different instance of the Payments service.

Chapter 8: Interview Arsenal

Service communication is one of the most frequently tested topics in system design interviews. Here is your cheat sheet, followed by coding drills and design patterns you should be able to whiteboard in under 5 minutes.

Concept Cheat Sheet

Topic	One-Liner	Key Numbers
DNS resolution	Hierarchical name-to-IP lookup with caching at every level	TTL: 60-300s typical. 13 root server identities (A through M), each anycast to many sites.
HTTP methods	GET (read), POST (create), PUT (replace), PATCH (update), DELETE (remove)	GET/PUT/DELETE are idempotent. POST is not.
Status codes	2xx success, 3xx redirect, 4xx client error, 5xx server error	Retry on 5xx and 429. Never retry 4xx (except 429).
REST design	Resources (nouns) + HTTP methods (verbs) + JSON	Plural nouns, 2 nesting levels max, cursor pagination.
Idempotency	Same request N times = same effect as once. Use idempotency keys for POST.	Store key in Redis with 24h TTL. Use SET NX for locking.
API versioning	URL path (/v1/, /v2/) is the industry standard	Always backward compatible. Add optional fields, never remove.
gRPC	Protobuf (binary, 5x smaller) + HTTP/2 (multiplexed). Great for internal services.	Roughly 5-10x faster serialization than JSON (workload-dependent). No native browser support (needs grpc-web).
GraphQL	Client specifies exact fields needed. One endpoint, no over-fetching.	N+1 query problem. Caching is hard (no URL-based cache keys).
Circuit breaker	CLOSED → OPEN (on errors) → HALF-OPEN (test one) → CLOSED	Threshold: 50% errors over sliding window. Timeout: 10-60s.
API Gateway	Single entry point: auth, rate limiting, routing, circuit breaking	Kong, Envoy, AWS API GW, Nginx, Traefik.

Design Pattern: "Design a REST API for X"

This is one of the most common interview questions. Here's the framework:

1. Identify Resources

What are the nouns? Users, orders, products, reviews? List them. Each becomes a top-level URI.

↓

2. Define Relationships

Which resources belong to others? Users have orders. Orders have items. Use nesting: /users/:id/orders.

↓

3. Map CRUD Operations

For each resource: POST (create), GET (read one/many), PUT/PATCH (update), DELETE. Which does the API need?

↓

4. Handle Edge Cases

Pagination (cursor-based). Filtering (?status=active). Sorting (?sort=-created_at). Partial responses (?fields=name,email).

↓

5. Error Handling & Idempotency

Consistent error format. Idempotency keys for mutations. Rate limiting. Versioning (/v1/).
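Step 4 mentions cursor-based pagination; here is one way it might look. This sketch assumes items are sorted by a positive, increasing integer id and uses an opaque base64 cursor; the function names and the response shape (data, next_cursor) are illustrative conventions, not a standard.

```python
import base64
import json

def encode_cursor(last_id: int) -> str:
    """Opaque cursor: clients pass it back unchanged and never parse it."""
    raw = json.dumps({"after_id": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["after_id"]

def list_page(items, limit=2, cursor=None):
    """Return one page of items (sorted by ascending id) plus the next cursor."""
    after_id = decode_cursor(cursor) if cursor else 0
    page = [item for item in items if item["id"] > after_id][:limit]
    has_more = bool(page) and any(item["id"] > page[-1]["id"] for item in items)
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}

items = [{"id": i} for i in range(1, 6)]
page1 = list_page(items)
page2 = list_page(items, cursor=page1["next_cursor"])
page3 = list_page(items, cursor=page2["next_cursor"])
```

Unlike offset pagination, inserting a new row while the client is paging does not shift or duplicate results, because each page starts strictly after the last id seen.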

Coding Drill: Implement a Rate Limiter

This is a classic coding question. Here's the sliding window log approach using Redis:

python
import time
import uuid

import redis

r = redis.Redis()

def is_rate_limited(client_id, max_requests=100, window_sec=60):
    """Sliding window log rate limiter.
    Returns True if the client already made max_requests in the last window_sec seconds.
    """
    now = time.time()
    key = f"rate:{client_id}"
    # Unique member per request: two requests with the same timestamp
    # would otherwise overwrite each other in the sorted set
    member = f"{now}:{uuid.uuid4().hex}"

    pipe = r.pipeline()  # redis-py pipelines run as one MULTI/EXEC transaction by default
    # Remove entries older than the window
    pipe.zremrangebyscore(key, 0, now - window_sec)
    # Count remaining entries (before adding the current request)
    pipe.zcard(key)
    # Add current request (rejected requests are recorded too, so a client
    # that keeps hammering stays limited)
    pipe.zadd(key, {member: now})
    # Set expiry on the key itself
    pipe.expire(key, window_sec)

    results = pipe.execute()
    request_count = results[1]  # zcard result

    return request_count >= max_requests

How it works: each request is stored as a member in a Redis sorted set, scored by its timestamp. Before counting, we remove all entries older than the window. If the remaining count has already reached the limit, the client is rate-limited. The EXPIRE call lets the sorted set disappear on its own once the client stops sending requests.
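For practicing on a whiteboard or in a unit test without Redis, the same sliding-window idea fits in a few lines of in-memory Python. This is a single-process sketch with an injectable clock (an assumption added for testability); unlike the Redis version above, it does not record rejected requests.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    def __init__(self, max_requests=100, window_sec=60, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_sec = window_sec
        self.clock = clock
        self.hits = defaultdict(deque)  # client_id -> timestamps, oldest first

    def allow(self, client_id):
        """Return True and record the request if the client is under its limit."""
        now = self.clock()
        window = self.hits[client_id]
        # Drop timestamps that have aged out of the window
        while window and window[0] <= now - self.window_sec:
            window.popleft()
        if len(window) >= self.max_requests:
            return False
        window.append(now)
        return True
```

Because timestamps arrive in order, expiring old entries is a cheap pop from the left of the deque rather than a scan.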

Coding Drill: Implement a Circuit Breaker

python
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.state = State.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure_time = 0

    def call(self, func, *args, **kwargs):
        if self.state == State.OPEN:
            if time.time() - self.last_failure_time > self.reset_timeout:
                self.state = State.HALF_OPEN  # Try one request
            else:
                raise Exception("Circuit is OPEN: failing fast")

        try:
            # Forward keyword arguments too, so callers can pass json=..., timeout=...
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = State.OPEN

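# Usage (assumes the requests library; an HTTP 5xx does not raise on its own,
# so the wrapper turns it into an exception the breaker can count)
import requests

cb = CircuitBreaker(failure_threshold=5, reset_timeout=30)

def post_charge(data):
    resp = requests.post("http://payments/charge", json=data, timeout=2)
    if resp.status_code >= 500:
        raise RuntimeError(f"payments returned {resp.status_code}")
    return resp

def charge(data):
    try:
        return cb.call(post_charge, data).json(), 200
    except Exception:
        # Either the call failed or the circuit is open
        return {"error": "payment service unavailable"}, 503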
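The drill above trips after a run of consecutive failures. Chapter 7 described a different trigger: an error rate over the last N requests. Here is a sketch of that variant, assuming single-threaded use and an injectable clock; the class and method names are illustrative, and the half-open state here does not limit itself to exactly one probe.

```python
import time
from collections import deque

class ErrorRateBreaker:
    def __init__(self, window=20, error_threshold=0.5, reset_timeout=30,
                 clock=time.monotonic):
        self.results = deque(maxlen=window)  # True = failure, newest on the right
        self.error_threshold = error_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Open: let a probe through only once the timeout has passed (half-open)
        return self.clock() - self.opened_at >= self.reset_timeout

    def record(self, failed):
        if self.opened_at is not None:
            # This is the result of a half-open probe
            if failed:
                self.opened_at = self.clock()  # back to open, restart the timer
            else:
                self.opened_at = None  # recovered: close and start a fresh window
                self.results.clear()
            return
        self.results.append(failed)
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) > self.error_threshold:
            self.opened_at = self.clock()
```

Waiting for a full window before tripping avoids opening the circuit on the first failed request after startup.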

Coding Drill: Retry with Timeout Budget

In production, you don't just retry N times — you retry within a time budget. If the overall operation must complete in 5 seconds, your retry loop must respect that deadline:

python
import time
import random
import requests

def call_with_budget(url, data, budget_sec=5.0, idem_key=None):
    """Call a service with a retry budget. Retries with backoff until
    the budget is exhausted. Uses idempotency key for safe retries."""
    deadline = time.monotonic() + budget_sec  # Monotonic clock: immune to wall-clock jumps
    attempt = 0
    headers = {'Content-Type': 'application/json'}
    if idem_key:
        headers['Idempotency-Key'] = idem_key

    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break

        try:
            # Per-request timeout: the remaining budget, capped at 2s per attempt
            resp = requests.post(
                url, json=data, headers=headers,
                timeout=min(remaining, 2.0)
            )
            if resp.status_code < 500 and resp.status_code != 429:
                return resp  # Success or non-retryable client error
        except requests.Timeout:
            pass  # Will retry
        except requests.ConnectionError:
            pass  # Will retry

        # Recompute the budget after the request so the sleep never overshoots it
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            attempt += 1
            break

        # Exponential backoff with jitter
        delay = min(2 ** attempt + random.random(), remaining)
        time.sleep(delay)
        attempt += 1

    raise TimeoutError(f"Budget of {budget_sec}s exhausted after {attempt} attempts")

Key insight: each individual request has a timeout (min(remaining, 2.0)), AND the overall retry loop has a budget. The budget prevents a cascade where a slow service causes its callers to accumulate waiting threads, which causes their callers to accumulate, and so on until the entire system grinds to a halt.
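To get a feel for how much of a budget the backoff alone consumes, the sketch below generates a "full jitter" delay schedule (each delay is uniform between zero and an exponentially growing ceiling) and stops when the next sleep would exceed the budget. The base, cap, and max_attempts values are illustrative assumptions, and request time is deliberately left out.

```python
import random

def backoff_delays(budget_sec, base=0.1, cap=2.0, max_attempts=10, rng=random.random):
    """Yield sleep durations whose running total stays within budget_sec."""
    spent = 0.0
    for attempt in range(max_attempts):
        ceiling = min(cap, base * 2 ** attempt)  # 0.1, 0.2, 0.4, ... capped at 2.0
        delay = rng() * ceiling  # full jitter: uniform in [0, ceiling)
        if spent + delay > budget_sec:
            return
        yield delay
        spent += delay

# Worst case (rng always returns 1.0) shows the ceiling of each step
worst_case = list(backoff_delays(5.0, rng=lambda: 1.0))
```

Jitter matters because many callers that failed at the same moment would otherwise all retry at the same moment too.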

Quick-Fire Interview Questions

Q: "What's the difference between PUT and PATCH?"
PUT replaces the entire resource. If you PUT a user with only {name: "Alice"}, all other fields (email, address) are erased. PATCH modifies only the specified fields — the rest stay unchanged. PUT is idempotent by definition. PATCH may or may not be.

Q: "How would you make a POST endpoint idempotent?"
Require an Idempotency-Key header. On first request, process and store key + response in Redis (SET NX to prevent races, 24h TTL). On duplicate key, return the stored response without re-processing. Stripe, Square, and most payment APIs use this exact pattern.

Q: "REST vs gRPC — when would you choose each?"
REST for public APIs (browser-friendly, human-readable, widely understood). gRPC for internal microservice communication (binary = faster, generated clients = type-safe, HTTP/2 = multiplexed, streaming = built-in). Many companies use REST externally and gRPC internally.

Q: "What happens when DNS returns a stale IP?"
Traffic goes to the old server. If it's still running, you get the old version's responses. If it's shut down, connections fail. Mitigation: lower the TTL well before the migration, use health checks to detect stale routing, and implement graceful shutdown (during the drain period the old server keeps serving, or answers with temporary 307 redirects rather than a cacheable 301).

Interactive: Design a REST API

Design Drill: REST API for a URL Shortener

Watch the API design process step by step. Each step reveals resources, endpoints, and edge cases.

Concept check: You're designing a REST API for a social media platform. A user wants to "like" a post. Which approach is most RESTful and most idempotent-safe?

PUT /posts/42/likes/user_7: the "like" is modeled as a resource (the user-post pair). PUT is idempotent: liking twice = liking once.
POST /posts/42/like: simple, but POST is not idempotent; a double-tap could create two likes.
PATCH /posts/42 with {"likes": likes + 1}: increment is not idempotent.
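The reason the PUT design is retry-safe is that the server stores a set of (post, user) pairs rather than a counter. A tiny in-memory sketch of that handler; the function name put_like and the choice of 201 versus 200 are illustrative.

```python
likes = {}  # post_id -> set of user_ids who liked the post

def put_like(post_id, user_id):
    """Handle PUT /posts/{post_id}/likes/{user_id}.
    Returns (status_code, like_count); repeating the call changes nothing."""
    users = likes.setdefault(post_id, set())
    created = user_id not in users
    users.add(user_id)  # adding an existing member is a no-op
    return (201 if created else 200), len(users)

first = put_like(42, "user_7")
retry = put_like(42, "user_7")
```

The like count is derived from the size of the set, so a double-tap or a network retry can never inflate it.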

Chapter 9: Connections

Service communication is the nervous system of distributed systems. Every other topic in this series depends on it. Here's how this lesson connects to everything else.

What We Covered

Chapter	Core Concept	Why It Matters
0	The communication explosion	Microservices trade simple function calls for complex network communication
1	DNS & service discovery	Services must find each other by name, not hardcoded IP
2	HTTP request lifecycle	The universal protocol: methods, status codes, headers, keep-alive
3	REST API design	Resources + HTTP verbs = predictable, cacheable, evolvable APIs
4	Idempotency	The network will fail. Retries must be safe. Idempotency keys are mandatory.
5	API evolution	Add optional fields, never remove. Version with /v1/. OpenAPI for contracts.
6	gRPC, GraphQL, WebSockets	Different tools for different problems: performance, flexibility, real-time
7	API gateway & circuit breaking	Single entry point for auth, rate limiting, routing. Circuit breakers prevent cascades.
8	Interview patterns	Rate limiters, circuit breakers, REST design drills

Where to Go Next

Topic	Connection
Encoding & Evolution	Chapter 5 (API evolution) is the API-level version of schema evolution. Encoding formats (Protobuf, Avro) determine how data survives version changes on the wire.
Replication	When you replicate data across nodes, the replication protocol IS a form of service communication — with all the same problems (ordering, idempotency, failure detection).
Consistency & Consensus	Distributed consensus protocols (Raft, Paxos) are just very careful, very specific forms of service communication where the "API contract" is mathematically proven.
Load Balancing	The API gateway's routing is one form of load balancing. Understanding L4 vs L7 load balancing, consistent hashing, and health checks deepens everything in Chapter 7.
Message Queues	When synchronous HTTP communication isn't enough (fire-and-forget, event-driven architectures), you move to asynchronous messaging. Same idempotency problems, different transport.

Limitations of What We Covered

Limitation	What We Didn't Cover	When It Matters
Synchronous only	Asynchronous messaging (Kafka, RabbitMQ, SQS)	Event-driven architectures, decoupled services, eventual consistency
Request-response	Event sourcing, CQRS	High-write systems, audit logs, temporal queries
Point-to-point	Service mesh (Istio, Linkerd)	Managing communication policies (mTLS, retries, observability) across 100+ services
Single-region	Cross-region communication, geo-routing	Global services with users in multiple continents
Trusted network	mTLS (mutual TLS), zero-trust networking	Security-sensitive environments where internal traffic must also be encrypted

The one thing to remember. The network is unreliable. Every design decision in service communication — DNS TTLs, idempotency keys, circuit breakers, API versioning — is a consequence of this single, brutal fact. Design for failure, and your system will handle success just fine.

The Bigger Picture

Service communication is layer 1 of the distributed systems stack. Everything above it — replication, consensus, transactions, consistency models — is built on top of the primitives we learned here. A replicated database is just services sending carefully ordered messages to each other. A distributed lock is just an API call with strong idempotency guarantees. A message queue is just an HTTP POST with persistence.

When you understand DNS, HTTP, REST, idempotency, and circuit breaking at the level of this lesson, you have the vocabulary to reason about any distributed system. The rest is just applying these primitives in more sophisticated combinations.

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." — Leslie Lamport