DNS discovery, REST APIs, HTTP, idempotency — how services find and talk to each other.
You have an e-commerce site. It started as one monolith — one big application that handles users, products, orders, payments, and shipping. Everything talks to everything through function calls inside the same process. Life is simple.
Then your site grows. The monolith becomes a nightmare: a change in the payment code breaks the product search, deploys take 45 minutes, and the team of 60 engineers is constantly stepping on each other's toes. So you do what every fast-growing company does: you break the monolith into microservices.
Now you have seven separate services: Users, Products, Orders, Payments, Shipping, Inventory, and Notifications. Each one is deployed independently, scales independently, and is owned by a different team. Beautiful.
Except now you have a new problem — the hardest problem in distributed systems.
Before microservices, a function call was instant, reliable, and type-checked by the compiler. Now that same "call" crosses a network: it can be slow (10ms to 500ms), it can fail (packet loss, timeout, crash), it can partially succeed (the payment was charged but the response never arrived), and it has no compiler checking the contract between caller and callee.
This lesson covers the entire stack of service communication: how services find each other (DNS, service discovery), how they talk (HTTP, REST, gRPC), how they agree on contracts (API design, versioning), and how they handle the brutal reality that networks are unreliable (idempotency, retries, circuit breaking).
The simulation below shows a monolith being split into microservices. Watch what happens when services need to communicate — requests cross the network, and the network is not your friend.
Click "Place Order" to trace a request through the system. Notice how many network hops are needed.
A single "Place Order" operation that was one function call in the monolith now requires six network round-trips: Orders calls Inventory (is the item in stock?), then Payments (charge the card), then Shipping (schedule delivery), then Notifications (send confirmation), and each of those might call Users (get the address). Every arrow is a potential point of failure.
| Problem | What Goes Wrong | Solution |
|---|---|---|
| Discovery | Orders doesn't know where Payments lives. IP addresses change when services restart. | DNS, service registries |
| Communication | How do you encode requests and responses? What format? What protocol? | HTTP, REST, gRPC, GraphQL |
| Reliability | The network loses packets. Services crash. Retries cause duplicate charges. | Idempotency, circuit breaking, timeouts |
We will solve each one, from the ground up.
Before two services can talk, they need to answer one question: where is the other service? In your apartment, you find your roommate by shouting their name. On the internet, you find a service by resolving a name to an IP address. That system is called DNS — the Domain Name System.
The naive approach: write the Payments service's IP address directly in the Orders service's config file. PAYMENTS_HOST=10.0.3.42. This works for about a week.
Then the Payments service gets redeployed to a different machine and gets a new IP. Or it scales to three instances and now there are three IPs. Or the data center migrates and every IP changes. Hardcoded IPs are the distributed systems equivalent of hardcoded pixel positions in a Canvas — they work until the screen size changes.
The solution is indirection: give services names, and have a system that maps names to current IP addresses. That system is DNS.
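DNS is a distributed, hierarchical database. When your browser wants to visit api.stripe.com, it doesn't ask one central server. After checking its local caches, your machine hands the query to a recursive resolver, which walks down a tree: a root server points it to the .com TLD servers, the .com servers point it to the authoritative name servers for stripe.com, and those servers return the record for api.stripe.com.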
The answer flows back up the chain. Every intermediate server caches the result. The TTL (Time To Live) — specified by the authoritative server — controls how long caches hold the answer. A TTL of 300 seconds means "this answer is valid for 5 minutes; after that, re-ask."
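To make the TTL rule concrete, here is a minimal sketch of a resolver-side cache. The class name `TTLDnsCache`, the injected `resolve` callable (returning an IP and a TTL in seconds), and the injectable clock are all assumptions made for illustration; real resolvers are considerably more involved.

```python
import time
from typing import Callable, Dict, Tuple


class TTLDnsCache:
    """Caches name -> IP answers until their TTL expires (illustrative only)."""

    def __init__(self,
                 resolve: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve   # hypothetical upstream lookup: name -> (ip, ttl_seconds)
        self._clock = clock       # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)
        self.upstream_queries = 0

    def lookup(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]       # cache hit: the answer is still within its TTL
        # Cache miss or expired entry: re-ask upstream and remember the new answer.
        ip, ttl = self._resolve(name)
        self.upstream_queries += 1
        self._entries[name] = (ip, now + ttl)
        return ip
```

With a TTL of 300, repeated lookups inside the five-minute window never reach the upstream resolver; the first lookup after the window does.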
| Type | Maps | Example | Use Case |
|---|---|---|---|
| A | Name → IPv4 | api.stripe.com → 54.187.174.169 | Most common. "Where is this service?" |
| AAAA | Name → IPv6 | api.stripe.com → 2600:1f18:... | IPv6 equivalent of A record |
| CNAME | Name → Name | www.stripe.com → stripe.com | Alias. One extra lookup step. |
| SRV | Name → Host + Port + Priority | _payments._tcp → pay-1:8080 (pri=10) | Service discovery with port and priority |
Plain DNS works for external services, but inside a microservices cluster, you need more. You need to know not just where a service is, but which instances are healthy. This is service discovery.
Client-side discovery. Every service queries a service registry (like Consul, etcd, or ZooKeeper) and gets back a list of healthy instances. The client picks one — usually round-robin or random. Netflix's Eureka popularized this pattern.
Pros: no extra network hop. Clients can make smart choices (prefer same-zone).
Cons: every client needs the discovery logic. N languages = N implementations.
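As a rough illustration of client-side discovery, the sketch below picks healthy instances round-robin. The in-memory `registry` dictionary stands in for a real registry such as Consul or etcd, and the `RoundRobinClient` name and the instance record shape are assumptions, not any library's API.

```python
import itertools
from typing import Dict, List


class RoundRobinClient:
    """Chooses among healthy instances of a service, one after another."""

    def __init__(self, registry: Dict[str, List[dict]]):
        self._registry = registry   # stand-in for a real service registry
        self._counters: Dict[str, itertools.count] = {}

    def pick(self, service: str) -> str:
        # Only instances currently marked healthy are candidates.
        healthy = [inst["addr"] for inst in self._registry.get(service, [])
                   if inst["healthy"]]
        if not healthy:
            raise LookupError(f"no healthy instances for {service}")
        counter = self._counters.setdefault(service, itertools.count())
        return healthy[next(counter) % len(healthy)]
```

Every client in every language needs an equivalent of this logic, which is exactly the cost listed above.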
Server-side discovery. The client sends requests to a load balancer (like AWS ALB, Kubernetes Service, or Envoy). The load balancer queries the registry and forwards the request. The client only needs to know one address: the load balancer's.
Pros: clients are dumb and simple. One discovery implementation.
Cons: extra network hop through the load balancer. LB is a single point of failure (mitigated with redundancy).
In Kubernetes, the dominant pattern today, both happen simultaneously. Kubernetes provides built-in DNS (CoreDNS) that maps service names to ClusterIP virtual addresses. payments.default.svc.cluster.local resolves to a virtual IP. The kernel's iptables or eBPF rules load-balance across healthy pods. From the client's perspective, it's just a DNS name.
A service registry is only useful if it knows which instances are actually healthy. A server that's running but stuck in an infinite loop, or connected but returning 500 errors, is worse than a server that's down — at least a down server fails fast.
There are two kinds of health checks:
Liveness probes: "Is the process alive?" A simple TCP connection check or an HTTP GET to /healthz that returns 200. If this fails, the instance is removed from the registry (or restarted in Kubernetes). Think of it as checking if someone has a pulse.
Readiness probes: "Can the process handle requests?" A deeper check: is the database connection alive? Is the cache warm? Are all required downstream services reachable? If this fails, traffic stops routing to this instance, but it isn't killed — it might be starting up or recovering. Think of it as checking if someone is awake and ready to work.
```python
# Kubernetes-style health check endpoints
from flask import Flask

app = Flask(__name__)
# `db` and `cache` are this service's own client objects, created elsewhere.

@app.route('/healthz')   # Liveness: am I alive?
def liveness():
    return 'ok', 200

@app.route('/readyz')    # Readiness: can I serve traffic?
def readiness():
    if not db.is_connected():
        return 'db not ready', 503
    if not cache.is_warm():
        return 'cache cold', 503
    return 'ok', 200
```
In Kubernetes, the kubelet runs these probes every N seconds (configurable). Failed liveness probes trigger a pod restart. Failed readiness probes remove the pod from the Service's endpoint list — it stops receiving traffic but keeps running, giving it time to recover.
In a Kubernetes cluster, every Service gets a DNS name automatically via CoreDNS:
```text
# Full DNS name format:
<service-name>.<namespace>.svc.cluster.local

# Examples:
payments.default.svc.cluster.local     # Payments service in default namespace
orders.production.svc.cluster.local    # Orders service in production namespace

# Within the same namespace, just use the service name:
http://payments:8080/v1/charge         # Kubernetes resolves "payments" automatically
```
This is why you see PAYMENTS_HOST=payments in Kubernetes config files — not an IP address, just a name. CoreDNS handles the rest, and the Service object handles load balancing across healthy pods. No service registry library needed. No Consul. No Eureka. Just DNS.
The simulation below shows a DNS query walking down the hierarchy. Watch the caching at each level and notice how a second query for the same domain is instant.
Click "Resolve" to query a domain. Click again to see the cache hit. Try different domains.
Now that services can find each other, they need a language to talk. The dominant language of the web — and of most microservices — is HTTP (HyperText Transfer Protocol). It's a simple request-response protocol: the client sends a request, the server sends back a response. That's it. No magic.
Every HTTP request has exactly four parts:
```http
POST /v1/payments HTTP/1.1             # 1. Method + Path + Version
Host: payments.internal:8080           # 2. Headers (key-value metadata)
Content-Type: application/json
Authorization: Bearer eyJhbGciOiJS...
Idempotency-Key: ord_abc123_pay_1
                                       # 3. Blank line (separates headers from body)
{                                      # 4. Body (optional, the actual data)
  "amount": 4999,
  "currency": "usd",
  "customer": "cus_abc123"
}
```
Method tells the server what you want to do. Path identifies the resource. Headers carry metadata — who you are, what format the body is in, caching directives. Body carries the payload, if any.
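The four parts can be pulled apart with a few lines of string handling. This is a teaching sketch only: `parse_http_request` and `ParsedRequest` are made-up names, and a production parser must also handle things this ignores, such as repeated headers and chunked bodies.

```python
from typing import Dict, NamedTuple


class ParsedRequest(NamedTuple):
    method: str
    path: str
    version: str
    headers: Dict[str, str]
    body: str


def parse_http_request(raw: str) -> ParsedRequest:
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        # Split on the first colon only, so values like "host:8080" stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return ParsedRequest(method, path, version, headers, body)
```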
HTTP defines a small set of methods (also called verbs). Each has a specific semantic meaning:
| Method | Purpose | Has Body? | Idempotent? | Safe? |
|---|---|---|---|---|
| GET | Retrieve a resource | No | Yes | Yes |
| POST | Create a new resource / trigger action | Yes | No | No |
| PUT | Replace a resource entirely | Yes | Yes | No |
| PATCH | Partially update a resource | Yes | No* | No |
| DELETE | Remove a resource | Optional | Yes | No |
| HEAD | GET but headers only (no body) | No | Yes | Yes |
| OPTIONS | What methods does this endpoint support? | No | Yes | Yes |
Safe means it doesn't modify anything — calling it 100 times has the same effect as calling it 0 times. Idempotent means calling it N times has the same effect as calling it once. We'll dig deep into idempotency in Chapter 4.
*PATCH can be idempotent (e.g., "set name to Alice") but isn't guaranteed to be (e.g., "increment counter by 1").
The server responds with a status code — a three-digit number that tells the client what happened. The first digit is the category:
| Range | Category | Key Codes |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirect | 301 Moved Permanently, 304 Not Modified |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 429 Too Many Requests |
| 5xx | Server Error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout |
When should you retry? 5xx: usually yes, because the failure is on the server side and may be transient. 429: yes, but only after waiting as long as the Retry-After header says. 4xx (except 429): no. You sent a bad request; sending it again won't fix it. Network timeout: maybe, but only if the request is idempotent (Chapter 4). Retrying a non-idempotent POST can cause double charges.

HTTP/1.1 (1997) has one painful limitation: head-of-line blocking. Each TCP connection handles one request at a time. If you need 6 resources, you either wait for each one sequentially, or open 6 TCP connections (expensive: each needs a new TLS handshake).
HTTP/2 (2015) fixes this with multiplexing: a single TCP connection carries many requests simultaneously using streams. Each request/response pair gets a stream ID. Frames from different streams are interleaved on the wire and reassembled at the other end. One connection, many concurrent requests, no head-of-line blocking at the HTTP level.
HTTP/2 also adds header compression (HPACK) — headers are often repetitive across requests, so compressing them saves significant bandwidth in chatty microservice architectures.
Watch a request traverse DNS, TCP, TLS, and HTTP layers. Toggle HTTP/2 to see multiplexing.
The Connection: keep-alive header tells the server "don't close the TCP connection after this response; I'll send more requests." This avoids the overhead of re-establishing TCP + TLS for every request. Persistent connections are the default in HTTP/1.1. HTTP/2 connections are always persistent, so the header is not used there.

HTTP headers are key-value pairs that carry metadata about the request or response. Most are boring. These are the ones that matter for service communication:
| Header | Direction | Purpose | Example |
|---|---|---|---|
| Content-Type | Both | Format of the body | application/json, application/protobuf |
| Authorization | Request | Who is calling? | Bearer eyJhbGciOiJS... |
| Accept | Request | What format do I want back? | application/json |
| Idempotency-Key | Request | Deduplicate this request | pay_ord42_attempt1 |
| Retry-After | Response | When to retry (with 429/503) | 30 (seconds) or a date |
| X-Request-ID | Both | Trace a request across services | req_abc123def456 |
| Cache-Control | Response | How long can this be cached? | max-age=300, no-cache |
| ETag | Response | Version fingerprint for caching | "33a64df551425fcc55e" |
When you retry a failed request, you should not retry immediately. If the server is overloaded, 10,000 clients retrying simultaneously will make it worse. The standard retry strategy is exponential backoff with jitter:
```python
import random
import time

def retry_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Last attempt: give up and surface the error
            # Base delay doubles after each failure: 1s, 2s, 4s, 8s
            base_delay = 2 ** attempt
            # Add jitter: random between 0 and base_delay
            jitter = random.uniform(0, base_delay)
            # Cap at 30 seconds
            delay = min(base_delay + jitter, 30)
            print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f}s...")
            time.sleep(delay)
```
Why jitter? Without it, all clients that failed at time T=0 retry at T=1, fail again, retry at T=2, and so on — they move in lockstep, creating periodic traffic spikes (the "thundering herd" problem). Random jitter spreads the retries across time, smoothing the load.
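Backoff decides when to retry; a separate question is whether to retry at all. The sketch below encodes the status-code rules discussed earlier. The function name `should_retry` and the convention that `status=None` means "timed out, no response" are assumptions, and treating 5xx on a non-idempotent request as not retryable is a deliberately conservative choice, not a rule from any standard.

```python
from typing import Optional

# Methods that HTTP defines as idempotent.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}


def should_retry(method: str, status: Optional[int],
                 has_idempotency_key: bool = False) -> bool:
    """Decide whether a failed request may be sent again.

    status=None represents a network timeout (no response received).
    """
    safe_to_repeat = method.upper() in IDEMPOTENT_METHODS or has_idempotency_key
    if status is None:
        # We don't know whether the server acted, so repeat only if repeating is harmless.
        return safe_to_repeat
    if status == 429:
        # The request was rejected, not processed; retry after the Retry-After delay.
        return True
    if 500 <= status < 600:
        # Conservative assumption: the server may have done part of the work.
        return safe_to_repeat
    # Other 4xx: the request itself is wrong, so sending it again won't help.
    return False
```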
HTTP gives you the transport. But how should you design the API on top of it? The dominant style for web APIs is REST (Representational State Transfer) — a set of constraints that, when followed, produce APIs that are predictable, cacheable, and evolvable.
REST says: think about resources (nouns), not actions (verbs). A resource is anything that can be named: a user, an order, a payment, a list of products. Each resource has a URI (Uniform Resource Identifier) — its address. HTTP methods provide the verbs.
```text
# BAD: action-oriented (RPC-style)
POST /createUser
POST /getUser
POST /deleteUser
POST /listUsers

# GOOD: resource-oriented (REST-style)
POST   /users       # create a user
GET    /users/42    # get user 42
DELETE /users/42    # delete user 42
GET    /users       # list all users
```
The REST version is uniform: once you know the pattern for one resource, you know it for all of them. A developer who has never seen your API can guess that GET /products/17 returns product 17 and DELETE /orders/99 cancels order 99.
REST maps the four basic database operations (Create, Read, Update, Delete) to HTTP methods:
| Operation | HTTP Method | URI Pattern | Example | Response |
|---|---|---|---|---|
| Create | POST | /resources | POST /orders | 201 Created + Location header |
| Read one | GET | /resources/:id | GET /orders/42 | 200 OK + JSON body |
| Read many | GET | /resources | GET /orders?status=pending | 200 OK + JSON array |
| Replace | PUT | /resources/:id | PUT /orders/42 | 200 OK |
| Partial update | PATCH | /resources/:id | PATCH /orders/42 | 200 OK |
| Delete | DELETE | /resources/:id | DELETE /orders/42 | 204 No Content |
When one resource belongs to another, nest the URI:
```text
GET  /users/42/orders      # all orders for user 42
GET  /users/42/orders/7    # order 7 for user 42
POST /users/42/orders      # create an order for user 42
```
But don't nest too deep. /users/42/orders/7/items/3/reviews is hard to read and hard to cache. Two levels is the sweet spot.
Any endpoint that returns a list needs pagination. Without it, GET /products returns your entire catalog in one response — 500MB of JSON. There are two main approaches:
Offset-based: ?offset=20&limit=10. Simple. But if items are inserted while you page, you'll skip or duplicate items. Also, ?offset=1000000 forces the database to scan and skip a million rows.
Cursor-based: ?cursor=eyJpZCI6NDJ9&limit=10. The cursor is an opaque token (often a base64-encoded ID). The server uses it to resume from exactly where it left off. No skipping, no duplication, efficient at any depth. Stripe, Slack, and Twitter all use cursor pagination.
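A minimal sketch of cursor pagination follows, assuming rows sorted by an ascending integer `id` and a cursor that is simply base64-encoded JSON. The helper names are hypothetical; against a real database the filtering would be a `WHERE id > ? ORDER BY id LIMIT ?` query rather than a list comprehension.

```python
import base64
import json
from typing import List, Optional, Tuple


def encode_cursor(last_id: int) -> str:
    # Compact JSON, then URL-safe base64, so the token can sit in a query string.
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]


def list_page(items: List[dict], cursor: Optional[str],
              limit: int) -> Tuple[List[dict], Optional[str]]:
    """Return one page plus the cursor for the next page (None on the last page).

    `items` must already be sorted by ascending id.
    """
    after_id = decode_cursor(cursor) if cursor else None
    page = [item for item in items
            if after_id is None or item["id"] > after_id][:limit]
    has_more = bool(page) and page[-1]["id"] < items[-1]["id"]
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor
```

Because the cursor names the last row seen rather than a position, rows inserted or deleted earlier in the list do not shift the next page.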
```python
from uuid import uuid4

from flask import Flask, jsonify, request

app = Flask(__name__)
orders = {}  # in-memory store (use a real DB in production)

# CREATE: POST /orders
@app.route('/orders', methods=['POST'])
def create_order():
    data = request.get_json(silent=True) or {}
    # Reject incomplete bodies with 400 instead of crashing with a 500
    if 'product' not in data or 'quantity' not in data:
        return jsonify({'error': 'product and quantity are required'}), 400
    order_id = str(uuid4())
    order = {
        'id': order_id,
        'product': data['product'],
        'quantity': data['quantity'],
        'status': 'pending'
    }
    orders[order_id] = order
    return jsonify(order), 201, {'Location': f'/orders/{order_id}'}

# READ ONE: GET /orders/:id
@app.route('/orders/<order_id>', methods=['GET'])
def get_order(order_id):
    order = orders.get(order_id)
    if not order:
        return jsonify({'error': 'not found'}), 404
    return jsonify(order)

# READ MANY: GET /orders?status=pending&limit=10
@app.route('/orders', methods=['GET'])
def list_orders():
    status = request.args.get('status')
    limit = int(request.args.get('limit', 20))
    result = list(orders.values())
    if status:
        result = [o for o in result if o['status'] == status]
    return jsonify({'data': result[:limit], 'total': len(result)})

# REPLACE: PUT /orders/:id
@app.route('/orders/<order_id>', methods=['PUT'])
def replace_order(order_id):
    if order_id not in orders:
        return jsonify({'error': 'not found'}), 404
    # PUT replaces the whole resource; only the server-assigned id is kept
    orders[order_id] = {**(request.get_json(silent=True) or {}), 'id': order_id}
    return jsonify(orders[order_id])

# DELETE: DELETE /orders/:id
@app.route('/orders/<order_id>', methods=['DELETE'])
def delete_order(order_id):
    orders.pop(order_id, None)  # deleting a missing order is a no-op (idempotent)
    return '', 204
```
Real APIs need more than just CRUD. Clients need to filter (show only pending orders), sort (newest first), and select fields (give me only name and email, not the full 50-field profile).
```text
# Filtering: use query parameters
GET /orders?status=pending&min_total=100

# Sorting: prefix with - for descending
GET /orders?sort=-created_at           # newest first
GET /orders?sort=total,-created_at     # cheapest first, then newest

# Sparse fieldsets: request only the fields you need
GET /users/42?fields=name,email        # don't send avatar, preferences, etc.
```
This reduces payload size (less bandwidth) and database load (fewer columns to fetch). The fields parameter is especially valuable for mobile clients on slow networks.
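On the server side, the `sort` and `fields` parameters need to be parsed and applied. The following is one possible sketch over in-memory rows; the function names are made up for illustration, and a real service would normally translate these parameters into the database query (after validating them against an allow-list of columns).

```python
from typing import List, Tuple


def parse_sort(sort_param: str) -> List[Tuple[str, bool]]:
    """Turn 'total,-created_at' into [('total', False), ('created_at', True)].

    The boolean is True when the field should be sorted descending.
    """
    keys = []
    for part in sort_param.split(","):
        part = part.strip()
        if part:
            keys.append((part.lstrip("-"), part.startswith("-")))
    return keys


def apply_sort(rows: List[dict], sort_param: str) -> List[dict]:
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key ordering.
    for field, descending in reversed(parse_sort(sort_param)):
        result.sort(key=lambda row: row[field], reverse=descending)
    return result


def select_fields(row: dict, fields_param: str) -> dict:
    """Keep only the requested fields; unknown field names are ignored."""
    wanted = [f.strip() for f in fields_param.split(",") if f.strip()]
    return {f: row[f] for f in wanted if f in row}
```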
| Pattern | Example | Why |
|---|---|---|
| Plural nouns | /users, /orders | Consistent. /user/42 vs /users/42 — pick one (plural wins) |
| Kebab-case | /order-items (not /orderItems) | URI paths are case-sensitive, so all-lowercase kebab-case avoids mixed-case mismatches |
| Filter via query params | /orders?status=shipped | Keeps the base URL clean |
| Return created resource | POST returns full object + id | Saves a follow-up GET |
| Consistent error format | {"error": "not_found", "message": "..."} | Clients can parse errors programmatically |
Here is the single most important concept in service communication reliability. More important than load balancing. More important than circuit breaking. It's the concept that prevents customers from being charged twice, orders from being created twice, and emails from being sent twice.
Idempotency means: performing an operation multiple times has the same effect as performing it once.
Recall from Chapter 0: the Orders service sends a payment request, the Payments service charges the card, but the response gets lost due to a network timeout. The Orders service doesn't know if the payment succeeded or failed. It must retry — the customer is waiting.
If the payment endpoint is not idempotent, the retry creates a second charge. The customer pays $49.99 twice. Your support team gets an angry email. Your company gets a chargeback.
If the payment endpoint is idempotent, the retry detects "I already processed this payment" and returns the original result without charging again. The customer pays once. Everyone is happy.
| Method | Idempotent? | Why |
|---|---|---|
| GET | Yes | Reading data doesn't change it. GET /orders/42 returns the same order every time. |
| PUT | Yes | "Set X to Y." Setting a name to "Alice" 10 times still results in "Alice." |
| DELETE | Yes | "Remove X." Removing something that's already gone is a no-op. (Return 204 or 404.) |
| POST | No! | "Create X." Calling POST /orders twice creates TWO orders. This is the danger zone. |
| PATCH | Depends | "Set name to Alice" is idempotent. "Increment balance by $10" is NOT. |
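The "set" versus "increment" distinction in the PATCH row can be checked mechanically: an operation is idempotent on a given state if applying it twice leaves the same result as applying it once. The helper names below are hypothetical and exist only to demonstrate that property.

```python
from typing import Callable


def patch_set_name(account: dict, name: str) -> dict:
    # "Set name to X": the result depends only on the request, not on prior state.
    return {**account, "name": name}


def patch_increment_balance(account: dict, amount: int) -> dict:
    # "Increment balance by N": the result depends on how many times it ran.
    return {**account, "balance": account["balance"] + amount}


def is_idempotent_on(operation: Callable[[dict], dict], state: dict) -> bool:
    """True if applying the operation twice equals applying it once."""
    once = operation(state)
    twice = operation(once)
    return once == twice
```

Running the check on an account with balance 100 shows the set operation passing and the increment failing, because the second application moves the balance from 110 to 120.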
Since POST is inherently non-idempotent, we need to make it idempotent. The standard technique: idempotency keys.
The client generates a unique key for each logical operation and sends it as a header:
```http
POST /v1/payments HTTP/1.1
Idempotency-Key: pay_ord42_attempt1
Content-Type: application/json

{"amount": 4999, "currency": "usd"}
```
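The server's logic: look up the key. If it has been seen before, return the stored response without doing the work again. If not, process the request, store the response under the key, and return it.
Here is the implementation: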
```python
import json

import redis

r = redis.Redis()

def process_payment(request):
    idem_key = request.headers.get('Idempotency-Key')
    if not idem_key:
        return {'error': 'Idempotency-Key header required'}, 400

    # Check if we've seen this key before
    cached = r.get(f'idem:{idem_key}')
    if cached:
        # Already processed: replay the stored response
        return json.loads(cached), 200

    # First time seeing this key: process the payment.
    # charge_credit_card is the service's own payment function, defined elsewhere.
    result = charge_credit_card(
        amount=request.json['amount'],
        currency=request.json['currency']
    )

    # Store the result with a TTL (24 hours)
    r.setex(
        f'idem:{idem_key}',
        86400,  # 24 hours in seconds
        json.dumps(result)
    )
    return result, 200
```
Race condition: if two requests with the same key arrive at the same moment, both can miss the cache and both charge the card. To close that gap, use Redis SET NX (set-if-not-exists) to acquire a lock on the idempotency key before processing. The second request sees the lock and waits (or returns 409 Conflict).

The simulation below shows two scenarios side by side. On the left: a payment endpoint WITHOUT idempotency, where retries cause double charges. On the right: the same endpoint WITH an idempotency key, where retries are safe.
Click "Send Payment" then "Retry" to see the difference. The left side has no idempotency protection.
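The SET NX locking idea mentioned above can be sketched as follows. To keep the example runnable without a Redis server, `InMemoryStore` is a tiny stand-in that imitates only the two redis-py calls used here (`set(..., nx=True, ex=...)` and `get`); it ignores expiry. The `process_once` function, the `IN_FLIGHT` marker, and the choice to answer 409 rather than wait are assumptions for illustration.

```python
import json


class InMemoryStore:
    """Stand-in for a Redis client; mimics set(nx=..., ex=...) and get()."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return None          # redis-py also returns None when NX blocks the write
        self._data[key] = value  # expiry (ex) is ignored in this stand-in
        return True

    def get(self, key):
        return self._data.get(key)


IN_FLIGHT = "__in_flight__"


def process_once(store, idem_key, do_work, ttl=86400):
    key = f"idem:{idem_key}"
    # Atomically claim the key. Only one concurrent caller can win this write.
    if store.set(key, IN_FLIGHT, nx=True, ex=ttl):
        result = do_work()
        store.set(key, json.dumps(result), ex=ttl)  # replace the marker with the result
        return 200, result
    existing = store.get(key)
    if isinstance(existing, bytes):   # a real Redis client returns bytes
        existing = existing.decode()
    if existing == IN_FLIGHT:
        return 409, {"error": "request in progress"}
    return 200, json.loads(existing)  # replay the stored response
```

One gap this sketch leaves open: if `do_work` raises, the key stays marked in-flight until the TTL expires, so a production version would clear or overwrite the marker on failure.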
Stripe's payment API is the gold standard for idempotency. Here's how they do it:
| Behavior | Implementation |
|---|---|
| Key generation | Client generates a unique key. Stripe suggests a V4 UUID or another random string with enough entropy to avoid collisions. |
| Key storage | Stored in database with the request params hash + response. Keys expire after 24 hours. |
| Param mismatch | If you reuse a key with DIFFERENT params, Stripe returns 400: "idempotency key used with different request parameters" |
| In-flight detection | If a request with the same key is still processing, Stripe returns 409 Conflict |
| Response replay | On duplicate key, returns the EXACT same HTTP status + body as the original request |
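Two of the behaviors in the table, parameter-mismatch detection and response replay, can be approximated by storing a hash of the request parameters next to the saved response. This is a sketch of the idea only, not Stripe's implementation; `IdempotencyStore`, `fingerprint`, and the in-memory dictionary are illustrative assumptions.

```python
import hashlib
import json


def fingerprint(params: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so key order doesn't change the hash.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class IdempotencyStore:
    def __init__(self):
        self._records = {}  # key -> (params fingerprint, status, body)

    def handle(self, key: str, params: dict, do_work):
        fp = fingerprint(params)
        record = self._records.get(key)
        if record is None:
            # First time this key is seen: do the work and remember the outcome.
            status, body = do_work(params)
            self._records[key] = (fp, status, body)
            return status, body
        saved_fp, status, body = record
        if saved_fp != fp:
            # Same key, different request: refuse instead of guessing.
            return 400, {"error": "idempotency key used with different request parameters"}
        # Same key, same request: replay the original status and body.
        return status, body
```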
For simple cases, you don't even need Redis. The database itself can enforce idempotency using unique constraints and upserts:
```sql
-- Create table with unique constraint on the idempotency key
CREATE TABLE payments (
    id         UUID PRIMARY KEY,
    idem_key   VARCHAR(255) UNIQUE NOT NULL,
    amount     INTEGER NOT NULL,
    status     VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Insert-or-ignore: if idem_key already exists, do nothing
INSERT INTO payments (id, idem_key, amount, status)
VALUES ('uuid-here', 'pay_ord42_1', 4999, 'pending')
ON CONFLICT (idem_key) DO NOTHING;
```
The database's unique constraint guarantees that two concurrent inserts with the same idem_key will not both succeed — one will hit the conflict and be silently dropped. No Redis, no distributed locks, no race conditions. The database IS the lock.
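The same pattern can be tried locally with Python's built-in `sqlite3` module, used here only as a convenient stand-in for the production database. The `record_payment` helper is hypothetical, the column types are simplified for SQLite, and the `ON CONFLICT ... DO NOTHING` clause assumes SQLite 3.24 or newer.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payments (
        id       TEXT PRIMARY KEY,
        idem_key TEXT UNIQUE NOT NULL,
        amount   INTEGER NOT NULL,
        status   TEXT NOT NULL
    )
""")


def record_payment(idem_key: str, amount: int) -> bool:
    """Insert the payment once; return False if the key was already recorded."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO payments (id, idem_key, amount, status) "
        "VALUES (?, ?, ?, 'pending') "
        "ON CONFLICT (idem_key) DO NOTHING",
        (str(uuid.uuid4()), idem_key, amount),
    )
    conn.commit()
    # total_changes only grows when a row was actually written.
    return conn.total_changes > before
```

The boolean return value tells the caller whether it is the first writer (and should go on to charge the card) or a duplicate (and should look up the existing row instead).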
Consider an endpoint PATCH /accounts/42 that accepts {"action": "add_funds", "amount": 100}. A client sends this request, gets a timeout, and retries with the same body. What happens?

Your API is live. Clients depend on it. Now you need to change it. Add a field. Remove an endpoint. Change a response format. How do you evolve your API without breaking every client that already integrated with it?
We saw these concepts in the context of data encoding (Chapter 0). They apply identically to APIs:
Backward compatible: new servers can handle requests from old clients. An old mobile app (v2.1) calls your new server (v3.0) and everything works. This is the minimum bar — you MUST maintain this.
Forward compatible: old servers can handle requests from new clients. Less common because you control your servers. But relevant during rolling deployments when old and new server versions coexist.
| Change | Why It's Safe |
|---|---|
| Add a new optional field to request | Old clients don't send it; server uses default |
| Add a new field to response | Old clients ignore unknown fields (if they're well-written) |
| Add a new endpoint | Old clients don't call it |
| Add a new enum value to response | Safe IF clients have a default/fallthrough case |
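Two rows in this table are only safe if the client is written defensively. The sketch below shows one way a client might do that: it drops unknown response fields and maps unrecognized enum values to a fallback. The `Order` and `OrderStatus` types and their fields are invented for this example.

```python
from dataclasses import dataclass, fields
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    UNKNOWN = "unknown"   # fallthrough for values added by newer servers

    @classmethod
    def parse(cls, raw: str) -> "OrderStatus":
        try:
            return cls(raw)
        except ValueError:
            return cls.UNKNOWN


@dataclass
class Order:
    id: str
    status: OrderStatus
    note: str = ""        # optional field: older servers may not send it

    @classmethod
    def from_json(cls, payload: dict) -> "Order":
        known = {f.name for f in fields(cls)}
        # Ignore fields this client version doesn't know about.
        data = {k: v for k, v in payload.items() if k in known}
        data["status"] = OrderStatus.parse(data.get("status", "unknown"))
        return cls(**data)
```

A client built this way keeps working when the server adds a `gift_wrap` field or a `refunded` status; a client that passes the raw payload straight into a strict constructor does not.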
| Change | Why It Breaks |
|---|---|
| Remove a field from response | Old clients that read this field crash or show blank |
| Rename a field | Same as remove + add — old clients see the old name disappear |
| Change a field's type (string to int) | Old clients' parsers fail |
| Add a new required field to request | Old clients don't send it; their requests now fail with 400 |
| Remove an endpoint | Old clients get 404 |
When you must make a breaking change, you version the API. There are three main approaches:
| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URL path | /v1/orders, /v2/orders | Obvious, easy to route, easy to test | Duplicates routes; clients must update URLs |
| Header | Accept: application/vnd.myapi.v2+json | Clean URLs; version is metadata | Harder to test (can't paste in browser). Easy to forget. |
| Query param | /orders?version=2 | Easy to test; doesn't change URL structure | Looks messy. Caching is harder (different versions share URL path). |
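For comparison, here is a small sketch that extracts the requested version under each of the three strategies. The function name, the precedence order (path, then header, then query parameter), and the default of version 1 are assumptions; a real API would normally pick one strategy rather than support all three.

```python
import re
from typing import Mapping


def resolve_version(path: str, headers: Mapping[str, str],
                    query: Mapping[str, str], default: int = 1) -> int:
    # 1. URL path: /v2/orders
    match = re.match(r"^/v(\d+)/", path)
    if match:
        return int(match.group(1))
    # 2. Header: Accept: application/vnd.myapi.v2+json
    match = re.search(r"application/vnd\.myapi\.v(\d+)\+json",
                      headers.get("Accept", ""))
    if match:
        return int(match.group(1))
    # 3. Query parameter: /orders?version=2
    if query.get("version", "").isdigit():
        return int(query["version"])
    # No version requested: fall back to the default.
    return default
```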
How do you document your API so clients know what to send and what to expect? OpenAPI (formerly Swagger) is a machine-readable specification format — a YAML or JSON file that describes every endpoint, parameter, request body, and response schema.
```yaml
openapi: "3.0.0"
info:                      # required by the spec; the values here are placeholders
  title: Orders API
  version: "1.0.0"
paths:
  /orders:
    post:
      summary: Create a new order
      requestBody:
        content:
          application/json:
            schema:
              type: object
              required: [product, quantity]
              properties:
                product:
                  type: string
                quantity:
                  type: integer
                  minimum: 1
      responses:
        '201':
          description: Order created
```
From this spec, you can auto-generate client libraries (in Python, Go, TypeScript, etc.), documentation websites, and even mock servers for testing. The spec becomes the single source of truth for the API contract between teams.
Apply changes to the API schema and see which ones break old clients. Green = safe, red = breaking.
In v1, your API returns {"name": "Alice", "email": "alice@ex.com"}. In v2, you want to split "name" into "first_name" and "last_name". What is the backward-compatible way to do this?

REST over HTTP with JSON is the default. But it's not the only option, and for high-performance microservice communication, it's often not the best option. Let's understand three alternatives and when to use each.
gRPC is an open-source framework for service-to-service communication that uses Protocol Buffers (Protobuf) for serialization and HTTP/2 for transport. It was created by Google as the open-source successor to Stubby, the RPC system Google uses internally.
Why is gRPC faster than REST+JSON?
| Aspect | REST + JSON | gRPC + Protobuf |
|---|---|---|
| Serialization | JSON: text-based, ~600 bytes for a user object | Protobuf: binary, ~120 bytes for the same object (5x smaller) |
| Schema | Optional (OpenAPI). Client must guess or read docs. | Required (.proto file). Code-generated clients with type safety. |
| Transport | Usually HTTP/1.1 (one request at a time per connection) | Always HTTP/2 (multiplexed streams) |
| Streaming | Not native (workarounds: SSE, chunked transfer) | Built-in: unary, server-streaming, client-streaming, bidirectional |
| Browser support | Native (fetch, XMLHttpRequest) | Requires grpc-web proxy (no native browser support) |
A Protobuf schema defines the service contract:
```protobuf
syntax = "proto3";

service PaymentService {
  rpc ChargeCard(ChargeRequest) returns (ChargeResponse);
  rpc StreamTransactions(AccountId) returns (stream Transaction);
}

message ChargeRequest {
  string customer_id = 1;
  int64 amount_cents = 2;
  string currency = 3;
  string idempotency_key = 4;
}

message ChargeResponse {
  string charge_id = 1;
  string status = 2;  // "succeeded" | "failed"
}

// Minimal illustrative definitions so the file compiles
message AccountId {
  string id = 1;
}

message Transaction {
  string transaction_id = 1;
  int64 amount_cents = 2;
}
```
From this .proto file, the protoc compiler generates client and server stubs in any language — Python, Go, Java, C++. The client calls payment_stub.ChargeCard(request) as if it were a local function. The framework handles serialization, HTTP/2 framing, and error codes.
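To see where the size advantage comes from, the sketch below hand-encodes a `ChargeRequest` using the Protobuf wire format: each field is a key (field number shifted left by 3, combined with a wire type) followed by a varint or a length-prefixed byte string. It is for illustration only; in practice you would use the code generated by protoc. The sample values are made up, and the integer encoder handles non-negative values only.

```python
import json


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    key = encode_varint((field_number << 3) | 2)   # wire type 2 = length-delimited
    return key + encode_varint(len(data)) + data


def encode_int_field(field_number: int, value: int) -> bytes:
    key = encode_varint((field_number << 3) | 0)   # wire type 0 = varint
    return key + encode_varint(value)


def encode_charge_request(customer_id: str, amount_cents: int,
                          currency: str, idempotency_key: str) -> bytes:
    # Field numbers match the ChargeRequest message: 1, 2, 3, 4.
    return (encode_string_field(1, customer_id)
            + encode_int_field(2, amount_cents)
            + encode_string_field(3, currency)
            + encode_string_field(4, idempotency_key))
```

Field names never appear on the wire, only their numbers, which is the main reason the binary message is smaller than the equivalent JSON. It is also why renumbering a field in a .proto file is a breaking change.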
GraphQL (Facebook, 2015) solves a different problem: over-fetching and under-fetching. With REST, the server decides what fields to return. GET /users/42 returns everything — name, email, address, preferences, avatar URL — even if the client only needs the name. That's over-fetching. Conversely, to get a user's orders, you need a second request to GET /users/42/orders. That's under-fetching.
GraphQL lets the client specify exactly what it wants:
```graphql
# Client sends this query
{
  user(id: 42) {
    name
    orders {
      id
      total
      status
    }
  }
}

# Server returns exactly this (nothing more, nothing less)
{
  "data": {
    "user": {
      "name": "Alice",
      "orders": [
        {"id": 7, "total": 49.99, "status": "shipped"},
        {"id": 12, "total": 29.99, "status": "delivered"}
      ]
    }
  }
}
```
One request. No over-fetching. No under-fetching. Especially powerful for mobile clients where bandwidth is limited and round trips are expensive.
One of gRPC's most powerful features is streaming. HTTP/2's multiplexed streams make this natural — no hacks, no polling:
| Mode | Direction | Use Case |
|---|---|---|
| Unary | Client sends 1 request, server sends 1 response | Standard request-response (same as REST) |
| Server streaming | Client sends 1 request, server sends N responses | Real-time price feeds, log tailing, search results |
| Client streaming | Client sends N requests, server sends 1 response | File uploads, sensor data collection, batch processing |
| Bidirectional streaming | Both sides send N messages concurrently | Chat, multiplayer games, collaborative editing |
```protobuf
service StockService {
  // Unary: one price check
  rpc GetPrice(Symbol) returns (Price);

  // Server streaming: subscribe to live price updates
  rpc StreamPrices(WatchList) returns (stream Price);

  // Client streaming: upload historical tick data
  rpc UploadTicks(stream Tick) returns (UploadResult);

  // Bidirectional: trading algorithm sends orders, receives fills
  rpc Trade(stream Order) returns (stream Fill);
}
```
WebSockets solve yet another problem: real-time bidirectional communication. HTTP is request-response: the client asks, the server answers. But what if the server needs to push updates to the client — live chat messages, stock price updates, multiplayer game state?
WebSockets start as an HTTP request (the "upgrade" handshake), then switch to a persistent full-duplex TCP connection. Both sides can send messages at any time, with no request-response overhead.
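```python
# WebSocket server with Python's websockets library
import asyncio
import json
import time

import websockets

connected = set()  # every client that currently has an open connection

async def chat_handler(websocket):
    connected.add(websocket)
    try:
        async for message in websocket:
            data = json.loads(message)
            response = json.dumps({
                "user": data["user"],
                "text": data["text"],
                "timestamp": time.time()
            })
            # Broadcast to all connected clients; ignore ones that just disconnected
            await asyncio.gather(
                *(client.send(response) for client in connected),
                return_exceptions=True
            )
    finally:
        connected.discard(websocket)

async def main():
    # Start server: clients connect via ws://localhost:8765
    async with websockets.serve(chat_handler, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

asyncio.run(main())
```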
| Protocol | Best For | Avoid When |
|---|---|---|
| REST + JSON | Public APIs, CRUD operations, browser-to-server, simple integrations | High-throughput internal services, real-time streams |
| gRPC + Protobuf | Internal microservice communication, high throughput, polyglot teams | Public APIs (browser support is poor), simple webhooks |
| GraphQL | Mobile apps, dashboards with many entities, reducing round trips | Simple CRUD, real-time updates, file uploads |
| WebSockets | Chat, live dashboards, multiplayer games, notifications | Standard request-response, stateless operations |
The simulation below shows the same operation — "get user 42 and their recent orders" — performed via REST, gRPC, and GraphQL. Watch the data volume, number of round trips, and serialization format differ across protocols.
You have 40 microservices. External clients — mobile apps, web browsers, third-party integrations — need to talk to them. Do you expose all 40 services directly to the internet? Absolutely not. You put a single entry point in front of them: the API Gateway.
An API gateway is a reverse proxy that sits between external clients and your internal services. Every external request goes through it. Think of it as the receptionist at a large office building — all visitors check in at the front desk, which directs them to the right department.
The gateway handles cross-cutting concerns — responsibilities that every service needs but shouldn't implement individually:
| Concern | What It Does | Without Gateway |
|---|---|---|
| Authentication | Validates JWT tokens, API keys, OAuth tokens. Rejects unauthorized requests before they reach any service. | Every service implements its own auth. Inconsistencies and security holes. |
| Rate limiting | Limits requests per client (e.g., 100 req/sec). Protects services from abuse and DDoS. | Each service rate-limits independently. Attacker can overload one service by hammering it directly. |
| Routing | Routes /users/* to User Service, /orders/* to Order Service. One external URL, many internal services. | Clients must know the address of every service. Internal topology leaks to the outside. |
| Request transformation | Translates external API format to internal format. Adds headers, rewrites paths, aggregates responses. | Internal API changes force external clients to change. |
| Circuit breaking | If a downstream service is failing, stop sending it traffic. Return a cached response or error instead. | Cascading failures: one slow service brings down everything upstream. |
| Logging & metrics | Central place to log every request, measure latency, count errors, trace requests across services. | Logs scattered across 40 services. Impossible to correlate. |
The circuit breaker pattern is critical for preventing cascading failures. It works like an electrical circuit breaker in your house — when too much current flows (too many errors), the breaker trips and stops the flow.
| Gateway | Type | Key Features |
|---|---|---|
| Kong | Open source | Plugin ecosystem, Lua-based, runs on Nginx |
| AWS API Gateway | Managed (cloud) | Tight AWS integration, Lambda triggers, usage plans |
| Envoy | Open source (L7 proxy) | Service mesh sidecar, gRPC-native, observability |
| Nginx | Open source | Battle-tested reverse proxy, simple config, fast |
| Traefik | Open source | Auto-discovery (Kubernetes, Docker), Let's Encrypt |
What does a real API gateway configuration look like? Here's a simplified Nginx config that routes, rate-limits, and adds security headers:
```nginx
# Rate limiting: 10 requests/second per client IP
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

upstream users_service {
    server users-1:8080;
    server users-2:8080;  # Load balance across instances
}

upstream orders_service {
    server orders-1:8080;
    server orders-2:8080;
}

server {
    listen 443 ssl;
    server_name api.myapp.com;

    # TLS termination at the gateway
    ssl_certificate     /etc/ssl/api.crt;
    ssl_certificate_key /etc/ssl/api.key;

    # Security headers
    add_header X-Content-Type-Options nosniff;
    add_header X-Frame-Options DENY;

    # Route /v1/users/* to Users service
    location /v1/users {
        limit_req zone=api burst=20;
        proxy_pass http://users_service;
        proxy_set_header X-Request-ID $request_id;
    }

    # Route /v1/orders/* to Orders service
    location /v1/orders {
        limit_req zone=api burst=20;
        proxy_pass http://orders_service;
        proxy_set_header X-Request-ID $request_id;
        proxy_read_timeout 10s;  # Fail fast if service hangs
    }
}
```
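Notice several important patterns. TLS terminates at the gateway, so in this config traffic between the gateway and the services is unencrypted, which saves the services the TLS overhead. Rate limiting uses a shared memory zone (api); the zone is shared by the worker processes of one Nginx instance, not across separate gateway machines, so a fleet of gateways needs an external store for a global limit. Each request gets a unique X-Request-ID for tracing. And the upstream blocks define the actual service instances; Nginx round-robins between them by default.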
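The routing and load-balancing behavior of that config can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Nginx is implemented: the Router class and its method names are hypothetical, and the sketch only mimics prefix matching plus round-robin selection.

```python
from itertools import cycle
from typing import Optional


class Router:
    """Maps a path prefix to a pool of backends and rotates through the pool."""

    def __init__(self, upstreams: dict[str, list[str]]):
        # One endless round-robin iterator per path prefix.
        self._pools = {prefix: cycle(servers) for prefix, servers in upstreams.items()}

    def route(self, path: str) -> Optional[str]:
        """Return the next backend for the path, or None if no prefix matches."""
        matches = [prefix for prefix in self._pools if path.startswith(prefix)]
        if not matches:
            return None
        best = max(matches, key=len)  # Longest matching prefix wins
        return next(self._pools[best])


router = Router({
    "/v1/users": ["users-1:8080", "users-2:8080"],
    "/v1/orders": ["orders-1:8080", "orders-2:8080"],
})
print(router.route("/v1/users/42"))  # users-1:8080
print(router.route("/v1/users/43"))  # users-2:8080
print(router.route("/v1/orders/7"))  # orders-1:8080
print(router.route("/health"))       # None
```

A real gateway layers health checks on top of this, skipping instances that have stopped responding instead of rotating blindly.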
The simulation shows external requests entering through the API gateway. Watch authentication, rate limiting, routing, and circuit breaking in action. Try sending many requests quickly to trigger the rate limiter, or "break" a service to see the circuit breaker trip.
Service communication is one of the most frequently tested topics in system design interviews. Here is your cheat sheet, followed by coding drills and design patterns you should be able to whiteboard in under 5 minutes.
| Topic | One-Liner | Key Numbers |
|---|---|---|
| DNS resolution | Hierarchical name-to-IP lookup with caching at every level | TTL: 60-300s typical. 13 root server identities (A through M), each anycast to many sites worldwide. |
| HTTP methods | GET (read), POST (create), PUT (replace), PATCH (update), DELETE (remove) | GET/PUT/DELETE are idempotent. POST is not, and PATCH is not guaranteed to be. |
| Status codes | 2xx success, 3xx redirect, 4xx client error, 5xx server error | Retry on 5xx and 429. Never retry 4xx (except 429). |
| REST design | Resources (nouns) + HTTP methods (verbs) + JSON | Plural nouns, 2 nesting levels max, cursor pagination. |
| Idempotency | Same request N times = same effect as once. Use idempotency keys for POST. | Store key in Redis with 24h TTL. Use SET NX for locking. |
| API versioning | URL path (/v1/, /v2/) is the industry standard | Always backward compatible. Add optional fields, never remove. |
| gRPC | Protobuf (binary, roughly 5x smaller) + HTTP/2 (multiplexed). Great for internal services. | Roughly 10x faster serialization than JSON, depending on payload. No native browser support (gRPC-Web needs a proxy). |
| GraphQL | Client specifies exact fields needed. One endpoint, no over-fetching. | N+1 query problem. Caching is hard (no URL-based cache keys). |
| Circuit breaker | CLOSED → OPEN (on errors) → HALF-OPEN (test one) → CLOSED | Threshold: 50% errors over sliding window. Timeout: 10-60s. |
| API Gateway | Single entry point: auth, rate limiting, routing, circuit breaking | Kong, Envoy, AWS API GW, Nginx, Traefik. |
This is one of the most common interview questions. Here's the framework:
This is a classic coding question. Here's the sliding window log approach using Redis, which stores one timestamp per request:
```python
import time

import redis

r = redis.Redis()


def is_rate_limited(client_id, max_requests=100, window_sec=60):
    """Sliding window rate limiter.

    Returns True if the client already made max_requests or more
    requests in the last window_sec seconds.
    """
    now = time.time()
    key = f"rate:{client_id}"

    pipe = r.pipeline()
    # Remove entries older than the window
    pipe.zremrangebyscore(key, 0, now - window_sec)
    # Count remaining entries (before this request is added)
    pipe.zcard(key)
    # Add current request; note that rejected requests are recorded too
    pipe.zadd(key, {str(now): now})
    # Set expiry on the key itself
    pipe.expire(key, window_sec)
    results = pipe.execute()

    request_count = results[1]  # zcard result
    return request_count >= max_requests
```
How it works: each request is stored as a member in a Redis sorted set, with the timestamp as both the member and the score. Before counting, we remove all entries older than the window. If the remaining count has already reached the limit, the client is rate-limited. Because every request refreshes the key's expiry, the sorted set is deleted automatically once the client stops sending requests for a full window.
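The same sliding window log can be tried without a Redis server. The sketch below is an in-memory variant for a single process; the class name InMemoryRateLimiter and the explicit now parameter are assumptions made so the behavior is easy to trace by hand. Unlike the Redis version, it does not record requests that were rejected.

```python
from collections import defaultdict, deque


class InMemoryRateLimiter:
    """Sliding window log kept in process memory (single process only)."""

    def __init__(self, max_requests: int, window_sec: float):
        self.max_requests = max_requests
        self.window_sec = window_sec
        self._log = defaultdict(deque)  # client_id -> timestamps, oldest first

    def allow(self, client_id: str, now: float) -> bool:
        """Return True and record the request if the client is under the limit."""
        log = self._log[client_id]
        # Evict timestamps that have fallen out of the window.
        while log and log[0] <= now - self.window_sec:
            log.popleft()
        if len(log) >= self.max_requests:
            return False
        log.append(now)
        return True


limiter = InMemoryRateLimiter(max_requests=3, window_sec=60)
print([limiter.allow("alice", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow("alice", now=61))  # True: the requests at t=0 and t=1 have expired
```

The trade-off is the usual one: the in-memory version is fast and dependency-free, but each process keeps its own log, so limits are only enforced per process. Redis gives every gateway instance the same view.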
```python
import time
from enum import Enum

import requests


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.state = State.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure_time = 0

    def call(self, func, *args, **kwargs):
        if self.state == State.OPEN:
            if time.time() - self.last_failure_time > self.reset_timeout:
                self.state = State.HALF_OPEN  # Try one request
            else:
                raise Exception("Circuit is OPEN: failing fast")
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = State.OPEN


# Usage
cb = CircuitBreaker(failure_threshold=5, reset_timeout=30)


def post_charge(data):
    resp = requests.post("http://payments/charge", json=data, timeout=2)
    # requests only raises on network errors, so surface 5xx as failures too
    if resp.status_code >= 500:
        resp.raise_for_status()
    return resp


def charge(data):
    try:
        resp = cb.call(post_charge, data)
        return resp.json(), resp.status_code
    except Exception:
        # Either the call failed or the circuit is open
        return {"error": "payment service unavailable"}, 503
```
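The breaker above trips on a count of failures since the last success. The cheat sheet's "50% errors over sliding window" threshold is a different trip condition, and it can be sketched separately. The class below is a hypothetical helper, not part of any library: it keeps the outcomes of the most recent calls (a count-based window, not a time-based one) and reports when the failure ratio crosses the threshold.

```python
from collections import deque


class ErrorRateWindow:
    """Tracks the outcomes of the most recent calls and reports when to trip."""

    def __init__(self, size: int = 20, threshold: float = 0.5, min_calls: int = 10):
        self.outcomes = deque(maxlen=size)  # True = failure, False = success
        self.threshold = threshold
        self.min_calls = min_calls

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)  # The oldest outcome drops off automatically

    def should_trip(self) -> bool:
        # Avoid tripping on tiny samples, e.g. 1 failure out of 1 call.
        if len(self.outcomes) < self.min_calls:
            return False
        return sum(self.outcomes) / len(self.outcomes) >= self.threshold


window = ErrorRateWindow(size=10, threshold=0.5, min_calls=4)
for failed in (False, True, False, True):
    window.record(failed)
print(window.should_trip())  # True: 2 of 4 calls failed (50%)
```

A rate-based condition tolerates an occasional success mixed into a failing stream, which a pure consecutive-failure counter would treat as a full recovery.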
In production, you don't just retry N times — you retry within a time budget. If the overall operation must complete in 5 seconds, your retry loop must respect that deadline:
```python
import random
import time

import requests


def call_with_budget(url, data, budget_sec=5.0, idem_key=None):
    """Call a service with a retry budget.

    Retries with backoff until the budget is exhausted.
    Uses idempotency key for safe retries.
    """
    deadline = time.monotonic() + budget_sec  # Monotonic: immune to clock changes
    attempt = 0
    headers = {"Content-Type": "application/json"}
    if idem_key:
        headers["Idempotency-Key"] = idem_key

    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Per-request timeout: the remaining budget, capped at 2s per attempt
            resp = requests.post(
                url, json=data, headers=headers,
                timeout=min(remaining, 2.0),
            )
            # Success or client error: don't retry (429 is the retryable 4xx)
            if resp.status_code < 500 and resp.status_code != 429:
                return resp
        except requests.Timeout:
            pass  # Will retry
        except requests.ConnectionError:
            pass  # Will retry

        attempt += 1
        # Exponential backoff with jitter, never sleeping past the deadline
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        time.sleep(min(2 ** (attempt - 1) + random.random(), remaining))

    raise TimeoutError(f"Budget of {budget_sec}s exhausted after {attempt} attempts")
```
Key insight: each individual request has a timeout (min(remaining, 2.0)), AND the overall retry loop has a budget. The budget prevents a cascade where a slow service causes its callers to accumulate waiting threads, which causes their callers to accumulate, and so on until the entire system grinds to a halt.
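It helps to work out how many attempts actually fit inside a budget. The sketch below simulates the worst case, where every attempt runs into its 2-second timeout. It is a simplified model under stated assumptions: jitter is left out so the numbers are deterministic, each backoff is capped at the time actually left, and the function name worst_case_timeline is hypothetical.

```python
def worst_case_timeline(budget_sec: float = 5.0, attempt_timeout: float = 2.0):
    """Simulate the retry loop when every attempt times out (jitter omitted)."""
    elapsed, attempt, events = 0.0, 0, []
    while elapsed < budget_sec:
        # Each attempt may use the remaining budget, capped at attempt_timeout.
        timeout = min(budget_sec - elapsed, attempt_timeout)
        elapsed += timeout
        events.append(("attempt", attempt, timeout))
        remaining = budget_sec - elapsed
        if remaining <= 0:
            break
        # Exponential backoff (1s, 2s, 4s, ...) capped at the remaining budget.
        delay = min(float(2 ** attempt), remaining)
        elapsed += delay
        events.append(("sleep", attempt, delay))
        attempt += 1
    return events


for budget in (5.0, 10.0):
    events = worst_case_timeline(budget)
    attempts = sum(1 for kind, _, _ in events if kind == "attempt")
    print(budget, attempts)  # 5.0 -> 2 attempts, 10.0 -> 3 attempts
```

With a 5-second budget only two attempts fit in the worst case, so doubling the budget buys just one more attempt. That is a useful sanity check when choosing timeouts for a call chain several services deep.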
Watch the API design process step by step. Each step reveals resources, endpoints, and edge cases.
Service communication is the nervous system of distributed systems. Every other topic in this series depends on it. Here's how this lesson connects to everything else.
| Chapter | Core Concept | Why It Matters |
|---|---|---|
| 0 | The communication explosion | Microservices trade simple function calls for complex network communication |
| 1 | DNS & service discovery | Services must find each other by name, not hardcoded IP |
| 2 | HTTP request lifecycle | The universal protocol: methods, status codes, headers, keep-alive |
| 3 | REST API design | Resources + HTTP verbs = predictable, cacheable, evolvable APIs |
| 4 | Idempotency | The network will fail. Retries must be safe. Idempotency keys are mandatory. |
| 5 | API evolution | Add optional fields, never remove. Version with /v1/. OpenAPI for contracts. |
| 6 | gRPC, GraphQL, WebSockets | Different tools for different problems: performance, flexibility, real-time |
| 7 | API gateway & circuit breaking | Single entry point for auth, rate limiting, routing. Circuit breakers prevent cascades. |
| 8 | Interview patterns | Rate limiters, circuit breakers, REST design drills |
| Topic | Connection |
|---|---|
| Encoding & Evolution | Chapter 5 (API evolution) is the API-level version of schema evolution. Encoding formats (Protobuf, Avro) determine how data survives version changes on the wire. |
| Replication | When you replicate data across nodes, the replication protocol IS a form of service communication — with all the same problems (ordering, idempotency, failure detection). |
| Consistency & Consensus | Distributed consensus protocols (Raft, Paxos) are just very careful, very specific forms of service communication where the "API contract" is mathematically proven. |
| Load Balancing | The API gateway's routing is one form of load balancing. Understanding L4 vs L7 load balancing, consistent hashing, and health checks deepens everything in Chapter 7. |
| Message Queues | When synchronous HTTP communication isn't enough (fire-and-forget, event-driven architectures), you move to asynchronous messaging. Same idempotency problems, different transport. |
| Limitation | What We Didn't Cover | When It Matters |
|---|---|---|
| Synchronous only | Asynchronous messaging (Kafka, RabbitMQ, SQS) | Event-driven architectures, decoupled services, eventual consistency |
| Request-response | Event sourcing, CQRS | High-write systems, audit logs, temporal queries |
| Point-to-point | Service mesh (Istio, Linkerd) | Managing communication policies (mTLS, retries, observability) across 100+ services |
| Single-region | Cross-region communication, geo-routing | Global services with users in multiple continents |
| Trusted network | mTLS (mutual TLS), zero-trust networking | Security-sensitive environments where internal traffic must also be encrypted |
Service communication is layer 1 of the distributed systems stack. Everything above it — replication, consensus, transactions, consistency models — is built on top of the primitives we learned here. A replicated database is just services sending carefully ordered messages to each other. A distributed lock is just an API call with strong idempotency guarantees. A message queue is just an HTTP POST with persistence.
When you understand DNS, HTTP, REST, idempotency, and circuit breaking at the level of this lesson, you have the vocabulary to reason about any distributed system. The rest is just applying these primitives in more sophisticated combinations.
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." — Leslie Lamport