Distributed Systems

Service Architecture

Microservices, API gateways, service meshes, control planes, data planes, and the sidecar pattern.

Prerequisites: HTTP APIs + Basic networking. That's it.
10 Chapters · 9 Simulations · 0 Assumed Knowledge

Chapter 0: The Problem

You have a successful application. It started as a single codebase — user accounts, order processing, payment handling, notification sending, recommendation engine — all in one repository, deployed as one binary. The team was 5 engineers. Life was simple. Ship the whole thing, run it on three servers, call it a day.

Now you are 50 engineers. The codebase is 500,000 lines. A change to the recommendation algorithm requires re-testing the entire payment system. A junior engineer breaks the login page by changing a shared utility function. Deployment takes 45 minutes because the entire application must be built, tested, and deployed even for a one-line fix. The payments team wants to use Python for ML-based fraud detection, but the monolith is in Java. Everyone queues behind one release train.

The problem is not technical capacity. The problem is organizational scaling. When multiple teams work on the same codebase, they step on each other. When everything deploys together, one team's bug blocks another team's release. When one component needs different resources (the recommendation engine needs GPUs, the login page needs low latency), you cannot scale them independently.

Monolith vs. Microservices Deploy

Watch a deploy flow through a monolith (everything at once) vs. microservices (independent deploys). Notice how a failure in one component affects the rest.


When the monolith deploys, a failure in the payments module rolls back the entire deploy — including the perfectly working recommendation update. When microservices deploy independently, the payments failure only affects payments. The recommendation service ships on its own schedule.

This lesson is about how to structure a system as a collection of services. We will cover when to break a monolith, how to define service boundaries, how services communicate, and the infrastructure patterns (API gateways, service meshes, sidecars) that make it all work. By the end, you will understand the architecture of systems like Netflix, Uber, and Google — and when that architecture is worth the complexity.
A team of 5 engineers works on a 50,000-line monolith. They deploy 3 times a day with no issues. Should they switch to microservices?

Chapter 1: The Monolith

A monolith is a single deployable unit containing all application functionality. One codebase, one build, one binary (or one WAR file, one container image). Function calls between components are local — in-process, nanosecond latency. No network involved.

Why Monoliths Are Good

Monoliths get a bad reputation, but they have genuine advantages that microservices cannot replicate:

| Advantage | Why it matters |
| --- | --- |
| Simple deployment | One artifact to build, test, and deploy. No coordination between services. |
| Simple debugging | One process, one log stream, one debugger. Stack traces show the full call chain. |
| No network latency | Function calls are nanoseconds. No serialization, no HTTP overhead, no retries. |
| Transactional consistency | A single database transaction can span all components. ACID guarantees for free. |
| Easy refactoring | Rename a function, and your IDE updates all callers. Try that across 20 services. |

Why Monoliths Break Down

The problems emerge with scale — specifically, team scale. As the monolith grows:

Coupling creep. Over time, modules reach into each other's internals. The recommendations engine reads from the payments database directly. The user module calls an internal function in the notifications module. These implicit dependencies mean you cannot change one module without risking another.
Build and deploy bottleneck. A 500,000-line Java monolith takes 30 minutes to compile and run its full test suite. Every change, no matter how small, waits in this queue. If 10 teams merge changes, one failing test blocks everyone.
Resource contention. The recommendation engine is CPU-intensive (ML inference). The file upload service is I/O-intensive (disk writes). They share the same server. You cannot give the recommendation engine GPUs without also running your file upload handler on expensive GPU instances.
Monolith Module Dependencies

Watch how modules in a monolith develop implicit dependencies over time. Click "Add Coupling" to introduce cross-module dependencies.


The Modular Monolith

Before reaching for microservices, consider the modular monolith: a single deployable unit with strict module boundaries enforced by the build system. Modules communicate through defined interfaces, not by reaching into each other's internals. This gives you the simplicity of a monolith with some of the isolation benefits of microservices.
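One way to picture "boundaries enforced by the build system" is a check that fails the build when a module depends on something it is not allowed to. The sketch below is illustrative only: the module names, the ALLOWED table, and the `observed` input are hypothetical, and real tooling would extract the dependency graph from the code itself.

```python
# Hypothetical modules and rules, for illustration only.
ALLOWED = {
    "orders": {"users", "payments"},
    "payments": {"users"},
    "recommendations": {"users", "orders"},
    "users": set(),
}

def find_violations(imports: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (module, dependency) pairs that the ALLOWED table does not permit."""
    violations = []
    for module, deps in sorted(imports.items()):
        for dep in sorted(deps):
            if dep != module and dep not in ALLOWED.get(module, set()):
                violations.append((module, dep))
    return violations

# Observed dependencies (assumed to come from a build step).
observed = {
    "orders": {"users", "payments"},
    "recommendations": {"users", "payments"},  # reaches into payments: not allowed
}

print(find_violations(observed))
```

A build that fails on any reported violation keeps coupling creep visible at review time instead of letting it accumulate silently.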

Shopify runs a modular monolith in Ruby. Their monolith has strict component boundaries enforced by tooling. Teams own components, not services. Deploys are fast because only changed components are tested. This works because their organizational structure (teams own components) maps cleanly to code structure (components within a monolith).

Two teams work in the same monolith. Team A's failing test blocks Team B's deploy for 4 hours. What is the ROOT cause?

Chapter 2: Microservices

A microservices architecture decomposes the application into small, independently deployable services. Each service runs in its own process, owns its own data, and communicates with others over the network (HTTP, gRPC, or messaging). Each can be written in a different language, scaled independently, and deployed on its own schedule.

What Makes a Service "Micro"

The word "micro" is misleading. It does not mean small in lines of code. A microservice is defined by three properties:

Independently deployable. You can deploy service A without deploying services B, C, D. No coordination needed. If you must deploy two services together for a change to work, they are not truly independent — they are a distributed monolith.
Owns its data. Each service has its own database (or schema, or table prefix). No other service reads from or writes to it directly. This is the hardest rule to follow and the most important. Shared databases create tight coupling — if service B reads service A's table directly, any schema change in A can break B.
Single responsibility. Each service does one thing well. The user service manages user accounts. The order service manages orders. The notification service sends emails and push notifications. When you describe what a service does, you should be able to do it in one sentence without the word "and."
# Monolith: in-process function calls
user = UserModule.get_user(42)        # In-process, ~100ns
orders = OrderModule.get_orders(42)   # In-process, ~100ns

# Microservices: HTTP calls over the network
user = http.get("http://user-service/users/42")            # Network, ~5ms
orders = http.get("http://order-service/orders?user=42")   # Network, ~5ms

# Cost: roughly 50,000x latency increase per call (5ms vs. 100ns)
# Benefit: user-service and order-service deploy independently
# Benefit: user-service can use PostgreSQL, order-service can use DynamoDB

The Costs of Microservices

| Cost | What it means in practice |
| --- | --- |
| Network latency | Every service call adds 1-10ms. A page that needs 5 services: 5-50ms just in network overhead. |
| Partial failures | Any service can fail independently. Your code must handle timeouts, retries, circuit breakers. |
| Distributed transactions | No ACID across services. Use sagas (compensating transactions) or accept eventual consistency. |
| Operational complexity | 50 services = 50 deploy pipelines, 50 log streams, 50 monitoring dashboards. |
| Testing | Integration tests require running multiple services. Contract testing becomes essential. |
| Debugging | A bug might span 5 services. Distributed tracing (Jaeger, Zipkin) is mandatory. |
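The "distributed transactions" cost can be made concrete with a small saga sketch: each step has a compensating action, and when a step fails the completed steps are undone in reverse order. This is a minimal in-memory illustration, not a production saga framework; the step names and the `run_saga` helper are hypothetical, and real steps would be calls to separate services.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]

def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            done.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"undone:{prev_name}")
            break
    return log

def charge_card() -> None:
    raise RuntimeError("payment declined")  # simulated failure

steps = [
    ("reserve_stock", lambda: None, lambda: None),
    ("create_order", lambda: None, lambda: None),
    ("charge_card", charge_card, lambda: None),
]

print(run_saga(steps))
```

Between the failure and the last compensation, other services can observe the intermediate state. That window is the eventual consistency the table refers to.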
Monolith vs. Microservices Request Flow

Watch a user request flow through both architectures. In the monolith, everything is in-process. In microservices, each call crosses the network.

Microservices are a solution to an organizational problem, not a technical one. If your team is small and your monolith is manageable, microservices add complexity without benefit. The right question is not "should we use microservices?" but "do we have organizational problems that microservices solve?" If 3 teams are blocking each other's deploys, yes. If one engineer manages the whole thing, no.
Service A reads directly from Service B's database to avoid the network latency of an API call. What have they broken?

Chapter 3: Service Boundaries

The hardest question in microservices is not "how" — it is "where to split." Get the boundaries right, and services evolve independently. Get them wrong, and every feature requires coordinated changes across 5 services. This is worse than a monolith.

Domain-Driven Design (DDD)

The most reliable approach to finding service boundaries comes from Domain-Driven Design (DDD), specifically the concept of bounded contexts. A bounded context is a boundary within which a particular model (set of terms, rules, and data) is consistent and meaningful.

In an e-commerce system, the word "product" means different things in different contexts:

| Context | What "product" means | Key data |
| --- | --- | --- |
| Catalog | Something to browse and search | Name, description, images, category |
| Inventory | Something with a quantity in a warehouse | SKU, quantity, warehouse location |
| Pricing | Something with a price that changes | Base price, discounts, currency, tax rules |
| Shipping | Something with weight and dimensions | Weight, dimensions, fragile flag, origin |

Each bounded context becomes a service. The catalog service owns the product's name and description. The pricing service owns the product's price. They communicate through well-defined interfaces. If the pricing team changes how discounts work, the catalog service does not need to change — it never knew about discount logic in the first place.
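As a rough sketch of the idea, each context can define its own "product" type and share nothing but an identifier. The class and field names below are assumptions chosen to mirror the table above, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class CatalogProduct:
    product_id: str
    name: str
    description: str
    category: str

@dataclass
class InventoryProduct:
    product_id: str
    sku: str
    quantity: int
    warehouse: str

@dataclass
class PricingProduct:
    product_id: str
    base_price_cents: int
    currency: str

    def price_with_discount(self, percent_off: int) -> int:
        """Discount logic lives only in the pricing context."""
        return self.base_price_cents * (100 - percent_off) // 100

catalog_item = CatalogProduct("p-1", "Desk lamp", "Adjustable LED lamp", "lighting")
priced_item = PricingProduct("p-1", 2000, "USD")

print(catalog_item.name, priced_item.price_with_discount(25))
```

The catalog type has no price field at all, so a change to discount rules cannot reach it. The only shared vocabulary is `product_id`.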

Signals That Your Boundaries Are Wrong

Chatty communication. If service A makes 10 calls to service B for every user request, they are probably one service split into two. High inter-service communication suggests the boundary cuts through a cohesive domain.
Coordinated deploys. If changing a feature requires deploying services A, B, and C together, the boundary is wrong. True microservices can be deployed independently. Coordinated deploys mean the services are coupled — they are a distributed monolith in disguise.
Shared data models. If two services need the exact same data in the exact same shape, they might be one service. Or one should own the data and expose it via API to the other.
Bounded Context Map

An e-commerce system decomposed into bounded contexts. Click contexts to see their data models — notice how "product" means something different in each.


How Big Should a Service Be?

Forget lines of code. The right size is determined by team ownership. A service should be small enough that one team can own it entirely — design it, build it, deploy it, and operate it. Amazon's "two-pizza team" rule: if two pizzas cannot feed the team, the team is too big. If a team owns one service, it should be sized so that team can understand and maintain it.

In practice: most services are 5,000-50,000 lines of code, owned by a team of 3-8 engineers. Some are smaller (a simple proxy, a config service). Some are larger (the core business logic). The number is not the point — team ownership is.

You split the user profile into two services: "user-basic" (name, email) and "user-preferences" (theme, language, notifications). Every page load requires calling both. Was this a good split?

Chapter 4: API Gateway

You have decomposed your application into 20 microservices. A mobile app needs to load a user's dashboard — which requires data from the user service, order service, recommendation service, and notification service. Should the mobile app make 4 HTTP requests to 4 different services?

No. That exposes internal service topology to the client, multiplies round trips (especially painful on mobile networks with 100ms+ latency), and means every client must know the address and API of every service. Add a new service, and every client must be updated.

An API gateway is a single entry point for all client requests. The client talks to one endpoint. The gateway routes, aggregates, and transforms requests to the appropriate backend services.

Mobile App: GET /dashboard → API Gateway
API Gateway: Fans out to user-svc, order-svc, rec-svc, notif-svc
Aggregate: Combines 4 responses into one JSON payload
Response: One HTTP response with all dashboard data
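The fan-out and aggregate steps can be sketched with concurrent calls. This is a simplified illustration: `call_service` is a hypothetical stand-in that sleeps instead of making a network request, and the 50 ms delay is an arbitrary assumption.

```python
import asyncio
import time

async def call_service(name: str, delay_s: float) -> dict:
    """Stand-in for an HTTP call to a backend; sleeps instead of using the network."""
    await asyncio.sleep(delay_s)
    return {"service": name, "status": "ok"}

async def get_dashboard(user_id: int) -> dict:
    # Fan out to all four backends concurrently, then merge the results.
    names = ["user-svc", "order-svc", "rec-svc", "notif-svc"]
    results = await asyncio.gather(*(call_service(n, 0.05) for n in names))
    return {"user_id": user_id, "sections": {r["service"]: r for r in results}}

start = time.perf_counter()
dashboard = asyncio.run(get_dashboard(42))
elapsed = time.perf_counter() - start

print(sorted(dashboard["sections"]), f"{elapsed:.2f}s")
```

Because the four calls overlap, the total time is close to the slowest single call rather than the sum of all four, and the client pays only one round trip over the mobile network.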

What the API Gateway Does

| Function | How it helps |
| --- | --- |
| Request routing | Route /users to user-svc, /orders to order-svc |
| Aggregation | Combine multiple service responses into one |
| Authentication | Verify JWT tokens once at the gateway, not in every service |
| Rate limiting | Protect backend services from traffic spikes |
| TLS termination | Handle HTTPS at the edge; internal traffic is plain HTTP |
| Protocol translation | Accept REST from mobile, translate to gRPC for internal services |
| Caching | Cache common responses to reduce backend load |
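Rate limiting is often implemented with a token bucket, though gateways differ in the algorithm they use. The sketch below is one possible version, with a hypothetical `TokenBucket` class that takes the current time as an argument so its behavior is easy to follow.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_s` tokens per second."""

    def __init__(self, capacity: int, refill_per_s: float):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, refill_per_s=1.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
print(decisions)
```

The third request at time 0 is rejected because the burst capacity of 2 is used up. One second later a single token has been refilled, so the next request passes.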

BFF (Backend for Frontend)

Different clients need different data. A mobile app wants a compact JSON payload. The web app wants a richer response. An internal admin tool needs different fields entirely. A single API gateway becomes a bottleneck when it must serve all these needs.

The BFF pattern (Backend for Frontend) creates a separate gateway for each client type. The mobile BFF tailors responses for mobile. The web BFF serves the web app. Each BFF is owned by the frontend team that uses it, so they can evolve independently.

// Without BFF: one gateway serves all clients
Mobile → API Gateway → backend services
Web → API Gateway → backend services
Admin → API Gateway → backend services
// Gateway becomes a bottleneck and pleases nobody

// With BFF: one gateway per client
Mobile → Mobile BFF → backend services # Owned by mobile team
Web → Web BFF → backend services # Owned by web team
Admin → Admin BFF → backend services # Owned by admin team
// Each BFF tailored to its client's needs
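In code, a BFF is mostly a response shaper. The sketch below assumes a hypothetical backend payload and shows how three BFFs might trim it for their clients; the field names are illustrative.

```python
# Hypothetical backend payload; field names are illustrative.
backend_user = {
    "id": 42,
    "name": "Ada",
    "email": "ada@example.com",
    "avatar_url": "https://example.com/a/42.png",
    "roles": ["customer"],
    "created_at": "2021-03-04",
}

def mobile_bff(user: dict) -> dict:
    """Compact payload for mobile: only what the first screen needs."""
    return {"id": user["id"], "name": user["name"]}

def web_bff(user: dict) -> dict:
    """Richer payload for the web app."""
    return {k: user[k] for k in ("id", "name", "email", "avatar_url")}

def admin_bff(user: dict) -> dict:
    """Different fields entirely for the admin tool."""
    return {k: user[k] for k in ("id", "email", "roles", "created_at")}

print(mobile_bff(backend_user))
```

Each function can change without touching the others, which is why ownership by the matching frontend team works.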
API Gateway Request Flow

A client request arrives at the API gateway. Watch it fan out to multiple backend services, aggregate responses, and return one response.

The gateway can become a single point of failure. If the gateway goes down, nothing works. Mitigate with: multiple gateway instances behind a load balancer, health checks, circuit breakers, and graceful degradation (return cached responses if a backend is down). Real-world gateways: Kong, Envoy (as edge proxy), AWS API Gateway, Netflix Zuul.
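A circuit breaker with a cached fallback might look roughly like the sketch below. The `CircuitBreaker` class, its thresholds, and the `flaky_backend` function are all hypothetical; production gateways ship their own, more complete implementations.

```python
import time

class CircuitBreaker:
    """Stop calling a failing backend for a while and serve a fallback instead."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                return fallback()  # open: skip the backend entirely
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result

calls = {"n": 0}

def flaky_backend():
    calls["n"] += 1
    raise ConnectionError("backend down")

breaker = CircuitBreaker(max_failures=2, reset_after_s=60.0)
responses = [breaker.call(flaky_backend, lambda: "cached") for _ in range(5)]
print(responses, calls["n"])
```

After two failures the breaker opens, so the remaining three requests are answered from the fallback without touching the backend. The client still gets a response, and the struggling service gets room to recover.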
Your mobile app makes 4 API calls on launch (user, orders, recommendations, notifications). Each takes 80ms over the mobile network. Total launch time?

Chapter 5: Service Mesh

Every microservice needs the same boring infrastructure: retries on failure, timeouts, circuit breaking, mutual TLS, load balancing between instances, distributed tracing, metric collection. You could implement all of this in every service's code. But then every team must build the same logic, in every language, and keep it consistent.

A service mesh extracts this common infrastructure into a dedicated layer. Instead of each service implementing retries and TLS, a proxy runs alongside each service instance. All network traffic flows through this proxy. The proxy handles retries, timeouts, TLS, observability, and traffic management. The service code only worries about business logic.

How It Works

For every service instance, a sidecar proxy (typically Envoy) is deployed in the same pod/VM. The service makes a plain HTTP request, and the sidecar running next to it intercepts that request locally. The sidecar applies policies (retry, timeout, mTLS) and routes the request to the destination service's sidecar, which decrypts it and delivers it to the destination service. Neither service knows about the mesh; it is transparent.

// Without service mesh: every service implements infra
user-svc code: http.get("order-svc", retries=3, timeout=5s, mtls=True)
order-svc code: http.get("payment-svc", retries=3, timeout=5s, mtls=True)
// Every service, every language, every team: same boilerplate

// With service mesh: services just make plain HTTP calls
user-svc code: http.get("order-svc") # Plain HTTP to localhost proxy
// Sidecar proxy handles: retries, timeout, mTLS, tracing, metrics
// Configured centrally, applied uniformly, language-agnostic
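To make the sidecar's job concrete, here is roughly what a retry policy does on the application's behalf. Real meshes configure this declaratively rather than in Python; the `call_with_retries` helper and its parameters are hypothetical and only illustrate the behavior.

```python
import time

def call_with_retries(send, max_attempts: int = 3, base_backoff_s: float = 0.01):
    """Call `send` until it succeeds or `max_attempts` is reached."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return send()
        except ConnectionError as err:
            last_error = err
            if attempt < max_attempts - 1:
                time.sleep(base_backoff_s * (2 ** attempt))  # exponential backoff
    raise last_error

attempts = {"n": 0}

def flaky_order_svc() -> str:
    # Simulated upstream: fails twice, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("connection reset")
    return "200 OK"

result = call_with_retries(flaky_order_svc)
print(result, attempts["n"])
```

The application code that triggered the request never sees the two failed attempts. With a mesh, this logic lives in the proxy and is the same for every language.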
Service Mesh Traffic Flow

Watch how requests flow through sidecar proxies. Each service's traffic is intercepted and managed by its proxy. Toggle features to see what the mesh handles.


Real Service Meshes

| Mesh | Sidecar proxy | Control plane | Notable feature |
| --- | --- | --- | --- |
| Istio | Envoy | istiod | Most feature-rich; complex to operate |
| Linkerd | linkerd2-proxy (Rust) | Linkerd control plane | Lightweight; simpler than Istio |
| Consul Connect | Envoy or built-in | Consul server | Integrates with HashiCorp ecosystem |
| AWS App Mesh | Envoy | AWS managed | Managed control plane on AWS |

The service mesh trade-off: observability and uniformity vs. latency and complexity. Every request now goes through two extra hops (source sidecar → destination sidecar), adding 1-3ms of latency. The sidecar consumes CPU and memory on every node. And the mesh itself (control plane + proxies) is another system to operate and debug. The payoff: uniform retries, mTLS, and observability without any code changes. For organizations with 50+ services in multiple languages, the payoff is enormous.
You have 30 microservices in 4 languages (Java, Python, Go, Node). Each needs mutual TLS, retries, and distributed tracing. Without a service mesh, how many implementations do you need?

Chapter 6: Control Plane vs. Data Plane

This is one of the most fundamental patterns in all of distributed systems — not just service meshes, but networking, databases, and infrastructure. Once you see it, you see it everywhere.

The data plane handles the actual traffic. It forwards packets, routes requests, serves data. It must be fast, because it is in the hot path of every request.

The control plane configures the data plane. It decides routing rules, sets policies, manages configuration. It does not handle user traffic. It can be slow (updated infrequently), but it must be correct.

The analogy: a highway system. The data plane is the roads: cars (requests) drive on them at full speed. The control plane is the traffic management center: it sets speed limits, configures traffic lights, and opens/closes lanes. The management center does not carry any cars. But it controls how every car flows.

| System | Data plane | Control plane |
| --- | --- | --- |
| Service mesh (Istio) | Envoy sidecar proxies (route traffic) | istiod (pushes routing rules to proxies) |
| Kubernetes | kubelet + container runtime (runs pods) | API server + scheduler + controllers (decides what runs where) |
| DNS | Recursive resolvers (answer queries) | Authoritative nameservers (define records) |
| Load balancer | LB forwarding engine (routes packets) | Health checker + config API (updates backend list) |
| SDN (software-defined networking) | Network switches (forward frames) | SDN controller (programs switch rules) |
| CDN | Edge servers (serve cached content) | Origin config (defines caching rules, purge) |

Control Plane vs. Data Plane

Watch the control plane push configuration to data plane nodes. Then see user traffic flow through the data plane. The control plane never touches user traffic.


Why the Split Matters

Failure isolation. If the control plane crashes, the data plane keeps running with its last-known configuration. Traffic still flows. New rules cannot be pushed, but existing traffic is not interrupted. This is critical: the data plane handles every user request, so it must be resilient. The control plane handles infrequent configuration changes, so brief outages are tolerable.
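A toy model of this failure isolation: the proxy keeps a local copy of the last configuration it received, so routing keeps working when the control plane is unreachable. The `ControlPlane` and `DataPlaneProxy` classes are hypothetical and greatly simplified.

```python
class DataPlaneProxy:
    """Routes requests using the last configuration it was given."""

    def __init__(self):
        self.routes: dict[str, str] = {}

    def apply_config(self, routes: dict[str, str]) -> None:
        self.routes = dict(routes)  # keep a local copy

    def route(self, path: str) -> str:
        return self.routes[path]

class ControlPlane:
    """Pushes configuration to proxies; never handles user traffic."""

    def __init__(self):
        self.alive = True

    def push(self, proxy: DataPlaneProxy, routes: dict[str, str]) -> None:
        if not self.alive:
            raise ConnectionError("control plane unavailable")
        proxy.apply_config(routes)

control = ControlPlane()
proxy = DataPlaneProxy()
control.push(proxy, {"/users": "user-svc", "/orders": "order-svc"})

control.alive = False          # simulate a control plane crash
print(proxy.route("/orders"))  # traffic still flows on last-known config
```

Only new configuration is blocked while the control plane is down; a later `control.push(...)` would raise until it recovers.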

Scale independence. The data plane scales with traffic (more proxies, more instances). The control plane scales with the number of services (not with traffic volume). A service mesh might have 1,000 Envoy proxies handling 1 million requests per second, but only one control plane instance pushing config every 30 seconds.

The control plane is the source of truth. The desired state lives in the control plane. The data plane's job is to converge toward that desired state. Kubernetes embodies this: you tell the control plane "I want 5 replicas of service X." The control plane continuously reconciles reality toward that desire. The data plane (kubelet) does the actual work of running containers.
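The reconcile idea fits in a few lines. This is a deliberately tiny sketch with a hypothetical `reconcile` function; a real controller watches live state and runs continuously rather than once.

```python
def reconcile(desired: int, actual: int) -> list[str]:
    """Return the actions needed to move from the actual replica count to the desired one."""
    if actual < desired:
        return ["start"] * (desired - actual)
    if actual > desired:
        return ["stop"] * (actual - desired)
    return []

desired, actual = 5, 3
actions = reconcile(desired, actual)
for action in actions:
    actual += 1 if action == "start" else -1

print(actions, actual)
```

Running `reconcile` again once the counts match returns no actions, which is what makes it safe to repeat the loop indefinitely.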
Your Istio control plane (istiod) crashes. What happens to existing service-to-service traffic?

Chapter 7: The Sidecar Pattern

The sidecar pattern extends beyond service meshes. It is a general design pattern: attach a helper process alongside your main application process. The sidecar shares the same lifecycle (starts and stops together), the same network namespace (can communicate via localhost), and often the same filesystem.

What Sidecars Do

Think of a sidecar as a "plugin" that adds capabilities to any application without changing its code. Common sidecar uses:

Proxy sidecar (Envoy). Handles all network traffic: mTLS, retries, load balancing, tracing. The application makes plain HTTP calls; the sidecar handles everything else.
Log collector sidecar (Fluentd, Filebeat). Reads the application's log files and ships them to a centralized logging system (Elasticsearch, Splunk). The application writes its logs to a shared volume; the sidecar collects and forwards them.
Config sync sidecar (Consul Agent, Vault Agent). Watches a central config store and syncs configuration files to a shared volume. The application reads from the local filesystem; the sidecar keeps it updated.
# Kubernetes pod with sidecar containers
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app # Main application
    image: my-app:v2
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: logs # App writes its log files here
      mountPath: /var/log/app
  - name: envoy-proxy # Sidecar: network proxy
    image: envoyproxy/envoy:v1.28
  - name: log-collector # Sidecar: log shipping
    image: fluent/fluentd:v1.16
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  volumes:
  - name: logs # Shared by my-app and log-collector
    emptyDir: {}

# All 3 containers share the same pod:
# - Same network namespace (localhost communication)
# - Same lifecycle (start/stop together)
# - Can share volumes (log files)

Sidecar vs. Library

Why not just use a library? Import a retry library, a TLS library, a tracing library into your application code. This is simpler — no extra process, no inter-process communication overhead.

| Property | Library | Sidecar |
| --- | --- | --- |
| Language | Must match the app's language | Language-agnostic (separate process) |
| Updates | Requires rebuilding and redeploying the app | Update sidecar independently |
| Isolation | Bug in library can crash the app | Sidecar crash does not crash the app |
| Overhead | Lower (in-process) | Higher (IPC, extra memory/CPU) |
| Consistency | Varies per team (different versions) | Uniform (one sidecar version fleet-wide) |

Sidecar Architecture

A pod with an application container and two sidecars. Watch traffic flow through the proxy sidecar and logs flow through the log collector.

The sidecar pattern is Kubernetes-native. In Kubernetes, a pod is a group of containers that share a network and lifecycle. Sidecars are just additional containers in the pod. Istio injects the Envoy sidecar automatically via a mutating admission webhook: you deploy your service as normal, and Istio adds the proxy behind the scenes. Kubernetes 1.28 introduced native sidecar support (initially as an alpha feature) with proper startup/shutdown ordering.
Your company has 30 services in Java, Python, Go, and Rust. You want to add mTLS to all service-to-service communication. What approach requires the least code changes?

Chapter 8: Interactive Service Architecture

Time to see the full picture. Below is an interactive simulation of a microservice architecture with an API gateway, service mesh, and multiple backends. Send requests from clients, watch them flow through the gateway, mesh proxies, and services. Inject failures and see how the mesh retries automatically.

This is the complete architecture. Client → API Gateway → Envoy sidecar → Service → Envoy sidecar → downstream service. Click through and watch how every piece we have covered fits together.
Full Microservice Request Flow

A client request flows through the API gateway, service mesh sidecars, and backend services. Inject failures to see retries and circuit breaking.


What to Try

Experiment 1: Send a request. Watch it flow through the API gateway, hit the first service's sidecar proxy, reach the service, then fan out to downstream services through their sidecars.

Experiment 2: Inject a failure. One of the backend services becomes unhealthy. Watch the sidecar retry the request to a healthy instance. This is the mesh at work — the service code has no retry logic.

Experiment 3: Send many requests. Watch the gateway distribute across services, sidecars load-balance across instances, and metrics accumulate. This is the observability that the mesh provides for free.

Chapter 9: Connections

Service architecture is the structural foundation of distributed systems. Here is how it connects to everything else.

What We Covered

| Concept | Key takeaway |
| --- | --- |
| Monolith | Simple, fast, works for small teams. Breaks down with organizational scale. |
| Microservices | Independent deploy + data ownership. Solves org problems, adds operational complexity. |
| Service boundaries | Bounded contexts from DDD. Minimize cross-boundary communication. |
| API Gateway | Single entry point. Routing, aggregation, auth, rate limiting. BFF for per-client gateways. |
| Service mesh | Sidecar proxies handle retries, mTLS, tracing. Language-agnostic, zero code changes. |
| Control/data plane | Data plane handles traffic (fast). Control plane manages config (correct). Split everywhere. |
| Sidecar pattern | Attach helper process alongside app. Proxy, log collector, config sync. |

Where to Go Next

| Topic | Connection |
| --- | --- |
| Load Balancing | API gateways and service meshes both perform load balancing; the algorithms (round-robin, least-conn) matter here too |
| Data Storage | Each microservice owns its database; caching reduces cross-service calls |
| Messaging | Event-driven communication between services replaces synchronous HTTP calls |
| Consensus | Service discovery (Consul, etcd) uses consensus to maintain the service registry |

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport) Service architecture makes these failure modes explicit and gives you tools (meshes, gateways, circuit breakers) to handle them. The complexity is real, but the alternative, a monolith that outgrows its team, is worse.
A request from a mobile app reaches your backend through: Client → ? → ? → Service. What are the two missing components in a modern microservice architecture?