Service Architecture — From Absolute Zero to Mastery

Chapter 0: The Problem

You have a successful application. It started as a single codebase — user accounts, order processing, payment handling, notification sending, recommendation engine — all in one repository, deployed as one binary. The team was 5 engineers. Life was simple. Ship the whole thing, run it on three servers, call it a day.

Now you are 50 engineers. The codebase is 500,000 lines. A change to the recommendation algorithm requires re-testing the entire payment system. A junior engineer breaks the login page by changing a shared utility function. Deployment takes 45 minutes because the entire application must be built, tested, and deployed even for a one-line fix. The payments team wants to use Python for ML-based fraud detection, but the monolith is in Java. Everyone queues behind one release train.

The problem is not technical capacity. The problem is organizational scaling. When multiple teams work on the same codebase, they step on each other. When everything deploys together, one team's bug blocks another team's release. When one component needs different resources (the recommendation engine needs GPUs, the login page needs low latency), you cannot scale them independently.

Monolith vs. Microservices Deploy

Watch a deploy flow through a monolith (everything at once) vs. microservices (independent deploys). Notice how a failure in one component affects the rest.


When the monolith deploys, a failure in the payments module rolls back the entire deploy — including the perfectly working recommendation update. When microservices deploy independently, the payments failure only affects payments. The recommendation service ships on its own schedule.

This lesson is about how to structure a system as a collection of services. We will cover when to break a monolith, how to define service boundaries, how services communicate, and the infrastructure patterns (API gateways, service meshes, sidecars) that make it all work. By the end, you will understand the architecture of systems like Netflix, Uber, and Google — and when that architecture is worth the complexity.

A team of 5 engineers works on a 50,000-line monolith. They deploy 3 times a day with no issues. Should they switch to microservices?

Yes: microservices are always better for scalability.

Probably not. They have no organizational scaling problem: 5 engineers can coordinate easily. They deploy frequently with no bottleneck. Microservices add operational complexity (networking, deployment, monitoring, debugging) that is not justified when the monolith works well. Microservices solve organizational problems, not technical ones.

Only if they expect to grow beyond 500,000 lines of code.

Chapter 1: The Monolith

A monolith is a single deployable unit containing all application functionality. One codebase, one build, one binary (or one WAR file, one container image). Function calls between components are local — in-process, nanosecond latency. No network involved.

Why Monoliths Are Good

Monoliths get a bad reputation, but they have genuine advantages that microservices cannot replicate:

Advantage	Why it matters
Simple deployment	One artifact to build, test, and deploy. No coordination between services.
Simple debugging	One process, one log stream, one debugger. Stack traces show the full call chain.
No network latency	Function calls are nanoseconds. No serialization, no HTTP overhead, no retries.
Transactional consistency	A single database transaction can span all components. ACID guarantees for free.
Easy refactoring	Rename a function, and your IDE updates all callers. Try that across 20 services.

Why Monoliths Break Down

The problems emerge with scale — specifically, team scale. As the monolith grows:

Coupling creep. Over time, modules reach into each other's internals. The recommendations engine reads from the payments database directly. The user module calls an internal function in the notifications module. These implicit dependencies mean you cannot change one module without risking another.

Build and deploy bottleneck. A 500,000-line Java monolith takes 30 minutes to compile and run its full test suite. Every change, no matter how small, waits in this queue. If 10 teams merge changes, one failing test blocks everyone.

Resource contention. The recommendation engine is compute-intensive (ML inference). The file upload handler is I/O-intensive (disk writes). They share the same servers. You cannot give the recommendation engine GPUs without also running your file upload handler on expensive GPU instances.

Monolith Module Dependencies

Watch how modules in a monolith develop implicit dependencies over time. Click "Add Coupling" to introduce cross-module dependencies.


The Modular Monolith

Before reaching for microservices, consider the modular monolith: a single deployable unit with strict module boundaries enforced by the build system. Modules communicate through defined interfaces, not by reaching into each other's internals. This gives you the simplicity of a monolith with some of the isolation benefits of microservices.

Shopify runs a modular monolith in Ruby. Their monolith has strict component boundaries enforced by tooling. Teams own components, not services. Deploys are fast because only changed components are tested. This works because their organizational structure (teams own components) maps cleanly to code structure (components within a monolith).
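To make "boundaries enforced by tooling" concrete, here is a minimal sketch of a check that could fail a build when one module imports another module's internals. Everything in it is an assumption for illustration: the module names, the convention that `<module>.internal` is private, and the `find_violations` helper. It is not the API of any particular tool.

```python
import ast

def find_violations(module_name: str, source: str) -> list[str]:
    """Return imports in `source` that reach into another module's internals.

    Assumed convention: `<module>.api` is public, `<module>.internal` is private.
    """
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            owner = parts[0]
            # Importing your own internals is fine; importing someone else's is not.
            if owner != module_name and "internal" in parts[1:]:
                violations.append(name)
    return violations

# Hypothetical source of a file inside the `recommendations` module.
source = """
from payments.api import get_order_total
from payments.internal.db import raw_query
import recommendations.internal.model
"""
print(find_violations("recommendations", source))
```

Run as part of CI, a check like this turns "please do not reach into payments" from a code-review convention into a build failure.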

Two teams work in the same monolith. Team A's failing test blocks Team B's deploy for 4 hours. What is the ROOT cause?

Team A writes bad tests.

The monolith needs better CI/CD tooling.

Coupled deployment. Because A and B deploy as one unit, A's failure blocks B. The fix is either independent deployability (microservices) or component-level test isolation (modular monolith). The root cause is architectural, not process.

Chapter 2: Microservices

A microservices architecture decomposes the application into small, independently deployable services. Each service runs in its own process, owns its own data, and communicates with others over the network (HTTP, gRPC, or messaging). Each can be written in a different language, scaled independently, and deployed on its own schedule.

What Makes a Service "Micro"

The word "micro" is misleading. It does not mean small in lines of code. A microservice is defined by three properties:

Independently deployable. You can deploy service A without deploying services B, C, D. No coordination needed. If you must deploy two services together for a change to work, they are not truly independent — they are a distributed monolith.

Owns its data. Each service has its own database (or schema, or table prefix). No other service reads from or writes to it directly. This is the hardest rule to follow and the most important. Shared databases create tight coupling — if service B reads service A's table directly, any schema change in A can break B.

Single responsibility. Each service does one thing well. The user service manages user accounts. The order service manages orders. The notification service sends emails and push notifications. When you describe what a service does, you should be able to do it in one sentence without the word "and."

# Monolith function call
user = UserModule.get_user(42) # In-process, ~100ns
orders = OrderModule.get_orders(42) # In-process, ~100ns

# Microservice HTTP call
user = http.get("http://user-service/users/42") # Network, ~5ms
orders = http.get("http://order-service/orders?user=42") # Network, ~5ms

# Cost: ~50,000x latency increase per call (~5ms vs. ~100ns)
# Benefit: user-service and order-service deploy independently
# Benefit: user-service can use PostgreSQL, order-service can use DynamoDB

The Costs of Microservices

Cost	What it means in practice
Network latency	Every service call adds 1-10ms. A page that calls 5 services in sequence pays 5-50ms in network overhead alone.
Partial failures	Any service can fail independently. Your code must handle timeouts, retries, circuit breakers.
Distributed transactions	No ACID across services. Use sagas (compensating transactions) or accept eventual consistency.
Operational complexity	50 services = 50 deploy pipelines, 50 log streams, 50 monitoring dashboards.
Testing	Integration tests require running multiple services. Contract testing becomes essential.
Debugging	A bug might span 5 services. Distributed tracing (Jaeger, Zipkin) is mandatory.
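The "distributed transactions" row is easier to picture with a small saga: each step carries a compensating action, and a failure undoes the completed steps in reverse order. This is a minimal in-process sketch. The step names and the `run_saga` helper are hypothetical, and a real saga would call services over the network and persist its progress.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]

def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            done.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"undone:{prev_name}")
            return False
    return True

def charge_card() -> None:
    raise RuntimeError("payment declined")  # simulated failure

log: list[str] = []
ok = run_saga(
    [
        ("reserve_stock", lambda: None, lambda: None),
        ("create_order", lambda: None, lambda: None),
        ("charge_card", charge_card, lambda: None),
    ],
    log,
)
print(ok, log)
```

Between the failure and the last compensation, other services can observe the half-finished state. That window is what "eventual consistency" means in practice.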

Monolith vs. Microservices Request Flow

Watch a user request flow through both architectures. In the monolith, everything is in-process. In microservices, each call crosses the network.


Microservices are a solution to an organizational problem, not a technical one. If your team is small and your monolith is manageable, microservices add complexity without benefit. The right question is not "should we use microservices?" but "do we have organizational problems that microservices solve?" If 3 teams are blocking each other's deploys, yes. If one engineer manages the whole thing, no.

Service A reads directly from Service B's database to avoid the network latency of an API call. What have they broken?

Nothing: this is a valid optimization for low-latency reads.

Data ownership. If A reads B's database directly, any schema change in B can break A. B cannot evolve its database without coordinating with A. This is the defining sin of a "distributed monolith": services that are deployed independently but coupled through shared data. The whole point of microservices is that B's internals are hidden behind its API.

Only the network security model: add a firewall rule and it is fine.

Chapter 3: Service Boundaries

The hardest question in microservices is not "how" — it is "where to split." Get the boundaries right, and services evolve independently. Get them wrong, and every feature requires coordinated changes across 5 services. This is worse than a monolith.

Domain-Driven Design (DDD)

The most reliable approach to finding service boundaries comes from Domain-Driven Design (DDD), specifically the concept of bounded contexts. A bounded context is a boundary within which a particular model (set of terms, rules, and data) is consistent and meaningful.

In an e-commerce system, the word "product" means different things in different contexts:

Context	What "product" means	Key data
Catalog	Something to browse and search	Name, description, images, category
Inventory	Something with a quantity in a warehouse	SKU, quantity, warehouse location
Pricing	Something with a price that changes	Base price, discounts, currency, tax rules
Shipping	Something with weight and dimensions	Weight, dimensions, fragile flag, origin

Each bounded context becomes a service. The catalog service owns the product's name and description. The pricing service owns the product's price. They communicate through well-defined interfaces. If the pricing team changes how discounts work, the catalog service does not need to change — it never knew about discount logic in the first place.
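As a rough illustration of the table above, each context can keep its own model of "product", sharing only an identifier. The class and field names below are assumptions chosen to mirror the table, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class CatalogProduct:       # Catalog context: something to browse and search
    product_id: str
    name: str
    description: str

@dataclass
class InventoryProduct:     # Inventory context: something with a quantity
    product_id: str
    sku: str
    quantity: int

@dataclass
class PricedProduct:        # Pricing context: something with a price
    product_id: str
    base_price_cents: int
    discount_percent: int = 0

    def final_price_cents(self) -> int:
        # Discount logic lives only in the pricing context.
        return self.base_price_cents * (100 - self.discount_percent) // 100

catalog = CatalogProduct("p-1", "Mug", "A ceramic mug")
priced = PricedProduct("p-1", base_price_cents=1200, discount_percent=25)
print(catalog.name, priced.final_price_cents())
```

The catalog model has no price field at all, so a change to discount rules cannot ripple into it.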

Signals That Your Boundaries Are Wrong

Chatty communication. If service A makes 10 calls to service B for every user request, they are probably one service split into two. High inter-service communication suggests the boundary cuts through a cohesive domain.

Coordinated deploys. If changing a feature requires deploying services A, B, and C together, the boundary is wrong. True microservices can be deployed independently. Coordinated deploys mean the services are coupled — they are a distributed monolith in disguise.

Shared data models. If two services need the exact same data in the exact same shape, they might be one service. Or one should own the data and expose it via API to the other.

Bounded Context Map

An e-commerce system decomposed into bounded contexts. Click contexts to see their data models — notice how "product" means something different in each.


How Big Should a Service Be?

Forget lines of code. The right size is determined by team ownership. A service should be small enough that one team can own it entirely — design it, build it, deploy it, and operate it. Amazon's "two-pizza team" rule: if two pizzas cannot feed the team, the team is too big. If a team owns one service, it should be sized so that team can understand and maintain it.

In practice: most services are 5,000-50,000 lines of code, owned by a team of 3-8 engineers. Some are smaller (a simple proxy, a config service). Some are larger (the core business logic). The number is not the point — team ownership is.

You split the user profile into two services: "user-basic" (name, email) and "user-preferences" (theme, language, notifications). Every page load requires calling both. Was this a good split?

No. These are almost always accessed together, so splitting them doubles the network calls with no organizational benefit. A good service boundary should minimize cross-boundary communication. "User profile" is one bounded context; splitting it by data type instead of domain creates unnecessary coupling and latency.

Yes: smaller services are always better.

Only if different teams own each service.

Chapter 4: API Gateway

You have decomposed your application into 20 microservices. A mobile app needs to load a user's dashboard — which requires data from the user service, order service, recommendation service, and notification service. Should the mobile app make 4 HTTP requests to 4 different services?

No. That exposes internal service topology to the client, multiplies round trips (especially painful on mobile networks with 100ms+ latency), and means every client must know the address and API of every service. Add a new service, and every client must be updated.

An API gateway is a single entry point for all client requests. The client talks to one endpoint. The gateway routes, aggregates, and transforms requests to the appropriate backend services.

Mobile App: GET /dashboard → API Gateway
↓
API Gateway: fans out to user-svc, order-svc, rec-svc, notif-svc
↓
Aggregate: combines 4 responses into one JSON payload
↓
Response: one HTTP response with all dashboard data
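The fan-out and aggregate steps can be sketched with `asyncio.gather`, which runs the four backend calls concurrently so the total wait is close to the slowest call rather than the sum. The `call_service` function is a stand-in that sleeps instead of making a real request, and the 5 ms latency is an assumed figure for the internal network.

```python
import asyncio
import time

async def call_service(name: str, latency_s: float) -> dict:
    """Stand-in for an internal HTTP/gRPC call; sleeps to simulate latency."""
    await asyncio.sleep(latency_s)
    return {"service": name, "ok": True}

async def get_dashboard() -> dict:
    """Fan out to four backends concurrently and merge the results."""
    names = ["user-svc", "order-svc", "rec-svc", "notif-svc"]
    results = await asyncio.gather(*(call_service(n, 0.005) for n in names))
    return {r["service"]: r for r in results}

start = time.perf_counter()
dashboard = asyncio.run(get_dashboard())
elapsed_ms = (time.perf_counter() - start) * 1000
print(sorted(dashboard), f"{elapsed_ms:.0f} ms")
```

A production gateway would also set a per-call timeout and decide what to return when one backend fails, for example a partial dashboard.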

What the API Gateway Does

Function	How it helps
Request routing	Route `/users` to user-svc, `/orders` to order-svc
Aggregation	Combine multiple service responses into one
Authentication	Verify JWT tokens once at the gateway, not in every service
Rate limiting	Protect backend services from traffic spikes
TLS termination	Handle HTTPS at the edge; internal traffic can be plain HTTP or re-encrypted (for example, with mTLS from a service mesh)
Protocol translation	Accept REST from mobile, translate to gRPC for internal services
Caching	Cache common responses to reduce backend load
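As one example from the table, rate limiting is often implemented with a token bucket. The sketch below is a generic version of that technique, not the implementation used by any specific gateway. Time is passed in explicitly so the behavior is deterministic; the class name and parameters are assumptions.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2.0)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)]
print(decisions)  # [True, True, False, True]
```

A gateway would typically keep one bucket per client or API key and answer rejected requests with HTTP 429.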

BFF (Backend for Frontend)

Different clients need different data. A mobile app wants a compact JSON payload. The web app wants a richer response. An internal admin tool needs different fields entirely. A single API gateway becomes a bottleneck when it must serve all these needs.

The BFF pattern (Backend for Frontend) creates a separate gateway for each client type. The mobile BFF tailors responses for mobile. The web BFF serves the web app. Each BFF is owned by the frontend team that uses it, so they can evolve independently.

# Without BFF: one gateway serves all clients
Mobile → API Gateway → backend services
Web → API Gateway → backend services
Admin → API Gateway → backend services
# Gateway becomes a bottleneck and pleases nobody

# With BFF: one gateway per client
Mobile → Mobile BFF → backend services # Owned by mobile team
Web → Web BFF → backend services # Owned by web team
Admin → Admin BFF → backend services # Owned by admin team
# Each BFF tailored to its client's needs
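In code, the difference between BFFs is mostly about response shaping. A minimal sketch, assuming a hypothetical user record and made-up field choices for each client:

```python
# Hypothetical full record, as assembled from backend services.
USER = {
    "id": 42,
    "name": "Ada",
    "email": "ada@example.com",
    "last_login": "2024-01-01",
    "roles": ["admin"],
}

def mobile_bff(user: dict) -> dict:
    """Compact payload: only what the mobile screen renders."""
    return {"id": user["id"], "name": user["name"]}

def web_bff(user: dict) -> dict:
    """Richer payload for the web app."""
    return {k: user[k] for k in ("id", "name", "email", "last_login")}

def admin_bff(user: dict) -> dict:
    """The admin tool needs different fields entirely."""
    return {"id": user["id"], "roles": user["roles"]}

print(mobile_bff(USER))
print(web_bff(USER))
print(admin_bff(USER))
```

Because each function lives in a separately owned BFF, the mobile team can drop or add a field without negotiating with the web or admin teams.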

API Gateway Request Flow

A client request arrives at the API gateway. Watch it fan out to multiple backend services, aggregate responses, and return one response.


The gateway can become a single point of failure. If the gateway goes down, nothing works. Mitigate with: multiple gateway instances behind a load balancer, health checks, circuit breakers, and graceful degradation (return cached responses if a backend is down). Real-world gateways: Kong, Envoy (as edge proxy), AWS API Gateway, Netflix Zuul.
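The circuit breaker and graceful degradation ideas above can be sketched together. This is a simplified, single-threaded version under assumed names (`CircuitBreaker`, `max_failures`, `reset_after`); time is passed in explicitly to keep the example deterministic.

```python
class CircuitBreaker:
    """Open after `max_failures` consecutive failures; allow a trial call
    once `reset_after` seconds have passed."""

    def __init__(self, max_failures: int, reset_after: float):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # time the breaker opened, or None if closed

    def call(self, fn, fallback, now: float):
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                return fallback()      # open: fail fast, skip the backend
            self.opened_at = None      # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now
            return fallback()          # degrade gracefully, e.g. cached data
        self.failures = 0
        return result

backend_calls = []

def flaky_backend():
    backend_calls.append(1)
    raise ConnectionError("backend down")

breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
results = [breaker.call(flaky_backend, lambda: "cached", now=t) for t in (0, 1, 2, 3)]
print(results, len(backend_calls))
```

Only the first two requests reach the failing backend. After that the breaker is open, so the gateway answers from the fallback without adding load to a service that is already struggling.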

Your mobile app makes 4 API calls on launch (user, orders, recommendations, notifications). Each takes 80ms over the mobile network. Total launch time?

80ms if calls are parallel.

320ms sequentially, 80ms parallel. But on mobile, parallel connections are limited and each TCP handshake adds overhead. An API gateway reduces this to ONE call: the app sends GET /dashboard, the gateway fans out to 4 services in parallel over the fast internal network (~5ms each), aggregates, and returns one response in ~85ms total.

20ms: the gateway caches everything.

Chapter 5: Service Mesh

Every microservice needs the same boring infrastructure: retries on failure, timeouts, circuit breaking, mutual TLS, load balancing between instances, distributed tracing, metric collection. You could implement all of this in every service's code. But then every team must build the same logic, in every language, and keep it consistent.

A service mesh extracts this common infrastructure into a dedicated layer. Instead of each service implementing retries and TLS, a proxy runs alongside each service instance. All network traffic flows through this proxy. The proxy handles retries, timeouts, TLS, observability, and traffic management. The service code only worries about business logic.

How It Works

For every service instance, a sidecar proxy (typically Envoy) is deployed in the same pod/VM. The service makes a plain HTTP request to the destination as usual. The sidecar intercepts it (Istio, for example, redirects the pod's traffic to the proxy with iptables rules; other setups have the service call a localhost port explicitly), applies policies (retry, timeout, mTLS), and routes it to the destination service's sidecar, which decrypts and delivers it. Neither service knows about the mesh: it is transparent.

# Without service mesh: every service implements infra
user-svc code: http.get("order-svc", retries=3, timeout=5s, mtls=True)
order-svc code: http.get("payment-svc", retries=3, timeout=5s, mtls=True)
# Every service, every language, every team: same boilerplate

# With service mesh: services just make plain HTTP calls
user-svc code: http.get("order-svc") # Plain HTTP; the sidecar proxy intercepts it
# Sidecar proxy handles: retries, timeout, mTLS, tracing, metrics
# Configured centrally, applied uniformly, language-agnostic
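For a sense of what the proxy takes off each team's plate, here is a minimal sketch of one such policy: retries with exponential backoff. In a mesh this behavior is configured on the proxy rather than written in application code; the function name, defaults, and the choice to retry only on `ConnectionError` are assumptions for illustration.

```python
import time

def call_with_retries(fn, retries: int = 3, base_delay_s: float = 0.01, sleep=time.sleep):
    """Call `fn`; on ConnectionError, retry up to `retries` times with
    exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == retries:
                raise                  # out of retries: surface the error
            sleep(base_delay_s * 2 ** attempt)

attempts = []

def flaky_order_svc():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return {"orders": []}

response = call_with_retries(flaky_order_svc)
print(response, len(attempts))
```

Multiply this by timeouts, mTLS, and tracing, then by every language in the fleet, and the appeal of doing it once in a proxy becomes clear.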

Service Mesh Traffic Flow

Watch how requests flow through sidecar proxies. Each service's traffic is intercepted and managed by its proxy. Toggle features to see what the mesh handles.


Real Service Meshes

Mesh	Sidecar proxy	Control plane	Notable feature
Istio	Envoy	istiod	Most feature-rich; complex to operate
Linkerd	linkerd2-proxy (Rust)	Linkerd control plane	Lightweight; simpler than Istio
Consul Connect	Envoy or built-in	Consul server	Integrates with HashiCorp ecosystem
AWS App Mesh	Envoy	AWS managed	Managed control plane on AWS

The service mesh trade-off: observability and uniformity vs. latency and complexity. Every request now passes through two extra proxies (source sidecar → destination sidecar), adding roughly 1-3ms of latency. A sidecar consumes CPU and memory in every pod. And the mesh itself (control plane + proxies) is another system to operate and debug. The payoff: uniform retries, mTLS, and observability without any code changes. For organizations with 50+ services in multiple languages, the payoff is enormous.

You have 30 microservices in 4 languages (Java, Python, Go, Node). Each needs mutual TLS, retries, and distributed tracing. Without a service mesh, how many implementations do you need?

At least 4, one per language. Each language needs its own library for mTLS, retries, and tracing. Then you must keep all 4 libraries consistent (same retry policy, same timeout values, same tracing format). When you change the retry policy, you update 4 libraries and redeploy 30 services. A service mesh eliminates this: one proxy implementation (Envoy), one configuration, zero code changes in the services.

1: use a shared library that works in all languages.

30: one per service, regardless of language.

Chapter 6: Control Plane vs. Data Plane

This is one of the most fundamental patterns in all of distributed systems — not just service meshes, but networking, databases, and infrastructure. Once you see it, you see it everywhere.

The data plane handles the actual traffic. It forwards packets, routes requests, serves data. It must be fast, because it is in the hot path of every request.

The control plane configures the data plane. It decides routing rules, sets policies, manages configuration. It does not handle user traffic. It can be slow (updated infrequently), but it must be correct.

The analogy: a highway system. The data plane is the roads — cars (requests) drive on them at full speed. The control plane is the traffic management center — it sets speed limits, configures traffic lights, and opens/closes lanes. The management center does not carry any cars. But it controls how every car flows.

System	Data plane	Control plane
Service mesh (Istio)	Envoy sidecar proxies (route traffic)	istiod (pushes routing rules to proxies)
Kubernetes	kubelet + container runtime (runs pods)	API server + scheduler + controllers (decides what runs where)
DNS	Resolvers and authoritative nameservers (answer queries)	Zone administration (defines and updates records)
Load balancer	LB forwarding engine (routes packets)	Health checker + config API (updates backend list)
SDN (software-defined networking)	Network switches (forward frames)	SDN controller (programs switch rules)
CDN	Edge servers (serve cached content)	Origin config (defines caching rules, purge)

Control Plane vs. Data Plane

Watch the control plane push configuration to data plane nodes. Then see user traffic flow through the data plane. The control plane never touches user traffic.


Why the Split Matters

Failure isolation. If the control plane crashes, the data plane keeps running with its last-known configuration. Traffic still flows. New rules cannot be pushed, but existing traffic is not interrupted. This is critical: the data plane handles every user request, so it must be resilient. The control plane handles infrequent configuration changes, so brief outages are tolerable.
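A toy model of that failure isolation, with hypothetical `Proxy` and `ControlPlane` classes: the proxy keeps a local copy of the last pushed routes, so routing never depends on the control plane being reachable.

```python
class Proxy:
    """Data plane: routes using whatever config it last received."""

    def __init__(self):
        self.routes: dict[str, str] = {}

    def apply(self, routes: dict[str, str]) -> None:
        self.routes = dict(routes)     # keep a local copy

    def route(self, service: str) -> str:
        return self.routes[service]    # no control plane call in the hot path

class ControlPlane:
    """Control plane: pushes config to proxies; never sees user traffic."""

    def __init__(self, proxies: list[Proxy]):
        self.proxies = proxies
        self.alive = True

    def push(self, routes: dict[str, str]) -> None:
        if not self.alive:
            raise RuntimeError("control plane is down")
        for proxy in self.proxies:
            proxy.apply(routes)

proxy = Proxy()
control = ControlPlane([proxy])
control.push({"order-svc": "10.0.0.7:8080"})   # assumed example address
control.alive = False                           # simulate a control plane crash
print(proxy.route("order-svc"))                 # still routes from cached config
```

The design choice to notice is that `route` reads only local state. If it asked the control plane on every request, a control plane outage would become a traffic outage.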

Scale independence. The data plane scales with traffic (more proxies, more instances). The control plane scales with the number of services (not with traffic volume). A service mesh might have 1,000 Envoy proxies handling 1 million requests per second, but only a handful of control plane instances, which push config only when something changes.

The control plane is the source of truth. The desired state lives in the control plane. The data plane's job is to converge toward that desired state. Kubernetes embodies this: you tell the control plane "I want 5 replicas of service X." The control plane continuously reconciles reality toward that desire. The data plane (kubelet) does the actual work of running containers.
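The reconciliation idea can be shown in a few lines. This is a toy sketch, not how Kubernetes controllers are implemented: `reconcile` and the replica naming scheme are assumptions, and "starting" a replica just means adding a name to a list.

```python
def reconcile(desired: int, running: list[str], next_id: int) -> tuple[list[str], int]:
    """One reconciliation pass: start or stop replicas until count == desired."""
    running = list(running)
    while len(running) < desired:
        running.append(f"service-x-{next_id}")   # "start" a replica
        next_id += 1
    while len(running) > desired:
        running.pop()                            # "stop" a replica
    return running, next_id

running, next_id = reconcile(5, [], 0)           # initial rollout: 5 replicas
running.remove("service-x-2")                    # a replica crashes
running, next_id = reconcile(5, running, next_id)
print(running)
```

Nobody told the system "restart replica 2". The desired state stayed at 5, the observed state dropped to 4, and the next pass closed the gap.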

Your Istio control plane (istiod) crashes. What happens to existing service-to-service traffic?

Traffic continues normally. Envoy sidecars (data plane) already have their routing rules cached. They keep forwarding traffic using the last-known configuration. New routing changes cannot be pushed until the control plane recovers, but existing traffic is unaffected. This is the key benefit of the control/data plane split: data plane resilience survives control plane failures.

All traffic stops: Envoy cannot route without istiod.

Traffic degrades slowly as Envoy caches expire.

Chapter 7: The Sidecar Pattern

The sidecar pattern extends beyond service meshes. It is a general design pattern: attach a helper process alongside your main application process. The sidecar shares the same lifecycle (starts and stops together), the same network namespace (can communicate via localhost), and often the same filesystem.

What Sidecars Do

Think of a sidecar as a "plugin" that adds capabilities to any application without changing its code. Common sidecar uses:

Proxy sidecar (Envoy). Handles all network traffic: mTLS, retries, load balancing, tracing. The application makes plain HTTP calls; the sidecar handles everything else.

Log collector sidecar (Fluentd, Filebeat). Reads the application's log files and ships them to a centralized logging system (Elasticsearch, Splunk). The application writes log files to a shared volume; the sidecar collects and forwards them.

Config sync sidecar (Consul Agent, Vault Agent). Watches a central config store and syncs configuration files to a shared volume. The application reads from the local filesystem; the sidecar keeps it updated.

# Kubernetes pod with sidecar containers
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app # Main application
    image: my-app:v2
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: logs # App writes its log files here
      mountPath: /var/log/app
  - name: envoy-proxy # Sidecar: network proxy
    image: envoyproxy/envoy:v1.28
  - name: log-collector # Sidecar: log shipping
    image: fluent/fluentd:v1.16
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  volumes:
  - name: logs # Shared between the app and the log collector
    emptyDir: {}

# All 3 containers share the same pod:
# - Same network namespace (localhost communication)
# - Same lifecycle (start/stop together)
# - Can share volumes (log files)

Sidecar vs. Library

Why not just use a library? Import a retry library, a TLS library, a tracing library into your application code. This is simpler — no extra process, no inter-process communication overhead.

Property	Library	Sidecar
Language	Must match the app's language	Language-agnostic (separate process)
Updates	Requires rebuilding and redeploying the app	Update sidecar independently
Isolation	Bug in library can crash the app	Sidecar crash does not crash the app process (though a crashed proxy sidecar cuts off its traffic)
Overhead	Lower (in-process)	Higher (IPC, extra memory/CPU)
Consistency	Varies per team (different versions)	Uniform (one sidecar version fleet-wide)

Sidecar Architecture

A pod with an application container and two sidecars. Watch traffic flow through the proxy sidecar and logs flow through the log collector.


The sidecar pattern is Kubernetes-native. In Kubernetes, a pod is a group of containers that share a network and lifecycle. Sidecars are just additional containers in the pod. Istio injects the Envoy sidecar automatically via a mutating admission webhook: you deploy your service as normal, and Istio adds the proxy behind the scenes. Kubernetes 1.28 introduced native sidecar containers (init containers with restartPolicy: Always), which give sidecars proper startup and shutdown ordering.

Your company has 30 services in Java, Python, Go, and Rust. You want to add mTLS to all service-to-service communication. Which approach requires the fewest code changes?

Add a TLS library to each service: 4 implementations for 4 languages.

Deploy a sidecar proxy (Envoy) alongside every service. The proxy handles mTLS transparently: services continue making plain HTTP calls. Zero code changes in any service, zero language-specific work. One proxy configuration applies fleet-wide.

Use a VPN between all services instead of mTLS.

Chapter 8: Interactive Service Architecture

Time to see the full picture. Below is an interactive simulation of a microservice architecture with an API gateway, service mesh, and multiple backends. Send requests from clients, watch them flow through the gateway, mesh proxies, and services. Inject failures and see how the mesh retries automatically.

This is the complete architecture. Client → API Gateway → Envoy sidecar → Service → Envoy sidecar → downstream service. Click through and watch how every piece we have covered fits together.

Full Microservice Request Flow

A client request flows through the API gateway, service mesh sidecars, and backend services. Inject failures to see retries and circuit breaking.


What to Try

Experiment 1: Send a request. Watch it flow through the API gateway, hit the first service's sidecar proxy, reach the service, then fan out to downstream services through their sidecars.

Experiment 2: Inject a failure. One of the backend services becomes unhealthy. Watch the sidecar retry the request to a healthy instance. This is the mesh at work — the service code has no retry logic.

Experiment 3: Send many requests. Watch the gateway distribute across services, sidecars load-balance across instances, and metrics accumulate. This is the observability that the mesh provides for free.

Chapter 9: Connections

Service architecture is the structural foundation of distributed systems. Here is how it connects to everything else.

What We Covered

Concept	Key takeaway
Monolith	Simple, fast, works for small teams. Breaks down with organizational scale.
Microservices	Independent deploy + data ownership. Solves org problems, adds operational complexity.
Service boundaries	Bounded contexts from DDD. Minimize cross-boundary communication.
API Gateway	Single entry point. Routing, aggregation, auth, rate limiting. BFF for per-client gateways.
Service mesh	Sidecar proxies handle retries, mTLS, tracing. Language-agnostic, zero code changes.
Control/data plane	Data plane handles traffic (fast). Control plane manages config (correct). Split everywhere.
Sidecar pattern	Attach helper process alongside app. Proxy, log collector, config sync.

Where to Go Next

Topic	Connection
Load Balancing	API gateways and service meshes both perform load balancing; the algorithms (round-robin, least-conn) matter here too
Data Storage	Each microservice owns its database; caching reduces cross-service calls
Messaging	Event-driven communication between services replaces synchronous HTTP calls
Consensus	Service discovery (Consul, etcd) uses consensus to maintain the service registry

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." — Leslie Lamport. Service architecture makes these failure modes explicit and gives you tools (meshes, gateways, circuit breakers) to handle them. The complexity is real, but the alternative — a monolith that outgrows its team — is worse.

A request from a mobile app reaches your backend through: Client → ? → ? → Service. What are the two missing components in a modern microservice architecture?

API Gateway → Sidecar Proxy (Envoy) → Service. The gateway handles external concerns (auth, rate limiting, routing). The sidecar proxy handles internal concerns (mTLS, retries, tracing). The service handles business logic. Three layers, each with a clear responsibility.

Load Balancer → Database → Service.

DNS → CDN → Service.