Microservices, API gateways, service meshes, control planes, data planes, and the sidecar pattern.
You have a successful application. It started as a single codebase — user accounts, order processing, payment handling, notification sending, recommendation engine — all in one repository, deployed as one binary. The team was 5 engineers. Life was simple. Ship the whole thing, run it on three servers, call it a day.
Now you are 50 engineers. The codebase is 500,000 lines. A change to the recommendation algorithm requires re-testing the entire payment system. A junior engineer breaks the login page by changing a shared utility function. Deployment takes 45 minutes because the entire application must be built, tested, and deployed even for a one-line fix. The payments team wants to use Python for ML-based fraud detection, but the monolith is in Java. Everyone queues behind one release train.
The problem is not technical capacity. The problem is organizational scaling. When multiple teams work on the same codebase, they step on each other. When everything deploys together, one team's bug blocks another team's release. When one component needs different resources (the recommendation engine needs GPUs, the login page needs low latency), you cannot scale them independently.
Watch a deploy flow through a monolith (everything at once) vs. microservices (independent deploys). Notice how a failure in one component affects the rest.
When the monolith deploys, a failure in the payments module rolls back the entire deploy — including the perfectly working recommendation update. When microservices deploy independently, the payments failure only affects payments. The recommendation service ships on its own schedule.
A monolith is a single deployable unit containing all application functionality. One codebase, one build, one binary (or one WAR file, one container image). Function calls between components are local — in-process, nanosecond latency. No network involved.
Monoliths get a bad reputation, but they have genuine advantages that microservices cannot replicate:
| Advantage | Why it matters |
|---|---|
| Simple deployment | One artifact to build, test, and deploy. No coordination between services. |
| Simple debugging | One process, one log stream, one debugger. Stack traces show the full call chain. |
| No network latency | Function calls are nanoseconds. No serialization, no HTTP overhead, no retries. |
| Transactional consistency | A single database transaction can span all components. ACID guarantees for free. |
| Easy refactoring | Rename a function, and your IDE updates all callers. Try that across 20 services. |
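
The transactional-consistency row can be made concrete with a small sketch. It assumes a monolith where orders and payments share one database (here an in-memory SQLite database); the table names, `make_db`, and `place_order` are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: both "components" live in the same database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount_cents INTEGER)")
    conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, order_id INTEGER, amount_cents INTEGER)")
    return conn


def place_order(conn: sqlite3.Connection, user_id: int, amount_cents: int,
                fail_payment: bool = False) -> bool:
    """Write the order and its payment in one transaction: both commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            cur = conn.execute(
                "INSERT INTO orders (user_id, amount_cents) VALUES (?, ?)",
                (user_id, amount_cents),
            )
            if fail_payment:
                raise RuntimeError("payment declined")  # simulated failure
            conn.execute(
                "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
                (cur.lastrowid, amount_cents),
            )
        return True
    except RuntimeError:
        return False
```

When the simulated payment fails, the order row written a moment earlier is rolled back with it. Once orders and payments live in separate services with separate databases, this guarantee is no longer available from the database alone.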
The problems emerge with scale, specifically team scale. As the monolith grows, its modules develop implicit dependencies on one another: each shortcut that reaches across a module boundary adds coupling, until a change in one module can break another.
Before reaching for microservices, consider the modular monolith: a single deployable unit with strict module boundaries enforced by the build system. Modules communicate through defined interfaces, not by reaching into each other's internals. This gives you the simplicity of a monolith with some of the isolation benefits of microservices.
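One way to picture build-enforced boundaries is a check that fails the build when a module imports another module's internals. The sketch below is a simplified illustration, not any particular tool: it assumes a hypothetical naming convention where `<module>.internal.*` is private to `<module>`, and it takes the import graph as a plain dictionary rather than parsing source files.

```python
def boundary_violations(imports: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs that reach into another module's internals.

    Assumed convention (hypothetical): 'payments.api' is public, while anything
    under 'payments.internal' may only be imported from inside 'payments'.
    """
    violations = []
    for importer, targets in imports.items():
        owner = importer.split(".")[0]  # top-level module that owns the importer
        for target in targets:
            parts = target.split(".")
            if "internal" in parts[1:] and parts[0] != owner:
                violations.append((importer, target))
    return violations


# Example import graph: one legal public import, one legal same-module
# internal import, and one cross-module reach into internals.
example_imports = {
    "orders.service": ["payments.api", "orders.internal.db"],
    "recommendations.engine": ["payments.internal.ledger"],
}
```

Running `boundary_violations(example_imports)` flags only the `recommendations.engine` import of `payments.internal.ledger`. A CI step could fail the build whenever the returned list is non-empty.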
Shopify runs a modular monolith in Ruby. Their monolith has strict component boundaries enforced by tooling. Teams own components, not services. Deploys are fast because only changed components are tested. This works because their organizational structure (teams own components) maps cleanly to code structure (components within a monolith).
A microservices architecture decomposes the application into small, independently deployable services. Each service runs in its own process, owns its own data, and communicates with others over the network (HTTP, gRPC, or messaging). Each can be written in a different language, scaled independently, and deployed on its own schedule.
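The word "micro" is misleading. It does not mean small in lines of code. What defines a microservice is the set of properties above: it deploys independently, it owns its own data, and it talks to other services over the network. Those properties come with costs: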
| Cost | What it means in practice |
|---|---|
| Network latency | Every service call adds 1-10ms. A page that calls 5 services one after another pays 5-50ms in network overhead alone; calls made in parallel cost roughly as much as the slowest one. |
| Partial failures | Any service can fail independently. Your code must handle timeouts, retries, circuit breakers. |
| Distributed transactions | No ACID across services. Use sagas (compensating transactions) or accept eventual consistency. |
| Operational complexity | 50 services = 50 deploy pipelines, 50 log streams, 50 monitoring dashboards. |
| Testing | Integration tests require running multiple services. Contract testing becomes essential. |
| Debugging | A bug might span 5 services. Distributed tracing (Jaeger, Zipkin) is mandatory. |
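
The distributed-transactions row mentions sagas. A minimal sketch of the idea follows, assuming each step is a pair of plain callables (an action and its compensation); `run_saga` and the step names in the usage note are hypothetical. Real sagas also have to persist their progress and cope with compensations that fail, which this sketch ignores.

```python
from typing import Callable

# A step is (action, compensation). Both are illustrative zero-argument callables.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run actions in order; if one fails, undo the completed ones in reverse order."""
    completed: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()  # compensating transaction for an earlier, successful step
            return False
        completed.append(compensate)
    return True
```

For example, with steps "reserve stock", "create order", and a "charge card" action that raises, `run_saga` runs the first two actions, then calls "cancel order" and "release stock" in that order and returns `False`. The system is briefly inconsistent between the failure and the last compensation, which is the eventual-consistency trade-off the table describes.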
Watch a user request flow through both architectures. In the monolith, everything is in-process. In microservices, each call crosses the network.
The hardest question in microservices is not "how" — it is "where to split." Get the boundaries right, and services evolve independently. Get them wrong, and every feature requires coordinated changes across 5 services. This is worse than a monolith.
The most reliable approach to finding service boundaries comes from Domain-Driven Design (DDD), specifically the concept of bounded contexts. A bounded context is a boundary within which a particular model (set of terms, rules, and data) is consistent and meaningful.
In an e-commerce system, the word "product" means different things in different contexts:
| Context | What "product" means | Key data |
|---|---|---|
| Catalog | Something to browse and search | Name, description, images, category |
| Inventory | Something with a quantity in a warehouse | SKU, quantity, warehouse location |
| Pricing | Something with a price that changes | Base price, discounts, currency, tax rules |
| Shipping | Something with weight and dimensions | Weight, dimensions, fragile flag, origin |
Each bounded context becomes a service. The catalog service owns the product's name and description. The pricing service owns the product's price. They communicate through well-defined interfaces. If the pricing team changes how discounts work, the catalog service does not need to change — it never knew about discount logic in the first place.
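As a rough illustration of the catalog and pricing contexts, each service can define its own model of a product that shares nothing but an identifier. The class and field names below are assumptions derived from the table, and the discount rule is invented purely to show logic that lives only in the pricing context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogProduct:
    """'Product' as the catalog service sees it: something to browse and search."""
    product_id: str
    name: str
    description: str
    category: str


@dataclass(frozen=True)
class PricingProduct:
    """'Product' as the pricing service sees it: something with a price."""
    product_id: str
    base_price_cents: int
    currency: str
    discount_percent: int = 0  # hypothetical discount rule, private to pricing

    def final_price_cents(self) -> int:
        # Integer arithmetic keeps money out of floating point.
        return self.base_price_cents * (100 - self.discount_percent) // 100
```

The catalog model has no price fields at all, so a change to `final_price_cents` cannot ripple into the catalog service. The two models are joined only by `product_id`.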
An e-commerce system decomposed into bounded contexts. Click contexts to see their data models — notice how "product" means something different in each.
Forget lines of code. The right size is determined by team ownership. A service should be small enough that one team can own it entirely: design it, build it, deploy it, and operate it. Amazon's "two-pizza team" rule bounds the team: if two pizzas cannot feed the team, the team is too big. The service, in turn, should be sized so that its owning team can fully understand and maintain it.
In practice, as a rough rule of thumb, many services fall between 5,000 and 50,000 lines of code and are owned by a team of 3-8 engineers. Some are smaller (a simple proxy, a config service). Some are larger (the core business logic). The number is not the point; team ownership is.
You have decomposed your application into 20 microservices. A mobile app needs to load a user's dashboard — which requires data from the user service, order service, recommendation service, and notification service. Should the mobile app make 4 HTTP requests to 4 different services?
No. That exposes internal service topology to the client, multiplies round trips (especially painful on mobile networks with 100ms+ latency), and means every client must know the address and API of every service. Add a new service, and every client must be updated.
An API gateway is a single entry point for all client requests. The client talks to one endpoint. The gateway routes, aggregates, and transforms requests to the appropriate backend services.
| Function | How it helps |
|---|---|
| Request routing | Route /users to user-svc, /orders to order-svc |
| Aggregation | Combine multiple service responses into one |
| Authentication | Verify JWT tokens once at the gateway, not in every service |
| Rate limiting | Protect backend services from traffic spikes |
| TLS termination | Handle HTTPS at the edge; internal traffic can be plain HTTP or re-encrypted (for example, with mesh mTLS) |
| Protocol translation | Accept REST from mobile, translate to gRPC for internal services |
| Caching | Cache common responses to reduce backend load |
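
The request-routing row can be sketched as a prefix lookup. This is a simplified illustration under assumptions: the route table mirrors the `/users` and `/orders` example from the table, and `route` is a hypothetical helper rather than any real gateway's API.

```python
from typing import Optional

# Path prefix -> backend service name, taken from the routing example above.
ROUTES = {"/users": "user-svc", "/orders": "order-svc"}


def route(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Pick the backend whose prefix matches the path; the longest prefix wins."""
    best_prefix = None
    for prefix in routes:
        # Match whole path segments only, so '/usersx' does not match '/users'.
        if path == prefix or path.startswith(prefix + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return routes[best_prefix] if best_prefix is not None else None
```

Under these assumptions, `route("/users/42")` returns `"user-svc"` and an unknown path such as `/health` returns `None`, which a gateway would typically turn into a 404.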
Different clients need different data. A mobile app wants a compact JSON payload. The web app wants a richer response. An internal admin tool needs different fields entirely. A single API gateway becomes a bottleneck when it must serve all these needs.
The BFF pattern (Backend for Frontend) creates a separate gateway for each client type. The mobile BFF tailors responses for mobile. The web BFF serves the web app. Each BFF is owned by the frontend team that uses it, so they can evolve independently.
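A sketch of aggregation plus per-client shaping follows. The two `fetch_*` coroutines are stand-ins for real network calls to backend services, and all names, fields, and payloads are hypothetical. The backend calls run concurrently, so the client pays roughly one round trip of backend latency instead of one per service.

```python
import asyncio


async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a call to the user service
    return {"id": user_id, "name": "Ada", "email": "ada@example.com"}


async def fetch_orders(user_id: int) -> list:
    await asyncio.sleep(0.01)  # stand-in for a call to the order service
    return [{"id": 1, "total_cents": 1500, "items": ["mug"]}]


async def load_dashboard(user_id: int) -> dict:
    """Fan out to both backends concurrently and combine the results."""
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    return {"user": user, "orders": orders}


def mobile_view(dashboard: dict) -> dict:
    """Mobile BFF: compact payload with only what the small screen shows."""
    return {"name": dashboard["user"]["name"], "order_count": len(dashboard["orders"])}


def web_view(dashboard: dict) -> dict:
    """Web BFF: richer payload, including full order details."""
    return {"user": dashboard["user"], "orders": dashboard["orders"]}
```

Calling `asyncio.run(load_dashboard(7))` and passing the result to `mobile_view` or `web_view` yields the two client-specific responses from the same backend data. In a real deployment each view would live in its own BFF service owned by the corresponding frontend team.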
A client request arrives at the API gateway. Watch it fan out to multiple backend services, aggregate responses, and return one response.
Every microservice needs the same boring infrastructure: retries on failure, timeouts, circuit breaking, mutual TLS, load balancing between instances, distributed tracing, metric collection. You could implement all of this in every service's code. But then every team must build the same logic, in every language, and keep it consistent.
A service mesh extracts this common infrastructure into a dedicated layer. Instead of each service implementing retries and TLS, a proxy runs alongside each service instance. All network traffic flows through this proxy. The proxy handles retries, timeouts, TLS, observability, and traffic management. The service code only worries about business logic.
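To give a sense of the logic a proxy takes off the service's hands, here is a minimal circuit breaker sketch. It is an illustration only: the class, its parameters, and the thresholds are assumptions, and it does not reflect how Envoy or any specific mesh implements circuit breaking.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without trying the backend."""


class CircuitBreaker:
    # All names and defaults here are illustrative, not any mesh's real API.
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

After `failure_threshold` consecutive failures the breaker rejects calls immediately instead of waiting on a backend that is probably down, then lets a single trial call through once the cool-down has passed. Writing and tuning this once in a proxy, rather than in every service and language, is the argument for the mesh.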
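For every service instance, a sidecar proxy (typically Envoy) is deployed in the same pod/VM. The service makes a plain HTTP request, either to a localhost port the sidecar listens on or to the destination's normal address with outbound traffic redirected through the sidecar. The sidecar applies policies (retry, timeout, mTLS) and routes the request to the destination service's sidecar, which decrypts it and delivers it to the destination service. Neither service needs to know about the mesh; it is transparent to application code.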
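The retry policy a sidecar applies can be sketched as a wrapper around the outbound call. This is an assumption-laden illustration: `call_with_retries`, its defaults, and the use of `ConnectionError` as the retryable failure are all hypothetical, and real proxies add per-try timeouts, retry budgets, and jitter.

```python
import time


def call_with_retries(send, attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Call send(); on ConnectionError, retry with exponential backoff.

    Only safe for idempotent requests, since a retried call may run twice.
    """
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay_s * 2 ** attempt)  # waits 0.1s, 0.2s, 0.4s, ...
```

The service code calls its dependency once and never sees the first two failures. The `sleep` parameter exists so the backoff can be observed or replaced in tests without actually waiting.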
Watch how requests flow through sidecar proxies. Each service's traffic is intercepted and managed by its proxy. Toggle features to see what the mesh handles.
| Mesh | Sidecar proxy | Control plane | Notable feature |
|---|---|---|---|
| Istio | Envoy | istiod | Most feature-rich; complex to operate |
| Linkerd | linkerd2-proxy (Rust) | Linkerd control plane | Lightweight; simpler than Istio |
| Consul Connect | Envoy or built-in | Consul server | Integrates with HashiCorp ecosystem |
| AWS App Mesh | Envoy | AWS managed | Managed control plane on AWS |
The split between a control plane and a data plane is one of the most fundamental patterns in all of distributed systems. It shows up not just in service meshes but in networking, databases, and infrastructure. Once you see it, you see it everywhere.
The data plane handles the actual traffic. It forwards packets, routes requests, serves data. It must be fast, because it is in the hot path of every request.
The control plane configures the data plane. It decides routing rules, sets policies, manages configuration. It does not handle user traffic. It can be slow (updated infrequently), but it must be correct.
| System | Data plane | Control plane |
|---|---|---|
| Service mesh (Istio) | Envoy sidecar proxies (route traffic) | istiod (pushes routing rules to proxies) |
| Kubernetes | kubelet + container runtime (runs pods) | API server + scheduler + controllers (decides what runs where) |
| DNS | Recursive resolvers and authoritative nameservers (answer queries) | Zone management and record updates (define what the servers answer) |
| Load balancer | LB forwarding engine (routes packets) | Health checker + config API (updates backend list) |
| SDN (software-defined networking) | Network switches (forward frames) | SDN controller (programs switch rules) |
| CDN | Edge servers (serve cached content) | Origin config (defines caching rules, purge) |
Watch the control plane push configuration to data plane nodes. Then see user traffic flow through the data plane. The control plane never touches user traffic.
Failure isolation. If the control plane crashes, the data plane keeps running with its last-known configuration. Traffic still flows. New rules cannot be pushed, but existing traffic is not interrupted. This is critical: the data plane handles every user request, so it must be resilient. The control plane handles infrequent configuration changes, so brief outages are tolerable.
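The failure-isolation property can be shown in a few lines. The classes below are a toy model under stated assumptions: `DataPlaneProxy` and `ControlPlane` are hypothetical names, config is a plain dictionary, and the `healthy` flag simulates a control plane outage.

```python
class DataPlaneProxy:
    """Routes traffic using whatever config it last received."""

    def __init__(self):
        self.routes = {}

    def apply_config(self, routes):
        self.routes = dict(routes)  # keep a local snapshot of the config

    def route(self, service):
        return self.routes.get(service)


class ControlPlane:
    """Pushes config to proxies; never sits in the request path."""

    def __init__(self, proxies):
        self.proxies = proxies
        self.healthy = True  # set to False to simulate a control plane crash

    def push(self, routes) -> bool:
        if not self.healthy:
            return False  # new rules cannot be pushed during the outage
        for proxy in self.proxies:
            proxy.apply_config(routes)
        return True
```

If the control plane pushes `{"payments": "10.0.0.5:8080"}` and then goes unhealthy, a later push fails but `proxy.route("payments")` still returns `"10.0.0.5:8080"`. Traffic keeps flowing on the last-known configuration because the proxy holds its own copy.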
Scale independence. The data plane scales with traffic (more proxies, more instances). The control plane scales with the number of services and the rate of configuration change, not with traffic volume. A service mesh might have 1,000 Envoy proxies handling 1 million requests per second, served by a small control plane that pushes config only when rules change.
The sidecar pattern extends beyond service meshes. It is a general design pattern: attach a helper process alongside your main application process. The sidecar shares the same lifecycle (starts and stops together), the same network namespace (can communicate via localhost), and often the same filesystem.
Think of a sidecar as a "plugin" that adds capabilities to any application without changing its code. Common sidecar uses include a network proxy, a log collector, and config sync.
Why not just use a library? Import a retry library, a TLS library, a tracing library into your application code. This is simpler — no extra process, no inter-process communication overhead.
| Property | Library | Sidecar |
|---|---|---|
| Language | Must match the app's language | Language-agnostic (separate process) |
| Updates | Requires rebuilding and redeploying the app | Update sidecar independently |
| Isolation | Bug in library can crash the app | Sidecar crash does not crash the app process (though a crashed proxy sidecar cuts off its traffic) |
| Overhead | Lower (in-process) | Higher (IPC, extra memory/CPU) |
| Consistency | Varies per team (different versions) | Uniform (one sidecar version fleet-wide) |
A pod with an application container and two sidecars. Watch traffic flow through the proxy sidecar and logs flow through the log collector.
Time to see the full picture. Below is an interactive simulation of a microservice architecture with an API gateway, service mesh, and multiple backends. Send requests from clients, watch them flow through the gateway, mesh proxies, and services. Inject failures and see how the mesh retries automatically.
Experiment 1: Send a request. Watch it flow through the API gateway, hit the first service's sidecar proxy, reach the service, then fan out to downstream services through their sidecars.
Experiment 2: Inject a failure. One of the backend services becomes unhealthy. Watch the sidecar retry the request to a healthy instance. This is the mesh at work — the service code has no retry logic.
Experiment 3: Send many requests. Watch the gateway distribute across services, sidecars load-balance across instances, and metrics accumulate. This is the observability the mesh provides without any application code changes.
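The load balancing in Experiments 2 and 3 can be approximated with a round-robin picker that skips unhealthy instances. This is a sketch under assumptions: `RoundRobinBalancer` is a hypothetical class, instance names are plain strings, and health is set by hand rather than by real health checks.

```python
import itertools


class RoundRobinBalancer:
    """Cycle through instances in order, skipping any marked unhealthy."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark_unhealthy(self, instance):
        self.healthy.discard(instance)

    def pick(self):
        # Look at each instance at most once per pick.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")
```

With instances `a`, `b`, `c` and `b` marked unhealthy, successive picks alternate between `a` and `c`. A sidecar combining this with a retry policy would send the retried request to the next healthy instance.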
Service architecture is the structural foundation of distributed systems. Here is how it connects to everything else.
| Concept | Key takeaway |
|---|---|
| Monolith | Simple, fast, works for small teams. Breaks down with organizational scale. |
| Microservices | Independent deploy + data ownership. Solves org problems, adds operational complexity. |
| Service boundaries | Bounded contexts from DDD. Minimize cross-boundary communication. |
| API Gateway | Single entry point. Routing, aggregation, auth, rate limiting. BFF for per-client gateways. |
| Service mesh | Sidecar proxies handle retries, mTLS, tracing. Language-agnostic, with little or no application code change. |
| Control/data plane | Data plane handles traffic (fast). Control plane manages config (correct). Split everywhere. |
| Sidecar pattern | Attach helper process alongside app. Proxy, log collector, config sync. |
| Topic | Connection |
|---|---|
| Load Balancing | API gateways and service meshes both perform load balancing; the algorithms (round-robin, least-conn) matter here too |
| Data Storage | Each microservice owns its database; caching reduces cross-service calls |
| Messaging | Event-driven communication between services can replace synchronous HTTP calls |
| Consensus | Service discovery (Consul, etcd) uses consensus to maintain the service registry |