Designing Data-Intensive Applications — Chapter 5

Encoding & Evolution

JSON, Protobuf, Avro, schema evolution — how data survives change.

Prerequisites: JSON basics + Binary numbers. That's it.
11
Chapters
9+
Simulations
5
Interview Dimensions

Chapter 0: The Problem

Your service stores user profiles as JSON. Version 1 looks like this:

json
{
  "userId": 42,
  "name": "Alice",
  "email": "alice@example.com"
}

Today you ship version 2 with a new field:

json
{
  "userId": 42,
  "name": "Alice",
  "email": "alice@example.com",
  "phoneNumber": "+1-555-0123"
}

Seems harmless. But here is the nightmare scenario: you have 50 servers, and you deploy the new code via a rolling deployment — one server at a time, over 30 minutes. For those 30 minutes, some servers run v1 (old code, no phone_number) and some run v2 (new code, expects phone_number). All of them read and write to the same database.

Now consider the chaos:

WriterReaderWhat happens
v2 server writes phone_numberv1 server reads itv1 sees an unknown field. Does it crash? Ignore it? Silently drop it on the next write?
v1 server writes (no phone_number)v2 server reads itv2 expects phone_number. Does it crash? Use a default? Show blank in the UI?
v2 server reads, then writes backv1 server reads the resultIf v1 re-writes the record, does it strip phone_number? Data loss.

And it gets worse. This isn't just about databases. The same version mismatch happens with:

The core question. How do you change the shape of your data over time without breaking systems that are still using the old shape? This problem is called schema evolution, and solving it properly is one of the most underrated skills in distributed systems engineering. The encoding format you choose determines whether evolution is painful or painless.

Watch the Rolling Deployment

The simulation below shows a rolling deployment in progress. Six servers are being upgraded from v1 to v2 one at a time. All of them read and write to the same database. Watch how old and new code coexist, and notice the window of danger when both versions are active simultaneously.

Rolling Deployment: Old and New Code Coexist

Watch servers upgrade one by one. Observe read/write conflicts between versions.

6 servers running v1. Click "Start Deployment" to begin rolling upgrade.

During the upgrade window, every database write is a potential compatibility issue. A v2 server writes a record with phone_number. Moments later a v1 server reads it, ignores the unknown field, then writes it back — erasing phone_number forever. This silent data loss is one of the most common bugs in production distributed systems.

Two Kinds of Compatibility

To survive rolling deployments, your data format must support two properties:

Forward compatibility: old code can read data written by new code. The v1 server encounters phone_number. It doesn't know what phone_number is, but it doesn't crash — it ignores the unknown field and preserves it when writing back.

Backward compatibility: new code can read data written by old code. The v2 server reads a record without phone_number. It doesn't crash — it fills in a sensible default (null, or an empty string) and continues.

Some encoding formats make both of these easy. Some make them impossible. That is the subject of this lesson.

Concept check: During a rolling deployment, a v2 server writes a user record with a new "phone_number" field. A v1 server later reads that record, updates the user's email, and writes it back. What is the most dangerous outcome if the encoding format does not support forward compatibility?

Chapter 1: In-Memory vs On-Wire

Before we compare JSON vs Protobuf vs Avro, we need to understand why encoding exists at all. The answer comes down to two fundamentally different representations of data.

Two Worlds

In memory, data lives as objects, structs, hash maps, arrays, and trees. These structures are optimized for CPU access: pointers let you jump to any field in O(1), data can reference other data via shared pointers, and the layout is designed for the memory hierarchy (cache lines, page alignment). A Python dict, a Java HashMap, a Go struct — these are all in-memory representations.

On the wire (or on disk), data must become a flat sequence of bytes. You cannot send a pointer over the network — the receiving process has a completely different memory space. The byte sequence must be self-contained: it carries all the information needed to reconstruct the original data, with no external references.

The translation from in-memory to bytes is called encoding (also called serialization or marshalling). The reverse — bytes back to in-memory objects — is decoding (also called deserialization, parsing, or unmarshalling).

Why not just dump the raw memory? You might think: "just memcpy the struct and send it." This fails for many reasons: (1) pointer values are meaningless to the receiver, (2) different languages lay out objects differently, (3) different CPU architectures have different endianness (byte order), (4) the receiver might be running a different version of the code with a different struct definition. Encoding exists precisely to abstract away these machine-level details and produce a portable, version-independent representation.

Language-Specific Formats: A Cautionary Tale

Many programming languages offer built-in serialization: Java has java.io.Serializable, Python has pickle, Ruby has Marshal. These are convenient — you can serialize any in-memory object in one line of code. But they have severe problems:

ProblemWhy it matters
Language lock-inA Python pickle can only be read by Python. If your analytics team uses Java or your mobile app uses Kotlin, they cannot read the data.
SecurityDeserializing arbitrary bytes can execute code. Python's pickle.loads() will happily run os.system("rm -rf /") if the attacker crafted the bytes. This is not hypothetical — pickle deserialization attacks are a standard exploit class.
VersioningIf you add a field to a class and try to deserialize old data, you often get cryptic errors. There is no built-in mechanism for forward or backward compatibility.
PerformanceThese formats are often slower and produce larger outputs than purpose-built alternatives like Protobuf.
The rule. Never use language-specific serialization for anything that crosses a process boundary — not for APIs, not for databases, not for message queues. Not even for "temporary" caches that "only our Python service reads." Systems evolve. Languages change. Use a language-independent format.

The Encoding Pipeline

The simulation below shows an in-memory object being encoded into bytes, transmitted, and decoded back into an in-memory object. Notice the information that must be preserved (field names, types, values) and the information that is lost (memory addresses, method definitions, private internal state).

The Encoding Pipeline: Object → Bytes → Object

Click Encode to serialize the object, then Transmit to send, then Decode to reconstruct.

Step 1: In-memory object with fields, pointers, and methods.

What Gets Preserved, What Gets Lost

InformationIn MemoryOn Wire
Field names and valuesYesYes (must be preserved)
Data types (int, string, bool)Yes (implicit from language)Depends on format
Memory addresses (pointers)YesNo (meaningless to receiver)
Methods / class behaviorYesNo (receiver has its own code)
Schema / version infoImplicitDepends on format

The encoding format you choose determines which columns say "Yes" and which say "Depends." JSON preserves field names as strings but loses precise types (is 42 an int32 or an int64?). Protobuf preserves field tags (numbers, not names) and precise types, but requires a schema to decode. Avro preserves field names and types but requires the writer's schema at read time. Each choice has consequences for compatibility, performance, and usability.

Debug scenario: A team serializes ML model weights using Python pickle and stores them in S3. Six months later, the ML platform team migrates from Python 3.9 to 3.11, and the research team switches from a custom Tensor class to PyTorch tensors. Old pickled models no longer load. What is the root cause, and what format should they have used?

Chapter 2: JSON, XML, CSV

If language-specific formats are a trap, what should you use? The first answer most developers reach for is a human-readable text format: JSON, XML, or CSV. These formats are widely supported, easy to debug (you can open them in a text editor), and language-independent. But they each have subtle problems that bite you at scale.

JSON: The Lingua Franca

JSON (JavaScript Object Notation) is the dominant format for web APIs. Its strengths are real: every language has a JSON library, it is human-readable, and it supports nested objects and arrays. But JSON has three landmines that every systems engineer must know about.

Landmine 1: Number ambiguity. JSON does not distinguish between integers and floating-point numbers. The value 42 might be parsed as a 32-bit int, a 64-bit int, or a 64-bit float — it depends entirely on the parser implementation. This matters enormously for large integers.

The 253 problem. JavaScript's Number type is an IEEE 754 double-precision float with 53 bits of mantissa. Any integer larger than 253 (9,007,199,254,740,992) loses precision when parsed by a JavaScript client. If your database assigns 64-bit integer IDs, a JavaScript front-end will silently corrupt them. Twitter famously hit this bug: their tweet IDs exceeded 253, and JavaScript clients started returning wrong IDs. The fix? Send large integers as strings in JSON. Twitter's API returns both "id": 12345678901234567 and "id_str": "12345678901234567".

Landmine 2: No binary support. JSON can only represent text. If you need to embed binary data (images, encrypted blobs, compressed data), you must encode it as Base64 — which inflates the data by ~33%. A 1 MB image becomes ~1.37 MB in the JSON payload.

Landmine 3: No schema. JSON is schemaless by default. A field can be a string in one record and a number in another. There is no enforcement mechanism built into the format. JSON Schema exists but it is a documentation tool, not a wire-format enforcement tool — there is nothing in the JSON bytes that says "this field must be an integer."

XML: Powerful but Verbose

XML (Extensible Markup Language) was the dominant data format before JSON took over. It supports schemas (XSD), namespaces, attributes vs. elements, and XSLT transformations. But it is verbose — every value is wrapped in opening and closing tags — and the schema language (XSD) is notoriously complex. A simple user profile in XML is roughly 2-3x the size of the equivalent JSON.

CSV: Fragile at Best

CSV (Comma-Separated Values) is barely a format at all. There is no official standard (RFC 4180 is a best-effort attempt). There is no schema. There is no type information. If a value contains a comma, you quote it. If a value contains a quote, you escape it. If a value contains a newline... it depends on the parser. CSV is fine for exporting spreadsheets. It is not a serious option for inter-service communication.

Size Comparison

The simulation below encodes the same data in JSON, XML, and CSV. Watch the byte counts. Also notice where JSON silently corrupts a large integer.

The Same Data in Three Formats

Compare byte sizes. Watch the integer precision problem in JSON.

Showing user profile with normal integer ID (42). Click "Toggle Large Integer" to see the 2^53 problem.

Worked Example: Encoding a User Profile

json
// JSON: 67 bytes
{"userId":42,"name":"Alice","email":"alice@example.com"}
xml
<!-- XML: 138 bytes -->
<user>
  <userId>42</userId>
  <name>Alice</name>
  <email>alice@example.com</email>
</user>
csv
# CSV: 40 bytes (but no schema, no types, no nesting)
42,Alice,alice@example.com

CSV is smallest, but it cannot represent nested data, has no type information, and breaks if any field contains a comma. XML is largest because of the tag overhead. JSON is the sweet spot for human-readability, but still suffers from the type ambiguity and size overhead we discussed.

When JSON is the right choice. JSON is still the best choice for public-facing APIs where human readability matters, where clients are diverse (browsers, mobile apps, third-party integrations), and where bandwidth is not the bottleneck. The problems with JSON only matter at scale — millions of messages per second, precise type requirements, or tight bandwidth constraints. For everything else, JSON is fine.
Design question: You are building an API that returns financial transaction IDs. These IDs are 64-bit integers assigned by a Postgres BIGSERIAL column. Your frontend is a React app. A colleague says "just return them as JSON numbers." Why is this dangerous, and what should you do instead?

Chapter 3: Binary Encoding

JSON is human-readable but wastes bytes. Every field name is repeated in full text for every single record. The string "favoriteNumber" is 16 bytes — and you pay that cost in every message, every record, millions of times per day. Binary encoding formats solve this by replacing text with compact binary representations.

MessagePack: Binary JSON

MessagePack is the simplest binary format. It is essentially JSON with a binary wire format: field names are still stored as strings, but values are encoded more compactly (integers use variable-length encoding, strings carry explicit lengths instead of quote delimiters). A JSON object like {"userName":"Martin","favoriteNumber":1337} shrinks from ~45 bytes to ~33 bytes.

MessagePack is a quick win — smaller than JSON, faster to parse — but it still carries full field names. The real savings come from formats that eliminate field names entirely.

Protocol Buffers and Thrift: Field Tags Replace Names

Protocol Buffers (Protobuf, created at Google) and Apache Thrift (created at Facebook) take a different approach. Instead of encoding field names, they assign each field a numeric tag in a schema definition. The schema is shared between sender and receiver ahead of time. The wire format only carries the tag numbers, not the names.

protobuf
// Protocol Buffers schema (.proto file)
message Person {
  string user_name = 1;      // tag 1
  int64 favorite_number = 2;  // tag 2
}
thrift
// Thrift schema (.thrift file)
struct Person {
  1: string userName,
  2: i64 favoriteNumber,
}

When this message is encoded, the bytes contain (tag=1, type=string, value="Martin") and (tag=2, type=varint, value=1337). The tag is a small integer (1-2 bytes), not a 14-character string. This saves enormous amounts of space when you have millions of records.

Varint Encoding: How Protobuf Packs Integers

Protobuf does not use fixed-width integers. The number 1 should not cost 8 bytes just because the schema says int64. Instead, Protobuf uses varint encoding: each byte uses 7 bits for data and 1 bit (the MSB, most significant bit) as a continuation flag. If the MSB is 1, more bytes follow. If it is 0, this is the last byte.

// Encoding the integer 300 as a varint:
300 in binary: 100101100
Split into 7-bit groups (from right): 0101100 | 0000010
Add MSB continuation flags:
First byte: 1|0101100 = 0xAC (MSB=1: more bytes follow)
Second byte: 0|0000010 = 0x02 (MSB=0: last byte)
Wire bytes: [0xAC, 0x02] = 2 bytes

// Compare: a fixed 64-bit integer would be 8 bytes
// Varint for 1: just [0x01] = 1 byte
// Varint for 300: [0xAC, 0x02] = 2 bytes
// Varint for 1,000,000: [0xC0, 0x84, 0x3D] = 3 bytes

For signed integers, Protobuf uses ZigZag encoding to map signed values to unsigned varints: 0 → 0, -1 → 1, 1 → 2, -2 → 3, 2 → 4, and so on. This keeps small negative numbers small (without ZigZag, -1 as a 64-bit two's complement would encode as a 10-byte varint).

zigzag(n) = (n << 1) ^ (n >> 63) // for 64-bit signed
// -1 → (0xFFFFFFFFFFFFFFFE ^ 0xFFFFFFFFFFFFFFFF) = 1
// 1 → (0x0000000000000002 ^ 0x0000000000000000) = 2

Thrift: Two Wire Formats

Thrift offers two encoding formats with different trade-offs:

FormatField headerInteger encodingOur example
BinaryProtocoltype (1 byte) + tag (2 bytes) = 3 bytesFixed-width (i64 = 8 bytes always)~35 bytes
CompactProtocoltype+tag in 1-2 bytes (delta encoding)Varint + ZigZag~27 bytes

CompactProtocol packs the field type and tag delta into a single byte when the tag delta is small (which it usually is, since tags are sequential). This makes it almost as compact as Protobuf.

Interactive Binary Encoder

The simulation below encodes the same record in four formats. Watch how field names disappear in Protobuf and Thrift, replaced by tiny tag numbers. Compare the final byte counts.

Binary Encoding Comparison

Same data: {"userName": "Martin", "favoriteNumber": 1337}. Compare byte sizes across formats.

Click a format to highlight its bytes. Notice how binary formats eliminate field name overhead.

Worked Example: Byte-by-Byte Protobuf Encoding

Let us encode {"userName": "Martin", "favoriteNumber": 1337} in Protobuf, byte by byte:

// Field 1: user_name = "Martin" (tag=1, wire type=2 (length-delimited))
Byte 1: 0x0A = (1 << 3) | 2 = tag 1, wire type 2
Byte 2: 0x06 = length 6 (string is 6 bytes)
Bytes 3-8: 0x4D 0x61 0x72 0x74 0x69 0x6E = "Martin" in UTF-8

// Field 2: favorite_number = 1337 (tag=2, wire type=0 (varint))
Byte 9: 0x10 = (2 << 3) | 0 = tag 2, wire type 0
Bytes 10-11: 0xB9 0x0A = varint encoding of 1337

// Total: 11 bytes. Compare: JSON was ~45 bytes.

Eleven bytes. The JSON version was roughly 45 bytes. That is a 4x reduction for a single tiny record. At a million records per second, that is 34 MB/s of saved bandwidth. At a billion records in a database, that is 34 GB of saved storage.

python
# Varint encoder in Python — interview coding drill
def encode_varint(value):
    """Encode a non-negative integer as a Protobuf varint."""
    result = []
    while value > 0x7F:
        result.append((value & 0x7F) | 0x80)  # 7 data bits + MSB=1
        value >>= 7
    result.append(value & 0x7F)  # last byte: MSB=0
    return bytes(result)

def zigzag_encode(value):
    """ZigZag: map signed int to unsigned for efficient varint."""
    return (value << 1) ^ (value >> 63)

# Test
print(encode_varint(1).hex())       # "01"      (1 byte)
print(encode_varint(300).hex())     # "ac02"    (2 bytes)
print(encode_varint(1337).hex())    # "b90a"    (2 bytes)
print(zigzag_encode(-1))             # 1
print(zigzag_encode(1))              # 2
Coding drill: In Protobuf varint encoding, how many bytes does the integer 150 require? Work it out by hand.

Chapter 4: Schema Evolution

We have established that Protobuf and Thrift are smaller and faster than JSON. But the real reason to use them is schema evolution. Because these formats use numeric field tags instead of string names, and because they have explicit rules for handling unknown fields, they make it possible to evolve your data format safely over time.

The Five Rules of Protobuf/Thrift Evolution

These rules determine what changes are safe and what changes will break your system. Memorize them for interviews.

Rule 1: You can add new fields with new tag numbers. If you add a field with tag 3 to a schema that only had tags 1 and 2, old code will simply skip tag 3 when it encounters it in the data. The field is unknown, and the decoder skips it using the wire type to determine how many bytes to consume. New code, when reading old data, will see that tag 3 is missing and use the default value.

Rule 2: You cannot rename fields — but you don't need to. Field names only exist in the schema definition, not in the wire format. The name user_name is purely for programmer convenience. You can rename it to display_name in your .proto file without changing a single byte of encoded data, as long as the tag number stays the same.

Rule 3: You cannot change a field's tag number. The tag IS the field's identity. Changing tag 2 from favorite_number to tag 5 means every existing encoded record still says "tag 2 = 1337," but your new code looks for tag 5 and finds nothing. The data is lost.

Rule 4: Never add a required field after initial deployment. If you add a required field, old code will produce records without it. When new code reads those records, it will reject them because the required field is missing. This breaks backward compatibility. In Protobuf 3, all fields are optional by default — Google learned this lesson the hard way.

Rule 5: You can change types if the wire format is compatible. Changing int32 to int64 is safe — both use varint encoding, and the bits are compatible. But reading a 64-bit value with 32-bit code may truncate (the high bits are silently discarded). Changing int32 to string is not safe — the wire types are different and the decoder will misinterpret the bytes.

Forward vs backward in Protobuf. Forward compatibility (old code reads new data) works because unknown tags are skipped. Backward compatibility (new code reads old data) works because missing fields get defaults. Both properties hold only if you follow the five rules. Break any rule, and one or both properties fail.

Schema Evolution Visualizer

The simulation below shows two schema versions side by side. Try different evolution operations and see which ones are safe (green) and which ones break compatibility (red). This is exactly the kind of reasoning you must do in a system design interview when discussing API versioning.

Schema Evolution: Safe vs Breaking Changes

Try different schema changes. Green = safe, Red = breaks compatibility.

Original schema: 2 fields (tag 1: string, tag 2: int64). Try a change.

Evolution Rules Summary

ChangeForward compatible?Backward compatible?Safe?
Add optional field (new tag)Yes (old code skips unknown tag)Yes (new code uses default)Yes
Add required fieldYesNo (old data missing required field)No
Remove optional fieldYes (new code uses default)Yes (old code skips the missing tag)Yes (but never reuse the tag number)
Rename fieldYes (wire uses tags not names)YesYes
Change tag numberNo (old data uses old tag)No (new code looks for new tag)No
int32 → int64Yes (but may truncate)YesCareful
int32 → stringNo (wire type mismatch)NoNo
The "never reuse tag numbers" rule. When you remove a field, you must mark its tag number as reserved in the schema. If a future developer accidentally reuses that tag for a new field with a different type, old data in the database (which still uses that tag with the old type) will be misinterpreted. Protobuf has a reserved keyword for exactly this purpose: reserved 3, 4; reserved "old_field_name";
Debug scenario: You add a new required string phone_number = 3; field to your Protobuf schema and deploy the new server. Immediately, 30% of requests start failing with deserialization errors. Why, and what is the fix?

Chapter 5: Apache Avro

Protobuf and Thrift solve schema evolution with field tags. Apache Avro takes a radically different approach: no tags at all. Fields are matched by name, and the encoded data contains no field identifiers whatsoever — not tags, not names, nothing. Instead, the writer's schema and reader's schema are resolved at read time.

How Avro Encoding Works

An Avro schema looks like this:

json
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "userName", "type": "string"},
    {"name": "favoriteNumber", "type": ["null", "long"], "default": null}
  ]
}

When Avro encodes this record, it writes the fields in schema order with no field identifiers:

// Avro encoding of {userName: "Martin", favoriteNumber: 1337}
Bytes 1: 0x0C = varint length of string "Martin" (6, ZigZag-encoded as 12)
Bytes 2-7: 0x4D 0x61 0x72 0x74 0x69 0x6E = "Martin" in UTF-8
Byte 8: 0x02 = union index 1 (the "long" branch of ["null","long"])
Bytes 9-10: 0xF2 0x14 = ZigZag varint of 1337

// Total: 10 bytes. Even smaller than Protobuf's 11 bytes.
// Because: no field tags, no wire type indicators.

But wait — if there are no tags and no names, how does the reader know which bytes correspond to which field? The answer: the reader must have the writer's schema. Without it, the bytes are meaningless — just a stream of values with no field identity.

Schema Resolution: Writer vs Reader

This is Avro's key innovation. The writer and reader can have different schemas, and Avro resolves them at read time by matching fields by name:

Writer Schema (v1)
userName (string), favoriteNumber (long)
↓ bytes encoded in v1 field order
Reader Schema (v2)
userName (string), favoriteNumber (long), email (string, default="")
Resolution
userName: matched by name → read from bytes. favoriteNumber: matched by name → read from bytes. email: not in writer's schema → use default "".

And in reverse:

Writer Schema (v2)
userName, favoriteNumber, email
↓ bytes encoded in v2 field order
Reader Schema (v1)
userName, favoriteNumber
Resolution
userName: matched → read. favoriteNumber: matched → read. email: in writer but not reader → skip those bytes.
The catch. Every field that you might add or remove must have a default value in the schema. If a reader encounters a missing field with no default, Avro throws an error. This is the Avro equivalent of "never add a required field" in Protobuf — but enforced structurally through the type system rather than through a convention.

How Does the Reader Get the Writer's Schema?

Three strategies, each suited to a different dataflow pattern:

StrategyHow it worksBest for
Avro container fileThe writer's schema is embedded at the beginning of the file. All records in the file use that schema. One schema per file.Hadoop, data lake files, batch processing
Connection negotiationWriter and reader exchange schemas during connection setup (Avro RPC protocol).RPC between long-lived services
Schema registryWriter stores its schema in a central registry and embeds the schema ID in each message. Reader looks up the schema by ID.Kafka + Avro (the standard pattern)

Avro Schema Resolution Visualizer

The simulation below shows writer and reader schemas side by side. Watch how Avro matches fields by name, fills in defaults for missing fields, and skips fields the reader does not know about.

Avro Schema Resolution

Writer and reader schemas differ. Watch field matching, defaults, and skipping.

Writer schema v1: userName, favoriteNumber. Reader schema v1: same. Click a scenario.

Why Avro? Dynamic Schema Generation

Avro's killer feature is that schemas can be generated programmatically. Consider a database table:

sql
CREATE TABLE users (
  user_id  BIGINT,
  name     VARCHAR(255),
  email    VARCHAR(255)
);

You can automatically generate an Avro schema from this DDL. If someone runs ALTER TABLE users ADD COLUMN phone TEXT;, a new Avro schema is generated automatically. The old and new schemas coexist through Avro's resolution mechanism. No one has to manually assign tag numbers. No one can accidentally reuse a tag. The database schema is the serialization schema.

This is why Avro dominates in the Hadoop ecosystem: data pipelines that export database tables to HDFS can evolve their schemas automatically as the source database evolves.

Avro vs Protobuf: When to Use Which

PropertyAvroProtobuf
Field identityNames (resolved at read time)Numeric tags (embedded in bytes)
Schema in bytes?No — needs writer schema from elsewhereNo — but tags are self-identifying
Dynamic schemasExcellent (generate from DB DDL)Poor (must assign tag numbers manually)
Smallest encodingYes (no tags, no names in bytes)Close (1-2 bytes per field for tag+type)
Random accessNo (must read in schema order)Yes (can skip to any field by tag)
Code generationOptional (reflection works)Primary workflow (protoc generates stubs)
Best forKafka, Hadoop, data pipelinesgRPC, mobile, microservices
python
# Writing and reading Avro with schema evolution
import avro.schema, avro.io, avro.datafile
import io

# Writer schema v1
writer_schema = avro.schema.parse("""
{
  "type": "record", "name": "User",
  "fields": [
    {"name": "userName", "type": "string"},
    {"name": "favoriteNumber", "type": ["null", "long"], "default": null}
  ]
}
""")

# Encode a record
buf = io.BytesIO()
encoder = avro.io.BinaryEncoder(buf)
writer = avro.io.DatumWriter(writer_schema)
writer.write({"userName": "Martin", "favoriteNumber": 1337}, encoder)
raw_bytes = buf.getvalue()
print(f"Encoded: {raw_bytes.hex()} ({len(raw_bytes)} bytes)")
# Encoded: 0c4d617274696e02f214 (10 bytes)

# Reader schema v2 — has an extra field with default
reader_schema = avro.schema.parse("""
{
  "type": "record", "name": "User",
  "fields": [
    {"name": "userName", "type": "string"},
    {"name": "favoriteNumber", "type": ["null", "long"], "default": null},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
""")

# Decode v1 data with v2 reader schema — email gets default
buf = io.BytesIO(raw_bytes)
decoder = avro.io.BinaryDecoder(buf)
reader = avro.io.DatumReader(writer_schema, reader_schema)
record = reader.read(decoder)
print(record)
# {'userName': 'Martin', 'favoriteNumber': 1337, 'email': None}
# email filled with default — backward compatibility works!
Design question: Your data pipeline exports a Postgres table to Avro files in S3 nightly. A developer adds a new NOT NULL column to the table without a default. What breaks, and why?

Chapter 6: Schema Registries

We have seen that Avro needs the writer's schema at read time, and that Protobuf/Thrift need the .proto/.thrift files to decode messages. But how do you distribute these schemas across a distributed system with dozens of services? You cannot email .proto files around. You need a schema registry.

What a Schema Registry Does

A schema registry is a centralized service that stores versioned schemas and enforces compatibility rules. The most widely deployed is the Confluent Schema Registry, designed to work with Apache Kafka. Here is the workflow:

1. Producer writes schema
Producer has schema v2 with a new "email" field. Before sending data, it registers the schema with the registry. The registry checks that v2 is compatible with v1 (using the configured compatibility mode). If compatible, the registry assigns schema ID 47.
2. Producer sends data
Each Kafka message is prefixed with a 5-byte header: a magic byte (0x00) followed by the 4-byte schema ID (47). The rest of the message is the Avro-encoded payload — raw bytes, no schema embedded.
3. Consumer reads data
Consumer extracts schema ID 47 from the message header. It looks up schema 47 from the registry (with caching). Now it has the writer's schema and can perform Avro resolution against its own reader schema.

The 5-byte header is a brilliant design: it adds negligible overhead (5 bytes per message) but gives you full schema evolution support. Compare this to embedding the entire schema in every message (which Avro container files do for HDFS) — for a Kafka topic with millions of small messages, embedding the schema would be catastrophically wasteful.

Compatibility Modes

The registry does not just store schemas — it enforces rules about which schema changes are allowed. There are four compatibility modes:

ModeRuleWhat it allowsUse case
BACKWARDNew schema can read old dataAdd fields with defaults, remove fieldsConsumer upgrades first, then producer
FORWARDOld schema can read new dataRemove fields, add optional fieldsProducer upgrades first, then consumer
FULLBoth backward AND forwardAdd/remove optional fields with defaultsRolling deployments (safest)
NONENo checksAnythingDevelopment only (dangerous in prod)
FULL is the default for good reason. In a microservices environment with rolling deployments, you need both forward and backward compatibility simultaneously. Old consumers must read new data (forward), and new consumers must read old data (backward). FULL compatibility mode enforces both. The only safe schema changes under FULL are: add a field with a default, or remove a field that had a default.

Schema Registry Workflow

The simulation below animates the complete workflow: producer registers schema, registry checks compatibility, producer sends data with schema ID, consumer looks up schema, consumer decodes. Watch what happens when a compatibility violation is attempted.

Schema Registry: Register, Send, Decode

Watch the producer register a schema, send data, and the consumer decode it. Try a breaking change.

Step 1: Producer has schema v1. Click "Next Step" to begin.

Schema Registry in Practice

python
# Producing Avro messages with Confluent Schema Registry
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

# Connect to registry
registry = SchemaRegistryClient({'url': 'http://schema-registry:8081'})

# Define schema (or load from file)
schema_str = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "userId", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""

# Create serializer (auto-registers schema with registry)
avro_serializer = AvroSerializer(registry, schema_str)

producer = SerializingProducer({
    'bootstrap.servers': 'kafka:9092',
    'value.serializer': avro_serializer,
})

# Produce a message — schema ID is prepended automatically
producer.produce(
    topic='users',
    value={'userId': 42, 'name': 'Alice', 'email': 'alice@example.com'}
)
producer.flush()
System design: Your team uses Kafka with the Confluent Schema Registry set to FULL compatibility. A developer submits a schema change that adds a new field age of type int with no default value. What happens?

Chapter 7: Modes of Dataflow

Schema evolution does not exist in a vacuum. The way data flows between processes determines which compatibility properties matter and which encoding formats are practical. There are three fundamental modes of dataflow, each with different implications for encoding and evolution.

Mode 1: Via Databases

When you write to a database, you are encoding data for your future self — or for a colleague who will read it months or years from now. The writer encodes, and a potentially very different reader decodes. This means you need both forward compatibility (old application reading newly-written data) and backward compatibility (new application reading old data that was written years ago).

The most dangerous scenario: a v1 application reads a record written by v2 (which has extra fields), modifies an unrelated field, and writes it back. If the application does not preserve unknown fields, the v2 fields are silently erased. This is called the read-modify-write hazard, and it is the single most common source of silent data corruption in production systems.

Mode 2: Via Service Calls (REST/RPC)

In service-to-service communication, one process (the client) makes a request and another (the server) responds. The client and server may be running different versions of the code.

ProtocolEncodingSchema enforcementEvolution
REST + JSONJSON textNone (OpenAPI is documentation)Ad-hoc versioning (URL path, headers)
gRPCProtobuf binaryStrict (.proto file required)Protobuf evolution rules
Thrift RPCThrift binaryStrict (.thrift file required)Thrift evolution rules
GraphQLJSONSchema (SDL)Add fields freely, deprecate old ones

gRPC (Protobuf over HTTP/2) has become the standard for internal service-to-service communication. It is smaller and faster than REST+JSON, has built-in schema evolution via Protobuf, supports streaming (server-side, client-side, and bidirectional), and generates client/server code automatically from .proto files.

REST for public, gRPC for private. REST + JSON is still the right choice for public APIs consumed by third-party developers (human readability, curl-ability, browser compatibility). gRPC is the right choice for internal microservice communication (performance, schema enforcement, code generation). Many companies run both: a gRPC mesh internally, with a REST gateway at the edge.

Mode 3: Via Message Queues

In asynchronous messaging (Kafka, RabbitMQ, SQS), the producer and consumer are fully decoupled. The producer may have been deployed weeks before the consumer. Messages may sit in the queue for hours or days. The consumer may be running code that was written before the new fields even existed.

This is where schema registries shine. The producer registers its schema (with compatibility checks), embeds the schema ID in each message, and the consumer resolves schemas at read time. Without a registry, you are flying blind — every consumer must somehow know which schema the producer was using, and version mismatches cause silent data corruption.

The Big Simulation: All Three Modes

The simulation below shows data flowing through all three modes simultaneously. You can modify the schema (add a field) and watch how the change propagates. Notice where compatibility breaks happen and where the system handles evolution gracefully.

Three Modes of Dataflow — Schema Change Propagation

Add a field to the schema and watch it propagate through databases, APIs, and message queues. Observe compatibility outcomes.

Schema v1 in all three modes. Click "Add Optional Field" then "Propagate Change" to watch evolution.

The Read-Modify-Write Hazard (Detailed)

This is so important that it deserves a concrete, step-by-step walkthrough:

// Step 1: v2 server writes a record
DB row = {userId: 42, name: "Alice", email: "alice@ex.com", phone: "+1-555-0123"}

// Step 2: v1 server reads the record
v1 code sees: {userId: 42, name: "Alice", email: "alice@ex.com"}
v1 code does NOT see phone (unknown field).

// Step 3: v1 server modifies email and writes back
v1 code writes: {userId: 42, name: "Alice", email: "newemail@ex.com"}

// Step 4: phone is GONE. Silently erased.
// No error, no log, no alert. Just data loss.

The fix depends on the encoding format. Protobuf and Avro decoders can be configured to preserve unknown fields: when v1 code reads a record with unknown tags, it stores the raw bytes in a side buffer and re-emits them on write. This "round-trip safety" is not automatic in all implementations — you must verify that your serialization library supports it.

System design: You are building a microservices platform with 50 services. Internal communication uses gRPC. Event streaming uses Kafka + Avro. External APIs use REST + JSON. A PM asks "why not just use JSON everywhere?" Give the technical argument.

Chapter 8: Dataflow in Practice

Theory is clean. Practice is messy. Let us look at how real systems combine encoding formats, handle versioning, and debug the inevitable breakages.

The Modern Stack

Use caseFormatSchema managementWhy
Service-to-service (internal)gRPC (Protobuf).proto files in monorepo + buf.buildStrict types, code generation, streaming, small payloads
Event streamingKafka + AvroConfluent Schema Registry (FULL mode)Decoupled producers/consumers, automatic schema evolution
Mobile clientsProtobuf.proto files bundled with appSmall payloads over cellular, strict types, backward compat for old app versions
Public REST APIsJSONOpenAPI spec (documentation, not enforcement)Human readable, universal client support, curl-able
Data lake / batchAvro or ParquetSchema embedded in file headerSelf-describing files, columnar for analytics (Parquet)
ConfigurationJSON, YAML, TOMLNone (human-edited)Human readability trumps all other concerns

Design Challenge: Schema Evolution Strategy for 50 Services

Imagine you are the staff engineer responsible for data contracts across a platform of 50 microservices. Here is the system you would design:

1. Schema Repository
All .proto files live in a monorepo (or a dedicated schema repo). Every service that needs a schema imports it from here. buf.build (or protoc with a custom plugin) runs lint + breaking-change detection in CI. A PR that adds a required field is blocked automatically.
2. Code Generation Pipeline
CI generates client/server stubs in every language the platform uses (Go, Python, Java, TypeScript). Generated code is published as versioned packages. Services pin to a specific schema version and upgrade on their own schedule.
3. Compatibility Gate
The schema registry (for Kafka/Avro) and buf breaking (for gRPC/Protobuf) reject incompatible changes before they reach production. This is the single most important automation — it prevents the 3 AM page.
4. Runtime Validation
Each service has middleware that validates incoming messages against the expected schema. Malformed messages are dead-lettered (sent to an error queue) rather than crashing the service. Monitoring dashboards track deserialization error rates per service per topic.

Debug Scenarios

These are the failure modes you will encounter in production (and in interviews):

Scenario 1: "Deserialization failures after deploying a new service version."

Root cause: a required field was added. Old messages in the Kafka topic (or old records in the database) do not have this field. The new code's deserializer throws because the required field is missing. Fix: make the field optional with a default. Redeploy. The error rate drops to zero because the deserializer now fills in the default for old records.

Scenario 2: "Silent data corruption — users report missing phone numbers."

Root cause: a v1 service reads records written by v2, modifies an unrelated field, and writes back without preserving unknown fields. phone_number (tag 3) is silently dropped on every round-trip through v1. Fix: upgrade all services to a Protobuf library version that preserves unknown fields. Add a monitoring check that detects field disappearance (compare field counts before and after writes).

Scenario 3: "Kafka consumer lag spike after producer schema change."

Root cause: the producer registered a new schema that is technically compatible, but the consumer's deserialization code has a bug: it tries to cast the new field to an incompatible type in application code (e.g., the schema says the field is a string, but the consumer's application code tries to parse it as an integer). The schema registry did not catch this because it only checks wire-format compatibility, not application-level semantics. Fix: add integration tests that deserialize sample messages with every consumer. Run these tests in the producer's CI pipeline before the schema change is merged.

Anti-Patterns

gRPC vs REST: A Concrete Comparison

Let us compare the same API call in both formats to make the trade-offs tangible.

protobuf
// gRPC: Define the service in a .proto file
service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User);  // server streaming!
}

message GetUserRequest {
  int64 user_id = 1;
}

message User {
  int64 user_id = 1;
  string name = 2;
  string email = 3;
  optional string phone = 4;
}
python
# gRPC client (auto-generated from .proto)
import grpc
import user_pb2, user_pb2_grpc

channel = grpc.insecure_channel('user-service:50051')
stub = user_pb2_grpc.UserServiceStub(channel)

# Type-safe, IDE-autocompleted, binary-encoded
response = stub.GetUser(user_pb2.GetUserRequest(user_id=42))
print(response.name)   # "Alice"
print(response.phone)  # "" (default, field 4 missing in old data)

# Server streaming: get 10K users without buffering all in memory
for user in stub.ListUsers(user_pb2.ListUsersRequest(limit=10000)):
    process(user)  # each user arrives as it's ready
python
# REST equivalent — compare the developer experience
import requests

# No type safety, no auto-completion, text-encoded
response = requests.get('http://user-service:8080/api/v1/users/42')
data = response.json()  # dict, not a typed object
print(data['name'])    # KeyError if field missing
print(data.get('phone', ''))  # manual default handling

# No streaming: must buffer all 10K users in one JSON array
response = requests.get('http://user-service:8080/api/v1/users?limit=10000')
users = response.json()  # entire 10K-user array in memory at once
DimensiongRPC + ProtobufREST + JSON
Request size (GetUser)~5 bytes (varint 42)~45 bytes (URL + headers)
Response size (1 user)~30 bytes~120 bytes
Type safetyCompile-time (generated stubs)Runtime (manual validation)
StreamingNative (server, client, bidirectional)Workarounds (SSE, chunked, WebSocket)
Schema evolutionProtobuf tag-based rulesAd-hoc URL/header versioning
Human readabilityNo (binary, need grpcurl)Yes (curl, browser, any HTTP tool)
Browser supportLimited (grpc-web proxy needed)Native (fetch, XMLHttpRequest)
Anti-pattern 1: JSON for high-throughput internal communication. A team uses JSON for Kafka messages between internal services. At 100K messages/sec, they are spending 40% of CPU on JSON parsing and allocating 3x more network bandwidth than necessary. Switch to Avro or Protobuf.
Anti-pattern 2: Unversioned APIs. A public API returns JSON with no version indicator. When the team changes a field type from integer to string, every client breaks. The fix is URL versioning (/v1/users, /v2/users) or header versioning (Accept: application/vnd.myapi.v2+json).
Anti-pattern 3: Using the schema registry in NONE compatibility mode in production. "We'll be careful" is not an engineering strategy. FULL mode exists to enforce what "being careful" means. Every schema change that passes FULL is safe by construction.

Format Comparison

The simulation below shows the same 1000-message workload processed by different format combinations. Compare throughput, payload size, and schema safety.

Format Comparison: Throughput, Size, Safety

Compare JSON, MessagePack, Protobuf, and Avro across three dimensions.

Click a metric to compare formats.
Debug scenario: After a Kafka producer deploys a new Avro schema, consumers start logging "Could not find schema ID 147 in registry" errors at a rate of ~5%. The remaining 95% of messages decode fine. What is the most likely cause?

Chapter 9: Interview Arsenal

This chapter distills everything into the cheat sheets, drills, and patterns you need for system design and coding interviews.

Format Cheat Sheet

FormatWire formatSchema?Field identityEvolution mechanismTypical size (our example)
JSONTextNo (optional JSON Schema)String namesAd-hoc (versioned URLs)~45 bytes
XMLTextOptional (XSD)Tag namesXSD versioning~95 bytes
MessagePackBinaryNoString namesSame as JSON~33 bytes
Thrift BinaryBinaryRequired (.thrift)Numeric tagsTag-based skip~35 bytes
Thrift CompactBinaryRequired (.thrift)Numeric tagsTag-based skip~27 bytes
ProtobufBinaryRequired (.proto)Numeric tagsTag-based skip + defaults~11 bytes
AvroBinaryRequired (JSON/IDL)Name resolutionWriter/reader schema match~10 bytes

System Design Questions

Q: "Design a schema evolution strategy for a microservices platform."

Answer framework: (1) Schema repo with CI-enforced compatibility checks (buf breaking for Protobuf, schema registry with FULL mode for Avro/Kafka). (2) Code generation pipeline publishing versioned stubs. (3) Runtime validation middleware with dead-letter queues. (4) Monitoring: deserialization error rate per service per topic. (5) Policy: all new fields optional with defaults, never reuse tag numbers, never change types.

Q: "Compare REST vs gRPC for internal services."

Answer: gRPC wins on all internal metrics: (1) Payload size: Protobuf binary is 4-10x smaller than JSON. (2) Parse speed: binary parsing is ~10x faster. (3) Schema enforcement: .proto files catch type errors at compile time, not runtime. (4) Code generation: stubs in any language for free. (5) Streaming: gRPC supports server/client/bidirectional streaming natively. (6) HTTP/2: multiplexed connections, header compression. REST wins for external APIs: human readable, universally supported, curl-testable, no build step for consumers.

Q: "A Kafka consumer is deserializing messages 10x slower than expected. Diagnose."

Answer framework: (1) Check if the consumer is hitting the schema registry on every message instead of caching. (2) Check if the consumer is using reflection-based deserialization instead of generated code. (3) Profile the deserialization: is it CPU-bound (parsing) or memory-bound (allocations)? (4) Check if the messages contain deeply nested Avro unions that require extensive resolution. (5) Compare against a baseline: deserialize 1M messages of the same schema and measure throughput.

Coding Drills

Drill 1: Write a Protobuf schema with evolution.

protobuf
// v1: initial schema
syntax = "proto3";

message UserEvent {
  int64 user_id = 1;
  string action = 2;     // "login", "purchase", etc.
  int64 timestamp = 3;    // Unix millis
}

// v2: add optional metadata field
message UserEvent {
  int64 user_id = 1;
  string action = 2;
  int64 timestamp = 3;
  string ip_address = 4;  // NEW: optional (proto3 default)
  map<string, string> metadata = 5;  // NEW: flexible key-value
}

// v3: deprecate a field, add another
message UserEvent {
  int64 user_id = 1;
  string action = 2;
  int64 timestamp = 3;
  reserved 4;             // ip_address removed, tag 4 reserved forever
  reserved "ip_address";  // name also reserved
  map<string, string> metadata = 5;
  string session_id = 6; // NEW field gets NEW tag number
}

Drill 2: Implement varint encoder/decoder.

python
def encode_varint(n):
    """Encode unsigned integer as Protobuf varint bytes."""
    buf = []
    while n > 0x7F:
        buf.append((n & 0x7F) | 0x80)
        n >>= 7
    buf.append(n)
    return bytes(buf)

def decode_varint(data):
    """Decode varint from bytes. Returns (value, bytes_consumed)."""
    result = 0
    shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, i + 1
        shift += 7
    raise ValueError("Varint not terminated")

def zigzag_encode(n):
    return (n << 1) ^ (n >> 63)

def zigzag_decode(n):
    return (n >> 1) ^ -(n & 1)

# Test round-trips
for v in [0, 1, 127, 128, 300, 1337, 1000000]:
    encoded = encode_varint(v)
    decoded, length = decode_varint(encoded)
    assert decoded == v, f"Failed for {v}"
    print(f"{v:>10} -> {encoded.hex():12} ({length} bytes)")

# Output:
#          0 -> 00           (1 bytes)
#          1 -> 01           (1 bytes)
#        127 -> 7f           (1 bytes)
#        128 -> 8001         (2 bytes)
#        300 -> ac02         (2 bytes)
#       1337 -> b90a         (2 bytes)
#    1000000 -> c0843d       (3 bytes)

Drill 3: Decode a Protobuf Message by Hand

This is the kind of question a senior interviewer at Google or Meta might ask. You are given raw Protobuf bytes and a schema. Decode the message.

// Given schema:
message Order {
  int64 order_id = 1;
  string product = 2;
  int32 quantity = 3;
}

// Given bytes (hex): 08 96 01 12 05 50 69 7A 7A 61 18 02

// Step 1: Parse first byte 0x08
0x08 = 0000 1000 → field_number = 08 >> 3 = 1, wire_type = 08 & 7 = 0 (varint)
// Field 1 (order_id), wire type varint

// Step 2: Parse varint 0x96 0x01
0x96 = 1|0010110 → MSB=1, data=0010110 (more bytes follow)
0x01 = 0|0000001 → MSB=0, data=0000001 (last byte)
Value = 0000001_0010110 = 150 (decimal)
// order_id = 150

// Step 3: Parse byte 0x12
0x12 = 0001 0010 → field_number = 18 >> 3 = 2, wire_type = 18 & 7 = 2 (length-delimited)
// Field 2 (product), wire type length-delimited

// Step 4: Parse length 0x05 = 5 bytes, then read 5 bytes
50 69 7A 7A 61 = "Pizza" in UTF-8
// product = "Pizza"

// Step 5: Parse byte 0x18
0x18 = 0001 1000 → field_number = 24 >> 3 = 3, wire_type = 24 & 7 = 0 (varint)
// Field 3 (quantity), wire type varint

// Step 6: Parse varint 0x02 = 2
// quantity = 2

// Result: {order_id: 150, product: "Pizza", quantity: 2}
// Total: 12 bytes. JSON equivalent: ~50 bytes.

The Five Interview Dimensions Applied

DimensionExample questionWhat a staff answer includes
Concept"Explain forward vs backward compatibility"Concrete rolling deployment scenario, not just definitions. "During a 30-minute deploy, server 3 is v2 but server 5 is still v1..."
Design"Design a data contract system for 50 services"Schema repo + CI + registry + dead-letter queues + monitoring. Time horizons: this quarter, next generation, hardware transition.
Code"Implement a varint encoder"Clean code + edge cases (value=0, max int64) + test cases + complexity analysis (O(log n) bytes)
Debug"Deserialization errors after deploy"Systematic: check schema change → check compatibility mode → check if required field added → check consumer cache → check schema registry replication
Frontier"What's next after Protobuf?"Cap'n Proto (zero-copy), FlatBuffers (random access without parsing), Proto Editions (Google's next-gen), Apache Arrow (columnar in-memory format)
Staff-level design: You're building a platform where mobile apps (updated monthly), backend services (deployed daily), and batch pipelines (run weekly on historical data) all share the same user event schema. Which encoding format and evolution strategy would you choose, and why?

Chapter 10: Connections

Encoding and evolution do not exist in isolation. They interact deeply with every other layer of a data-intensive system. Here is where this chapter connects to the rest of the DDIA landscape.

Where Encoding Fits

Related topicConnectionLesson
Data Models (Ch 3)The data model determines what you serialize. Relational tables map naturally to Avro schemas. Document models (JSON) map to... JSON. Graph models need custom serialization for edges and vertices.Data Models & Query Languages
Storage & Retrieval (Ch 4)Storage engines use encoding internally. LSM-trees write SSTables in a binary format. B-trees store fixed-size pages. Column stores use specialized compression (run-length, bitmap, dictionary) that is itself a form of encoding.Storage & Retrieval
Replication (Ch 6)Replication logs must be encoded in a format that supports schema evolution. A follower running an older version must be able to read the leader's replication stream (forward compatibility). Statement-based replication encodes SQL text. Row-based replication encodes binary tuples.Replication
Partitioning (Ch 7)Partitioning keys must survive encoding round-trips. If your partition key is a 64-bit integer and your encoding truncates it to 32 bits, records will hash to the wrong partition.Partitioning
Stream Processing (Ch 12)Kafka + Avro + Schema Registry is the dominant pattern for stream processing. Every concept from this chapter — schema evolution, compatibility modes, schema resolution — is the foundation of a reliable streaming platform.Stream Processing

The Encoding Decision Tree

When choosing a format for a new system, walk this tree:

Is the consumer a web browser or third-party developer?
Yes → JSON (with OpenAPI spec for documentation). Human readability and universal support trump all other concerns.
↓ No
Do you need the data to be self-describing (no shared schema)?
Yes → MessagePack (binary JSON) or Avro container files (binary with embedded schema). Both are language-independent and self-contained.
↓ No
Is the data flowing through Kafka or a message queue?
Yes → Avro + Schema Registry. The registry provides schema discovery and compatibility enforcement for decoupled producers/consumers.
↓ No
Is it service-to-service RPC?
Yes → gRPC (Protobuf). Code generation, streaming, strict types, compact binary encoding. The modern standard for internal microservice communication.

What We Did Not Cover

This chapter focused on the fundamental concepts. More advanced topics include:

The Feynman test. If someone wakes you at 3 AM and asks "why does your service use Protobuf internally and JSON externally?" you should be able to answer in one sentence: "Protobuf gives us schema-enforced types, 4x smaller payloads, and guaranteed forward/backward compatibility for rolling deploys, while JSON gives external developers human readability and universal client support." If you can say that, you have internalized this chapter.

"The key idea is that old and new code, old and new data, must coexist. Encoding formats are the contracts that make coexistence possible." — Martin Kleppmann, DDIA

Final synthesis: A startup uses JSON everywhere — REST APIs, Kafka messages, database storage (Postgres JSONB columns). They have 12 services and are growing to 50. What is the single highest-impact change they should make first?