Introduction to Algorithms (CLRS) — Chapter 31

Number-Theoretic Algorithms

GCD, modular arithmetic, RSA, primality testing — the mathematics that secures the internet.

Prerequisites: Basic algebra + Division with remainder. That's it.
10
Chapters
9+
Simulations
5
Coding Drills

Chapter 0: The Problem

You type your credit card number into a website. That number travels across the open internet — through dozens of routers, across undersea cables, past anyone who might be eavesdropping. Yet somehow, only the website can read it. How?

The answer is public-key cryptography, and its most famous incarnation is the RSA cryptosystem (named after Rivest, Shamir, and Adleman, who published it in 1977 — though Clifford Cocks at GCHQ discovered it secretly in 1973). RSA rests on a beautiful asymmetry: multiplying two large prime numbers together is trivially easy (a laptop can multiply two 1000-digit primes in milliseconds), but factoring the product back into those two primes is, as far as anyone knows, computationally infeasible. A 2048-bit RSA key would take all the computers on Earth longer than the age of the universe to crack by brute force.

Before RSA, secure communication required that both parties share a secret key in advance (a symmetric key). This was the chicken-and-egg problem of cryptography: to communicate securely you need a shared key, but to share a key securely you need secure communication. Governments used couriers with briefcases handcuffed to their wrists. Businesses used sealed envelopes. There was no way for two strangers on opposite sides of the internet to establish a shared secret.

Public-key cryptography solved this. Each person generates a key pair: a public key (which they publish to the world) and a private key (which they keep secret). Anyone can encrypt a message using the public key, but only the holder of the private key can decrypt it. This is like mailing someone a padlock — they can lock something in a box and send it back, but only you can open it.

This entire asymmetry — easy to multiply, hard to factor — is built on number theory: the study of integers and their properties. Euclid's algorithm (circa 300 BC) is still used in every RSA implementation today. Fermat's little theorem (1640) underpins every primality test your browser runs. Modular arithmetic — "clock arithmetic" — is the language these algorithms speak.

The core idea. Number theory gives us trapdoor functions: operations that are easy to compute in one direction but practically impossible to reverse without a secret key. Multiplying primes is easy. Factoring is hard. Modular exponentiation is easy. Discrete logarithms are hard. These one-way functions are the foundation of internet security. This lesson builds every piece from scratch.

The simulation below shows the fundamental asymmetry. Click "Multiply" to watch two primes combine instantly. Click "Factor" to watch a brute-force search struggle with even modest numbers.

The Factoring Asymmetry

Multiply two primes: instant. Factor the product: exponentially hard.

Prime size (digits) 2

With 2-digit primes, factoring is quick — maybe a few hundred trial divisions. With 3-digit primes, it takes thousands. With 5-digit primes, it takes millions. Real RSA uses primes with 300+ digits (1024 bits). The number of trial divisions needed is roughly √n, which for a 600-digit number is 10300. For reference, the observable universe contains about 1080 atoms.

Let us put this concretely. Here are the best known factoring speeds for different key sizes:

RSA Key SizeProduct n (digits)Best Attack TimeStatus
512 bits~155 digits~8 months (1999)Broken
768 bits~232 digits~2 years (2009)Broken
1024 bits~309 digitsFeasible for nation-statesDeprecated
2048 bits~617 digitsBeyond current capabilityStandard
4096 bits~1234 digitsBeyond foreseeable capabilityHigh security

The best known classical algorithm for factoring is the General Number Field Sieve (GNFS), which runs in sub-exponential time: approximately exp(1.9 · (ln n)1/3 · (ln ln n)2/3). This is much faster than trial division's O(√n) = O(n1/2), but still far too slow for 2048-bit numbers. The key insight: factoring is hard not because we lack clever algorithms, but because even the cleverest known algorithms scale exponentially in the number of digits.

Here is our roadmap. We start with the greatest common divisor and Euclid's elegant algorithm (Chapter 1). We build up modular arithmetic — the "clock math" that makes RSA work (Chapter 2). We construct the RSA cryptosystem from scratch (Chapter 3). We learn to test whether a number is prime without factoring it (Chapter 4). We combine everything into a full cryptographic showcase (Chapter 5). Then we cover the Chinese Remainder Theorem (Chapter 6), real-world applications (Chapter 7), interview problems (Chapter 8), and connections to the rest of the CLRS universe (Chapter 9).

Concept check: Why can't an eavesdropper who intercepts an RSA-encrypted message simply factor the public key to recover the private key?

Chapter 1: GCD & Euclid's Algorithm

Before we can build RSA, we need one fundamental tool: computing the greatest common divisor (GCD) of two integers. The GCD of a and b, written gcd(a, b), is the largest integer that divides both a and b. For example, gcd(12, 8) = 4 because 4 is the largest number that divides both 12 and 8.

Why do we need GCD for cryptography? Because RSA key generation requires finding a number e that is coprime to φ(n) — meaning gcd(e, φ(n)) = 1. And computing the private key d requires the extended Euclidean algorithm, which finds x and y such that ax + by = gcd(a, b). This equation is called Bézout's identity, and it is the backbone of modular inverses.

Beyond cryptography, GCD appears everywhere: simplifying fractions (gcd(numerator, denominator)), computing least common multiples (lcm(a,b) = ab/gcd(a,b)), solving linear Diophantine equations (ax + by = c has solutions if and only if gcd(a,b) divides c), and even in the analysis of the Euclidean algorithm for computing continued fractions.

Euclid's algorithm is arguably the oldest non-trivial algorithm still in daily use — over 2300 years old and still optimal. Donald Knuth called it "the granddaddy of all algorithms," not just because of its age, but because it established a pattern: a simple recursive structure that provably converges, with tight analysis of its running time. The Fibonacci worst-case analysis (gcd(Fn, Fn-1) takes n-2 steps) connects number theory to one of the most famous sequences in mathematics.

The Naive Approach

You could find gcd(a, b) by listing all divisors of both numbers and picking the largest common one. For gcd(48, 18): divisors of 48 are {1,2,3,4,6,8,12,16,24,48}, divisors of 18 are {1,2,3,6,9,18}, common divisors are {1,2,3,6}, so gcd = 6. This works but takes O(min(a,b)) time — terrible for 300-digit numbers.

Euclid's Algorithm

Around 300 BC, Euclid discovered something beautiful. The GCD has a recursive property:

gcd(a, b) = gcd(b, a mod b)

Why does this work? If d divides both a and b, then d also divides a - qb = a mod b (where q = ⌊a/b⌋). So every common divisor of (a, b) is also a common divisor of (b, a mod b), and vice versa. The sets of common divisors are identical, so their maximums are identical.

The base case: gcd(a, 0) = a. When the second argument reaches 0, the first argument is the GCD.

Key insight: logarithmic convergence. Each step of Euclid's algorithm reduces the smaller number by at least half. After two steps, both numbers have shrunk. The total number of steps is at most 2 · log2(min(a,b)) — that is O(log min(a,b)). For 300-digit numbers, that is about 2000 steps. Euclid's algorithm is astonishingly efficient.

Why does the smaller number shrink by at least half? Consider gcd(a, b) with a ≥ b. If b ≤ a/2, then after one step we have (b, a mod b) and the larger argument went from a to b ≤ a/2. If b > a/2, then a mod b = a - b < a/2, so after one step we have (b, a-b) where a-b < a/2, and the next step gives (a-b, b mod (a-b)) which again has shrunk. Either way, after two steps the max of the pair has halved. The worst case is consecutive Fibonacci numbers: gcd(Fn, Fn-1) takes exactly n-2 steps. This is because Fibonacci numbers are the slowest to converge under Euclid's algorithm — each quotient is exactly 1, so the remainder shrinks as slowly as possible.

Lamé's Theorem (1844). The number of steps in Euclid's algorithm on inputs (a, b) with a ≥ b ≥ 1 is at most 5 times the number of decimal digits of b. So for b with 100 digits, at most 500 division steps. This was one of the first results in computational complexity theory — decades before computers existed.

Worked Example: gcd(252, 105)

Step 1
gcd(252, 105): 252 = 2 × 105 + 42, so gcd(252,105) = gcd(105, 42)
Step 2
gcd(105, 42): 105 = 2 × 42 + 21, so gcd(105,42) = gcd(42, 21)
Step 3
gcd(42, 21): 42 = 2 × 21 + 0, so gcd(42,21) = gcd(21, 0)
Done
gcd(21, 0) = 21. So gcd(252, 105) = 21

The simulation below animates each step. Enter two numbers and watch the algorithm unwind.

Euclid's Algorithm — Animated

Enter two positive integers and watch the division steps. The algorithm traces down the left, then the extended coefficients trace back up the right.

The Extended Euclidean Algorithm

Euclid's algorithm tells us the GCD. The extended version also finds integers x and y such that:

ax + by = gcd(a, b)

This is Bézout's identity. It always has a solution, and the extended algorithm finds it by "unwinding" the recursion. At each step, we express the remainder as a linear combination of a and b.

Worked example: find x, y such that 252x + 105y = gcd(252, 105) = 21.

Step 3: 21 = 105 - 2 × 42 // from step 2 Step 2: 42 = 252 - 2 × 105 // from step 1 Substitute step 2 into step 3: 21 = 105 - 2 × (252 - 2 × 105) = 105 - 2 × 252 + 4 × 105 = 5 × 105 - 2 × 252 So x = -2, y = 5. Check: 252 × (-2) + 105 × 5 = -504 + 525 = 21 ✓
Why extended Euclid matters for RSA. To compute the RSA private key d, we need the modular inverse of e modulo φ(n) — that is, d such that e · d ≡ 1 (mod φ(n)). This is exactly Bézout's identity: e · d + φ(n) · k = 1 = gcd(e, φ(n)). The extended Euclidean algorithm gives us d directly.

Iterative Extended Euclid

The recursive version is elegant but uses O(log n) stack frames. For very large numbers (1000+ digits), stack overflow is a risk. Here is the iterative version that uses constant extra space:

python
def extended_gcd_iter(a, b):
    """Iterative extended Euclid. Returns (gcd, x, y).
    Constant space, no recursion."""
    old_r, r = a, b
    old_s, s = 1, 0    # x coefficients
    old_t, t = 0, 1    # y coefficients

    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t

    return old_r, old_s, old_t  # gcd, x, y

# Trace for gcd(252, 105):
# q=2: r=42,  s=-2, t=1
# q=2: r=21,  s=5,  t=-2
# q=2: r=0    → gcd=21, x=-2, y=5

Application: Solving Linear Congruences

The equation ax ≡ b (mod n) has solutions if and only if g = gcd(a, n) divides b. If it does, there are exactly g solutions modulo n. To find them: first solve ax ≡ g (mod n) using extended Euclid, then multiply by b/g.

Example: solve 14x ≡ 6 (mod 22). gcd(14, 22) = 2, and 2 divides 6, so solutions exist. First solve 14x ≡ 2 (mod 22): extended Euclid gives x0 = -3 ≡ 19 (mod 22). Then 14 × 19 = 266 = 12 × 22 + 2. ✓. Multiply by 6/2 = 3: x = 19 × 3 = 57 ≡ 13 (mod 22). Check: 14 × 13 = 182 = 8 × 22 + 6. ✓. The two solutions mod 22 are x = 13 and x = 13 + 11 = 2 (since n/g = 22/2 = 11). Check: 14 × 2 = 28 = 1 × 22 + 6. ✓.

python
def gcd(a, b):
    """Euclid's algorithm: O(log min(a,b))"""
    while b != 0:
        a, b = b, a % b
    return a

def extended_gcd(a, b):
    """Returns (gcd, x, y) such that a*x + b*y = gcd(a,b)"""
    if b == 0:
        return a, 1, 0   # base case: a*1 + 0*0 = a
    else:
        g, x1, y1 = extended_gcd(b, a % b)
        # From: b*x1 + (a%b)*y1 = g
        # Since a%b = a - (a//b)*b:
        #   b*x1 + (a - (a//b)*b)*y1 = g
        #   a*y1 + b*(x1 - (a//b)*y1) = g
        return g, y1, x1 - (a // b) * y1

# Example
g, x, y = extended_gcd(252, 105)
print(f"gcd={g}, x={x}, y={y}")   # gcd=21, x=-2, y=5
print(f"Check: 252*{x} + 105*{y} = {252*x + 105*y}")  # 21

def mod_inverse(e, phi):
    """Compute d = e^(-1) mod phi using extended Euclid."""
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("No inverse: gcd(e, phi) != 1")
    return x % phi  # ensure positive
Concept check: What is gcd(48, 18)?

Chapter 2: Modular Arithmetic

You already know modular arithmetic — you just call it "clock math." If it is 10 o'clock and you wait 5 hours, it is 3 o'clock, not 15 o'clock. Your clock wraps around at 12. That wrapping is modular arithmetic: 10 + 5 ≡ 3 (mod 12).

Formally: we say a ≡ b (mod n) (read "a is congruent to b modulo n") if n divides (a - b). Equivalently, a and b have the same remainder when divided by n. The number n is called the modulus.

Modular Operations

The beauty of modular arithmetic is that you can reduce at every step. You never need to work with numbers bigger than n2:

(a + b) mod n = ((a mod n) + (b mod n)) mod n
(a × b) mod n = ((a mod n) × (b mod n)) mod n

This means we can compute things like 7256 mod 13 without ever computing 7256 (a 217-digit number). We just reduce after each multiplication. This is the reason modular arithmetic is so powerful in computation — it keeps numbers bounded.

However, subtraction and division require care. Subtraction works the same way: (a - b) mod n = ((a mod n) - (b mod n) + n) mod n. The +n prevents negative results. But division is different: there is no "modular division" operator. Instead, we multiply by the modular inverse: a/b mod n = a × b-1 mod n. The inverse b-1 exists only when gcd(b, n) = 1 (more on this shortly).

Here is a concrete example showing why reducing at each step matters:

Task: compute 123 × 456 × 789 mod 1000 Without modular reduction: 123 × 456 = 56,088 56,088 × 789 = 44,253,432 44,253,432 mod 1000 = 432 With modular reduction at each step: 123 × 456 = 56,088 mod 1000 = 88 88 × 789 = 69,432 mod 1000 = 432 // same answer, smaller intermediates!

In the first approach, the intermediate value reaches 44 million. In the second, it never exceeds about 70,000. For real cryptographic computations with 2048-bit numbers, the intermediates without reduction would have millions of digits. With reduction, they never exceed 2 × 2048 = 4096 bits.

But even better, there is a trick called repeated squaring (also called fast modular exponentiation) that computes ab mod n in only O(log b) multiplications.

Fast Modular Exponentiation (Repeated Squaring)

The idea: express the exponent b in binary. Then ab is just a product of powers of the form a2k, which we compute by repeatedly squaring.

Worked example: compute 713 mod 11.

13 in binary = 1101 // 8 + 4 + 1 So 713 = 78 × 74 × 71 Start: result = 1, base = 7 Bit 0 (1): result = 1 × 7 = 7 mod 11 = 7. base = 72 = 49 mod 11 = 5 Bit 1 (0): skip. base = 52 = 25 mod 11 = 3 Bit 2 (1): result = 7 × 3 = 21 mod 11 = 10. base = 32 = 9 mod 11 = 9 Bit 3 (1): result = 10 × 9 = 90 mod 11 = 2. base = 92 = 81 mod 11 = 4 Answer: 713 mod 11 = 2

Only 4 multiplications instead of 12. For a 2048-bit exponent, that is about 2048 squarings instead of 22048 multiplications. This is how RSA encryption and decryption remain fast even with enormous exponents.

Fermat's Little Theorem. If p is prime and a is not divisible by p, then ap-1 ≡ 1 (mod p). This is the foundation of primality testing (Chapter 4) and also explains why RSA decryption recovers the original message. Specifically, it guarantees that (me)d ≡ m (mod n) when e and d are chosen correctly.

Modular Inverse

The modular inverse of a modulo n is a number a-1 such that a · a-1 ≡ 1 (mod n). Not every number has an inverse. The inverse of a mod n exists if and only if gcd(a, n) = 1 (they are coprime). When it exists, extended Euclid computes it.

Quick example: what is 3-1 mod 7? We need x such that 3x ≡ 1 (mod 7). Try x = 5: 3 × 5 = 15 = 2 × 7 + 1. Yes! So 3-1 ≡ 5 (mod 7). When the modulus is prime, there is a shortcut via Fermat: a-1 ≡ ap-2 (mod p). So 3-1 ≡ 35 ≡ 243 ≡ 5 (mod 7). Same answer.

The Group Zn*

The set of integers in {1, 2, ..., n-1} that are coprime to n forms a multiplicative group under multiplication mod n, denoted Zn*. This group has φ(n) elements (Euler's totient). When n is prime, every non-zero element has an inverse, so Zp* = {1, 2, ..., p-1} and φ(p) = p-1.

This algebraic structure is the engine behind RSA. Euler's theorem generalizes Fermat: for any a with gcd(a, n) = 1, aφ(n) ≡ 1 (mod n). When n = pq (product of two primes), φ(n) = (p-1)(q-1), and the RSA proof falls out directly from this group-theoretic fact.

Generators and Primitive Roots

A generator (or primitive root) of Zp* is an element g whose powers generate every element of the group: {g1, g2, ..., gp-1} = {1, 2, ..., p-1} (mod p). Not every element is a generator. For p = 7, the element g = 3 is a generator:

31 mod 7 = 3 32 mod 7 = 2 33 mod 7 = 6 34 mod 7 = 4 35 mod 7 = 5 36 mod 7 = 1 (back to start — full cycle!) Powers of 3 give {3, 2, 6, 4, 5, 1} = {1, 2, 3, 4, 5, 6} = Z7* ✓

Generators are crucial for Diffie-Hellman (Chapter 5) — the protocol requires a generator g of Zp* so that ga and gb can produce any element. By a theorem of Gauss, Zp* always has primitive roots (for prime p), and there are exactly φ(p-1) of them. For p = 7, φ(6) = 2, so there are 2 generators: g = 3 and g = 5.

Euler's Totient Function

Euler's totient φ(n) counts the integers in {1, ..., n} that are coprime to n. It has a beautiful closed-form based on the prime factorization of n:

φ(n) = n × ∏p | n (1 - 1/p)

Examples: φ(12) = 12 × (1 - 1/2) × (1 - 1/3) = 12 × 1/2 × 2/3 = 4. The four integers coprime to 12 in {1,...,12} are {1, 5, 7, 11}. For prime p: φ(p) = p - 1 (everything except 0 is coprime to p). For n = pq with distinct primes: φ(pq) = (p-1)(q-1) — this is exactly the formula RSA uses.

A beautiful identity: for any positive integer n, the sum of φ(d) over all divisors d of n equals n. For example, the divisors of 12 are {1, 2, 3, 4, 6, 12}, and φ(1) + φ(2) + φ(3) + φ(4) + φ(6) + φ(12) = 1 + 1 + 2 + 2 + 2 + 4 = 12. This identity has a beautiful combinatorial proof: partition the fractions {1/n, 2/n, ..., n/n} by their reduced denominators.

Order and Discrete Logarithms

The order of a modulo n, denoted ordn(a), is the smallest positive integer k such that ak ≡ 1 (mod n). By Lagrange's theorem, the order always divides φ(n). For example, ord7(2) = 3 because 21=2, 22=4, 23=8≡1 (mod 7), and 3 divides φ(7)=6.

The discrete logarithm of b to base g modulo n is the unique integer x with 0 ≤ x < ordn(g) such that gx ≡ b (mod n). Finding discrete logarithms is believed to be computationally hard for well-chosen groups — this is the basis of Diffie-Hellman security. The best known algorithm (number field sieve for discrete logs) runs in sub-exponential time, similar to factoring.

Repeated Squaring: Step by Step Visualization

The visualization below shows the repeated squaring process. Enter base, exponent, and modulus, and watch each step: the binary decomposition of the exponent, each squaring, and each conditional multiplication.

Repeated Squaring — Step by Step

Enter base, exponent, and modulus. The algorithm decomposes the exponent into binary and processes each bit.

The clock visualization below makes modular arithmetic tangible. Set the modulus and watch how addition and multiplication wrap around.

Modular Arithmetic Clock

Choose a modulus, then compute a op b mod n. Watch the result on the clock face.

python
def mod_exp(base, exp, mod):
    """Compute base^exp mod mod using repeated squaring. O(log exp)."""
    result = 1
    base = base % mod
    while exp > 0:
        if exp % 2 == 1:        # current bit is 1
            result = (result * base) % mod
        exp = exp >> 1             # shift right (divide by 2)
        base = (base * base) % mod  # square the base
    return result

# Example: 7^13 mod 11
print(mod_exp(7, 13, 11))  # 2

# Verify Fermat's Little Theorem: a^(p-1) ≡ 1 (mod p) for prime p
p = 17
for a in range(1, p):
    assert mod_exp(a, p - 1, p) == 1, f"Fermat failed for a={a}"
print("Fermat's Little Theorem verified for p=17")
Concept check: How many multiplications does repeated squaring need to compute a1000 mod n?

Chapter 3: RSA Cryptosystem

Now we have all the tools: GCD, extended Euclid, modular arithmetic, fast exponentiation. Let us build the most influential cryptosystem ever invented.

RSA (Rivest-Shamir-Adleman, 1977) works like a padlock. Anyone can snap the padlock shut (encrypt with the public key), but only the person with the key can open it (decrypt with the private key). Here is the construction, step by step.

Step 1: Key Generation

Pick primes
Choose two large primes p and q (in practice, 512-1024 bits each)
Compute n
n = p × q. This is the modulus. Its bit length is the "key size" (e.g. 2048 bits)
Compute φ(n)
φ(n) = (p-1)(q-1) — Euler's totient: count of integers in [1,n] coprime to n
Choose e
Pick e such that 1 < e < φ(n) and gcd(e, φ(n)) = 1. Common choice: e = 65537
Compute d
d = e-1 mod φ(n) — the modular inverse via extended Euclid
Keys ready
Public key: (n, e). Private key: d. Destroy p, q, φ(n).

Step 2: Encryption

To encrypt a message m (a number with 0 ≤ m < n):

c = me mod n

This uses fast modular exponentiation — O(log e) multiplications mod n. Even with e = 65537 (a 17-bit number), that is only about 17 squarings.

Step 3: Decryption

To decrypt ciphertext c:

m = cd mod n

Why does this recover m? Because cd = (me)d = med. Since ed ≡ 1 (mod φ(n)), we have ed = 1 + kφ(n) for some integer k. By Euler's theorem (the generalization of Fermat's little theorem): mφ(n) ≡ 1 (mod n) when gcd(m, n) = 1. Therefore:

med = m1 + kφ(n) = m × (mφ(n))k ≡ m × 1k = m (mod n)
The RSA trapdoor. Anyone can compute c = me mod n (they know n and e — the public key). But to reverse this, you need d, and computing d requires φ(n) = (p-1)(q-1), which requires knowing p and q. To find p and q from n requires factoring n. For large n, this is computationally infeasible. That is the trapdoor.

Why Destroy p, q, and φ(n)?

After computing d, the key generator must erase p, q, and φ(n) from memory. Why? Because if an attacker obtains any one of these, RSA collapses:

In practice, the private key is stored in a hardware security module (HSM) or encrypted on disk. The primes are generated, used, and immediately wiped. This is why "key generation" is a ceremony in high-security settings — performed on air-gapped machines that are physically destroyed afterward.

Padding: Why Textbook RSA Is Insecure

The RSA we described is called "textbook RSA" and is not secure for direct use. The problem: it is deterministic. Encrypting the same message m twice gives the same ciphertext c. An attacker who guesses the message can verify their guess by encrypting it with the public key and comparing.

Real RSA uses padding schemes like OAEP (Optimal Asymmetric Encryption Padding) that add randomness before encryption. OAEP hashes the message with random data, making each encryption unique. The padded message is then encrypted with textbook RSA. Without padding, RSA is vulnerable to chosen-ciphertext attacks, small-message attacks (me < n), and more.

In practice. You almost never encrypt data directly with RSA. RSA is too slow for bulk encryption (about 1000x slower than AES). Instead, RSA encrypts a random session key, and the session key encrypts the actual data with a fast symmetric cipher like AES-256-GCM. This is called hybrid encryption and is what TLS actually does.

Worked Example with Small Primes

Key generation: p = 61, q = 53 n = 61 × 53 = 3233 φ(n) = 60 × 52 = 3120 Choose e = 17 (check: gcd(17, 3120) = 1 ✓) d = 17-1 mod 3120 = 2753 (verify: 17 × 2753 = 46801 = 15 × 3120 + 1 ✓) Encryption: message m = 65 (the letter 'A') c = 6517 mod 3233 = 2790 Decryption: m = 27902753 mod 3233 = 65 ✓

The interactive demo below lets you generate RSA keys with small primes, encrypt a number, and decrypt it. Watch every step of the math.

Interactive RSA — Key Generation, Encryption, Decryption

Pick two primes, generate keys, then encrypt and decrypt a message (number).

python
def rsa_keygen(p, q):
    """Generate RSA key pair from primes p, q."""
    n = p * q
    phi = (p - 1) * (q - 1)

    # Choose e coprime to phi. 65537 is standard if it works.
    e = 65537
    if gcd(e, phi) != 1:
        # Fallback: find smallest e > 1 coprime to phi
        e = 3
        while gcd(e, phi) != 1:
            e += 2

    d = mod_inverse(e, phi)   # from extended Euclid
    return (n, e), d           # public_key, private_key

def rsa_encrypt(m, public_key):
    n, e = public_key
    return mod_exp(m, e, n)    # c = m^e mod n

def rsa_decrypt(c, d, n):
    return mod_exp(c, d, n)    # m = c^d mod n

# Full example
pub, priv = rsa_keygen(61, 53)
print(f"Public: n={pub[0]}, e={pub[1]}")
print(f"Private: d={priv}")

c = rsa_encrypt(65, pub)
print(f"Encrypted: {c}")

m = rsa_decrypt(c, priv, pub[0])
print(f"Decrypted: {m}")  # 65

# Encrypt a string (character by character)
def rsa_encrypt_string(msg, public_key):
    return [rsa_encrypt(ord(ch), public_key) for ch in msg]

def rsa_decrypt_string(ciphertext, d, n):
    return ''.join(chr(rsa_decrypt(c, d, n)) for c in ciphertext)

ct = rsa_encrypt_string("HELLO", pub)
print(f"Ciphertext: {ct}")
print(f"Decrypted: {rsa_decrypt_string(ct, priv, pub[0])}")  # HELLO
Security parameter. The security of RSA is measured by the bit-length of n. Current recommendations: 2048 bits minimum (equivalent to about 112 bits of symmetric security), 3072 bits for medium-term (128-bit security), 4096 bits for long-term. The primes p and q should be roughly equal in bit-length (so each is about half the key size) and should not be too close together (|p - q| should be large) to resist Fermat's factoring method.

RSA Key Generation: Finding Large Primes

How does RSA actually find 512-bit primes? The process is deceptively simple:

Generate random
Pick a random 512-bit odd number r
Trial division
Check if r is divisible by small primes (2, 3, 5, ..., up to ~1000). This eliminates ~88% of candidates cheaply.
Miller-Rabin
Run 40 rounds of Miller-Rabin. If it passes, accept r as prime.
↻ If composite, try r + 2

By the Prime Number Theorem, the density of primes near N is approximately 1/ln(N). For 512-bit numbers, ln(2512) ≈ 355, so roughly 1 in every 355 odd numbers is prime. On average, we test about 178 candidates before finding a prime. With efficient Miller-Rabin (milliseconds per test), the entire process takes under a second.

The trial division pre-filter is critical for efficiency. Without it, every candidate would require the full (expensive) Miller-Rabin test. With it, 88% of candidates are eliminated by a handful of cheap divisions. This is a pattern you see throughout algorithms: cheap approximate filters before expensive exact tests.

python
import random

def generate_prime(bits):
    """Generate a random prime with the specified number of bits."""
    small_primes = sieve(1000)  # precompute primes up to 1000
    attempts = 0
    while True:
        # Step 1: random odd number in range [2^(bits-1), 2^bits - 1]
        n = random.getrandbits(bits)
        n |= (1 << (bits - 1)) | 1  # set MSB and LSB
        attempts += 1

        # Step 2: trial division by small primes
        if any(n % p == 0 for p in small_primes if p < n):
            continue

        # Step 3: Miller-Rabin (40 rounds → error prob < 2^(-80))
        if miller_rabin(n, k=40):
            return n, attempts

# Example: find a 64-bit prime
p, tries = generate_prime(64)
print(f"Found {bits}-bit prime after {tries} attempts: {p}")
# Typically finds one in ~22 attempts (ln(2^64)/2 ≈ 22)
Concept check: In RSA, why is e = 65537 a popular choice for the public exponent?

Chapter 4: Primality Testing

RSA begins with "pick two large primes." But how do you find a 512-bit prime? You cannot check a list — there are about 2512/ln(2512) ≈ 10151 primes that large. Instead, you generate a random odd number and test whether it is prime. By the prime number theorem, roughly 1 in every 355 512-bit odd numbers is prime, so you will find one quickly.

But how do you test primality efficiently?

Trial Division: The Brute Force

The simplest test: check if any integer from 2 to √n divides n. If none do, n is prime. Why only up to √n? Because if n = a × b and a ≤ b, then a ≤ √n. So if there is a factor, we will find the smaller one before √n.

This is O(√n) — for a 512-bit number, that is 2256 ≈ 1077 divisions. A computer doing 109 divisions per second would take 1068 seconds — roughly 1060 times the age of the universe. Far too slow. We need something dramatically better.

A small optimization: only check odd numbers (after checking 2) and stop at √n. Better yet, only check primes up to √n (using a precomputed sieve). This reduces constant factors but not the fundamental O(√n) scaling. For practical purposes, trial division is fine for numbers up to about 1012 (roughly 40 bits) — √(1012) = 106, which takes microseconds. Beyond that, we need probabilistic methods.

Fermat Test: A Probabilistic Approach

Fermat's little theorem says: if p is prime and 1 ≤ a < p, then ap-1 ≡ 1 (mod p). The contrapositive: if ap-1 ≠ 1 (mod p), then p is definitely composite.

So to test if n is prime: pick a random a, compute an-1 mod n (using repeated squaring — fast!). If the result is not 1, n is composite. If it is 1, n is probably prime.

The problem: there exist Carmichael numbers (like 561 = 3 × 11 × 17) that pass the Fermat test for every base a coprime to n. These "pseudoprimes" fool the Fermat test completely. How common are they? There are infinitely many Carmichael numbers (proved in 1994 by Alford, Granville, and Pomerance). The first few are 561, 1105, 1729, 2465, 2821. They are rare relative to primes — up to 1015 there are about 105,212 Carmichael numbers versus ~29.8 trillion primes — but their existence means the Fermat test alone is insufficient for cryptographic use.

Why does 561 fool the Fermat test? Because 561 = 3 × 11 × 17, and by the Chinese Remainder Theorem, a560 ≡ 1 (mod 561) for all a coprime to 561. This happens because 560 is divisible by 2 (= 3-1), 10 (= 11-1), and 16 (= 17-1). The key property: n is Carmichael if and only if n is square-free and for every prime p dividing n, (p-1) divides (n-1). This is Korselt's criterion.

Miller-Rabin: The Industry Standard

The Miller-Rabin test strengthens the Fermat test by exploiting an additional property of primes. Write n - 1 = 2s · d where d is odd (factor out all powers of 2). If n is prime and gcd(a, n) = 1, then either:

ad ≡ 1 (mod n)
or a2r·d ≡ -1 (mod n) for some 0 ≤ r < s

If neither condition holds, n is definitely composite, and a is called a witness to n's compositeness. If a condition does hold, a is called a strong liar — it fooled us into thinking n might be prime.

Why does this work? The key insight is about non-trivial square roots of 1. If n is prime and x2 ≡ 1 (mod n), then x ≡ ±1 (mod n) — primes have no other square roots of 1. But if n is composite, there can be non-trivial square roots (numbers other than ±1 whose square is 1 mod n). The Miller-Rabin test detects these: if we find x such that x2 ≡ 1 (mod n) but x ≠ ±1 (mod n), we have proof that n is composite.

Let us see this concretely.

Worked example: test n = 561 (the smallest Carmichael number, 561 = 3 × 11 × 17). We write n - 1 = 560 = 24 × 35 (so s = 4, d = 35). Pick witness a = 2:

235 mod 561 = 263 // not 1, not -1 (560). Keep squaring. 2632 mod 561 = 166 // not -1. Keep squaring. 1662 mod 561 = 67 // not -1. Keep squaring. 672 mod 561 = 1 // got 1 but previous was 67 ≠ ±1 // 67 is a non-trivial sqrt of 1! COMPOSITE — witness a=2 proves 561 is not prime.

The Fermat test would compute 2560 mod 561 = 1 and incorrectly say "probably prime." Miller-Rabin looks at the path to that 1 and catches the inconsistency. Specifically, it found that 672 ≡ 1 (mod 561) but 67 ≠ ±1 (mod 561). In a prime number, the only square roots of 1 are 1 and -1. The existence of other square roots (like 67) is proof of compositeness.

This is exactly the property that Carmichael numbers cannot hide. Even though an-1 ≡ 1 (mod n) for all a coprime to n (the Fermat condition), the squaring sequence from ad to an-1 will reveal non-trivial square roots of 1 for at least 3/4 of all bases a. The Miller-Rabin test exploits this structural weakness.

Key guarantee. For any odd composite n, at least 3/4 of the possible bases a in [2, n-2] are witnesses. So each random test has at most a 1/4 chance of being fooled. After k independent tests, the probability of error is at most (1/4)k. With k = 40 rounds, the error probability is about 10-24 — less likely than a cosmic ray flipping a bit in your CPU.

The simulation below steps through the Miller-Rabin test. Enter a number, pick the number of rounds, and watch the algorithm check witnesses.

Miller-Rabin Primality Test — Step by Step

Enter a number and click "Test". Watch the algorithm pick random witnesses and check each condition.

Rounds: 5
python
import random

def miller_rabin(n, k=40):
    """Test if n is probably prime with k rounds.
    Returns False if definitely composite, True if probably prime.
    Error probability ≤ (1/4)^k."""
    if n < 2: return False
    if n < 4: return True
    if n % 2 == 0: return False

    # Write n-1 = 2^s * d with d odd
    s, d = 0, n - 1
    while d % 2 == 0:
        s += 1
        d //= 2

    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = mod_exp(a, d, n)        # a^d mod n

        if x == 1 or x == n - 1:
            continue               # passed this round

        for _ in range(s - 1):
            x = mod_exp(x, 2, n)   # square repeatedly
            if x == n - 1:
                break              # found -1, passed
        else:
            return False          # witness found: COMPOSITE

    return True                   # probably prime

# Test: 561 is a Carmichael number (fools Fermat test, NOT Miller-Rabin)
print(miller_rabin(561))   # False — correctly detected as composite!
print(miller_rabin(104729)) # True — this is actually prime
TestTypeComplexityNotes
Trial divisionDeterministicO(√n)Simple but too slow for large n
Fermat testProbabilisticO(k log2 n)Fooled by Carmichael numbers
Miller-RabinProbabilisticO(k log2 n)Industry standard. Error ≤ (1/4)k
AKSDeterministicO(log6 n)Polynomial! But impractical constants

The AKS Breakthrough

In 2002, Agrawal, Kayal, and Saxena (three Indian computer scientists, two of them undergraduates) proved that primality testing is in P — there exists a deterministic polynomial-time algorithm. Their paper "PRIMES is in P" resolved a question that had been open for decades.

The AKS algorithm is based on a generalization of Fermat's little theorem to polynomials: n is prime if and only if (x + a)n ≡ xn + a (mod n) for all a coprime to n. Checking this directly requires expanding a polynomial of degree n, which is exponential. The AKS insight is that you only need to check this identity modulo xr - 1 for a carefully chosen r = O(log2 n), and for O(log n) values of a.

The original algorithm ran in O(log12 n) time. Subsequent improvements brought it to O(log6 n). Despite being polynomial, the constant factors are enormous — for a 1024-bit number, AKS is millions of times slower than Miller-Rabin. So in practice, everyone uses Miller-Rabin with enough rounds. AKS was a theoretical triumph (proving PRIMES ∈ P), not a practical improvement.

Deterministic Miller-Rabin. Under the Generalized Riemann Hypothesis (GRH), it suffices to test bases a = 2, 3, ..., 2(ln n)2. This makes Miller-Rabin deterministic in O(log4 n) time — if GRH is true. For n < 3.3 × 1024, testing just the 12 bases {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37} gives a deterministic result. This is the approach used in competitive programming (see Chapter 8).
Concept check: Why does Miller-Rabin succeed on Carmichael number 561 while the Fermat test fails?

Chapter 5: Cryptography Showcase

This is the payoff. Everything from Chapters 1-4 comes together in two interactive demonstrations: a full RSA message encryption pipeline, and a Diffie-Hellman key exchange where Alice and Bob agree on a secret over an insecure channel.

Full RSA Demo

In the real world, RSA does not encrypt just numbers — it encrypts messages. Each character is converted to a number (its ASCII code), encrypted separately, and the ciphertext is a sequence of numbers. (Real RSA uses OAEP padding and encrypts blocks, not individual characters — but character-by-character encryption shows the math clearly.)

The simulation below lets you type a short message, generate keys, and see every character encrypted and decrypted with full math shown. Try different bit sizes: 8-bit keys use tiny primes (fast but insecure — you could factor n by hand). 16-bit keys use primes up to ~256, which are trivial for a computer but illustrate the full RSA flow.

Watch the numbers. Notice how the ciphertext numbers bear no visible relationship to the plaintext ASCII codes. The letter 'H' (code 72) might encrypt to 1847 — completely scrambled. Yet decryption perfectly recovers 72 every time. This is the magic of modular exponentiation: it is a permutation of {0, 1, ..., n-1}, and the private key tells you the inverse permutation.
RSA Message Pipeline — Character by Character

Type a message. Pick a key size. Click "Encrypt" to see each character become a number, get raised to the e-th power mod n, then "Decrypt" to reverse it.

Diffie-Hellman Key Exchange

RSA lets Alice send an encrypted message to Bob. But what if Alice and Bob have never met and need to agree on a shared secret key for symmetric encryption (like AES)? They cannot exchange a key in the open — anyone could eavesdrop. Enter Diffie-Hellman (1976).

The setup: Alice and Bob publicly agree on a large prime p and a generator g. Then:

Alice picks secret a
Computes A = ga mod p and sends A to Bob
Bob picks secret b
Computes B = gb mod p and sends B to Alice
Shared secret
Alice: s = Ba mod p. Bob: s = Ab mod p. Both equal gab mod p!

An eavesdropper sees p, g, A, and B but not a or b. To find the shared secret, they would need to compute a from A = ga mod p — the discrete logarithm problem, which is believed to be computationally infeasible for large p.

Diffie-Hellman Key Exchange — Alice & Bob

Watch Alice and Bob agree on a shared secret while Eve eavesdrops on the public channel. The green values are secret; the orange values are public.

The magic. Alice and Bob have never met. They communicated only over a public channel that Eve can read. Yet they now share a secret that Eve cannot compute. This was considered impossible before 1976. Diffie-Hellman changed the world.

The Paint Mixing Analogy

Here is the most intuitive way to understand Diffie-Hellman. Imagine Alice and Bob each have a can of paint. They publicly agree on a common base color (say, yellow). Alice mixes her secret color (red) into the yellow to get orange, and sends the orange paint to Bob. Bob mixes his secret color (blue) into the yellow to get green, and sends the green paint to Alice. Now Alice adds her secret red to Bob's green, and Bob adds his secret blue to Alice's orange. Both end up with the same final color — but Eve, who saw the yellow, orange, and green paints, cannot unmix them to recover the secrets.

In the math version: "mixing" is modular exponentiation (easy to compute), and "unmixing" is the discrete logarithm (hard to compute). The fact that Ab = Ba = gab (mod p) is the commutativity that makes the protocol work.

Security Considerations

Diffie-Hellman is vulnerable to a man-in-the-middle attack: if Eve can intercept and modify messages, she can run separate DH exchanges with Alice and Bob, giving each a different shared secret. Neither Alice nor Bob would know they are talking to Eve instead of each other. This is why DH is always combined with authentication (digital signatures, certificates) in real protocols like TLS.

The security of DH depends on the Decisional Diffie-Hellman (DDH) assumption: given g, ga, gb, it is computationally infeasible to distinguish gab from a random element. This is believed to hold for carefully chosen prime groups and elliptic curve groups, but not for all groups.

Real-World Parameters

In practice, Diffie-Hellman uses carefully chosen parameters for security:

ParameterSize (classic DH)Size (ECDH)
Prime p / Curve order2048-4096 bits256-521 bits
Generator g / Base point GStandardized (RFC 3526)Standardized (P-256, X25519)
Secret exponent a, b256+ bitsSame as curve order
Security level~112 bits (2048-bit DH)~128 bits (P-256)

The most popular DH variant today is X25519 (Curve25519), designed by Daniel Bernstein. It uses an elliptic curve over a prime field, is faster than traditional DH by ~10x, resistant to timing attacks by design, and is the default key exchange in TLS 1.3, SSH, Signal, and WireGuard. When you visit any HTTPS website in 2025, X25519 is almost certainly the key exchange running under the hood.

Concept check: In the Diffie-Hellman key exchange, Eve sees the public values A = ga mod p and B = gb mod p. Why can't she compute the shared secret s = gab mod p?

The Full TLS Handshake (Simplified)

Here is how Chapters 1-5 come together in a real TLS connection:

Client Hello
Browser sends supported ciphers, random nonce
Server Hello
Server sends certificate (containing RSA/ECDSA public key), chosen cipher
Key Exchange
ECDH (X25519): both sides exchange public values, derive shared secret via DH
Authentication
Server signs handshake transcript with RSA/ECDSA private key (from certificate)
Symmetric Encryption
Shared secret → derive AES-256-GCM keys → all further data encrypted symmetrically

Every step uses number theory: ECDH for key exchange (modular arithmetic on elliptic curves), RSA/ECDSA for authentication (modular exponentiation, modular inverse), key derivation via HKDF (built on HMAC, which uses modular addition internally). The symmetric encryption (AES) is the only part that does not directly use number theory — it uses substitution-permutation networks instead.

Chapter 6: Chinese Remainder Theorem

A puzzle from 3rd-century China (Sun Tzu's Sunzi Suanjing, circa 3rd century AD): "There are certain things whose number is unknown. Repeatedly divided by 3, the remainder is 2; by 5, the remainder is 3; by 7, the remainder is 2. What will be the number?"

This ancient puzzle encodes a deep mathematical principle. The Chinese Remainder Theorem tells us that if you know the remainders of a number when divided by several pairwise coprime moduli, you can uniquely reconstruct the number (up to a multiple of the product of the moduli). This is like identifying a point in a multi-dimensional space from its projections onto each axis.

In modern notation: find x such that x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7). The answer is x = 23 (and x = 23 + 105k for any integer k, since 105 = 3 × 5 × 7).

The Chinese Remainder Theorem (CRT) says: if the moduli m1, m2, ..., mk are pairwise coprime (every pair has gcd = 1), then the system of congruences has a unique solution modulo M = m1 × m2 × ... × mk.

The Construction

For each i, let Mi = M / mi (the product of all moduli except the i-th). Since gcd(Mi, mi) = 1 (because the moduli are pairwise coprime), we can compute yi = Mi-1 mod mi using extended Euclid. Then:

x = ∑i=1k ai · Mi · yi (mod M)

Worked example: x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7).

M = 3 × 5 × 7 = 105 M1 = 105/3 = 35, y1 = 35-1 mod 3 = 2 (since 35 × 2 = 70 ≡ 1 mod 3) M2 = 105/5 = 21, y2 = 21-1 mod 5 = 1 (since 21 × 1 = 21 ≡ 1 mod 5) M3 = 105/7 = 15, y3 = 15-1 mod 7 = 1 (since 15 × 1 = 15 ≡ 1 mod 7) x = 2 × 35 × 2 + 3 × 21 × 1 + 2 × 15 × 1 = 140 + 63 + 30 = 233 = 233 mod 105 = 23 Check: 23 mod 3 = 2 ✓, 23 mod 5 = 3 ✓, 23 mod 7 = 2 ✓
CRT for RSA speedup. To compute m = cd mod n where n = pq, you can compute mp = cd mod (p-1) mod p and mq = cd mod (q-1) mod q separately, then combine via CRT. This is about 4x faster than computing cd mod n directly, because exponentiating mod a number half the bit-length is 4x cheaper (quadratic in bit-length). Every practical RSA implementation uses this optimization.

Why CRT Works: Intuition

Think of CRT as a "coordinate system" for numbers. Just as a point in 2D space is uniquely identified by its (x, y) coordinates, a number in [0, M) is uniquely identified by its residues modulo the coprime factors of M. The number 23, when "projected" onto the axes mod 3, mod 5, and mod 7, gives coordinates (2, 3, 2). CRT tells us how to go back from coordinates to the original number.

This is not just an analogy — it is a precise algebraic isomorphism. The map x ↦ (x mod m1, x mod m2, ..., x mod mk) is a ring isomorphism from ZM to Zm1 × Zm2 × ... × Zmk. This means addition and multiplication are preserved: you can add, subtract, and multiply in the "coordinate" representation and convert back at the end.

CRT in Practice: Garner's Algorithm

The formula we showed above involves large intermediate products (ai · Mi · yi can overflow). In practice, Garner's algorithm computes the result incrementally without large intermediates. It builds the answer one modulus at a time using a mixed-radix representation. This is the algorithm OpenSSL uses for CRT-based RSA decryption.

CRT Solver — System of Congruences

Enter up to 4 congruences x ≡ ai (mod mi). The solver shows the CRT construction step by step.

x ≡ (mod ), ≡ (mod ), ≡ (mod )
python
def chinese_remainder(remainders, moduli):
    """Solve x ≡ a_i (mod m_i) for pairwise coprime moduli.
    Returns x mod M where M = product of all moduli."""
    M = 1
    for m in moduli:
        M *= m

    x = 0
    for a_i, m_i in zip(remainders, moduli):
        M_i = M // m_i
        _, y_i, _ = extended_gcd(M_i, m_i)  # M_i * y_i ≡ 1 (mod m_i)
        x += a_i * M_i * y_i

    return x % M

# Sun Tzu's puzzle
print(chinese_remainder([2, 3, 2], [3, 5, 7]))  # 23

CRT Worked Example: RSA Decryption Speedup

Let us see the RSA-CRT optimization concretely. Suppose n = 3233 = 61 × 53, d = 2753, and we want to decrypt c = 2790.

Standard RSA: m = 27902753 mod 3233 // exponent is 2753, modulus is 3233 RSA-CRT: dp = d mod (p-1) = 2753 mod 60 = 53 dq = d mod (q-1) = 2753 mod 52 = 49 mp = 279053 mod 61 = 65 mod 61 = 4 // exponent 53, modulus 61 (much smaller!) mq = 279049 mod 53 = 65 mod 53 = 12 // exponent 49, modulus 53 // Combine via CRT: find m such that m ≡ 4 (mod 61) and m ≡ 12 (mod 53) q_inv = 53-1 mod 61 = 38 // since 53 × 38 = 2014 ≡ 1 (mod 61) h = q_inv × (mp - mq) mod p = 38 × (4 - 12) mod 61 = 38 × (-8) mod 61 = 38 × 53 mod 61 = 2014 mod 61 = 1 m = mq + h × q = 12 + 1 × 53 = 65

Notice: instead of raising to power 2753 mod 3233, we raised to power 53 mod 61 and power 49 mod 53. Both the exponents and moduli are roughly half the size. Since modular exponentiation cost is O(log e × (log n)2), halving both gives approximately a 4x speedup. For 2048-bit RSA, this optimization is essential — without it, decryption would be noticeably slow even on modern hardware.

Concept check: Why does CRT speed up RSA decryption by about 4x?

Chapter 7: Applications

Number theory is not an abstract curiosity. It is running right now in your browser, your phone, your bank, and the infrastructure that keeps civilization connected. Here is where the algorithms from this lesson appear in practice.

Cryptography

RSA is still widely used for key exchange and digital signatures. When your browser shows a padlock icon, TLS uses RSA (or elliptic curve variants) to establish a secure session. Digital certificates that prove a website's identity are signed with RSA.

Diffie-Hellman and its elliptic curve variant ECDH handle key agreement in almost every secure connection. When you open WhatsApp, Signal, or any end-to-end encrypted messenger, Diffie-Hellman runs in the background.

Elliptic Curve Cryptography (ECC) achieves the same security as RSA with much smaller keys. A 256-bit ECC key offers comparable security to a 3072-bit RSA key. This matters for mobile devices and IoT where bandwidth and computation are limited. ECC uses the same modular arithmetic primitives but over a different algebraic structure (points on an elliptic curve rather than integers mod n).

Hashing

Hash functions (Chapter 11) use modular arithmetic extensively. The division method h(k) = k mod m maps keys to table slots. Why should m be prime? Because if m has many small factors, keys with patterns (like multiples of common values) will cluster in certain slots. A prime m distributes keys more uniformly because there are no common factors to create patterns.

Universal hashing uses h(k) = ((ak + b) mod p) mod m where p is a prime larger than the universe of keys. This guarantees that for any two distinct keys, the probability of collision is at most 1/m — matching the ideal random function. The proof relies on the fact that Zp is a field (every non-zero element has an inverse), which ensures the linear map ak + b is a bijection for any a ≠ 0.

Cryptographic hash functions like SHA-256 use modular addition of 32-bit words (mod 232) as a core mixing operation, combined with bitwise rotations and XOR. The non-linearity of modular addition (carries propagate unpredictably) is essential for the avalanche effect: changing one input bit should flip about half the output bits.

Random Number Generation

Linear congruential generators (LCGs) produce pseudorandom sequences using the recurrence Xn+1 = (aXn + c) mod m. The constants a, c, m must be chosen carefully (using number-theoretic criteria) to achieve a full period of m. For full period: gcd(c, m) = 1, every prime dividing m must also divide (a-1), and if 4 divides m then 4 must divide (a-1). These conditions are pure number theory.

The Blum-Blum-Shub generator uses xn+1 = xn2 mod n (where n = pq for large Blum primes p ≡ q ≡ 3 mod 4) for cryptographically secure randomness. Its security is provably equivalent to the difficulty of factoring n.

Digital Signatures

RSA is not just for encryption — it works in reverse for digital signatures. To sign a message m, the signer computes s = md mod n (using their private key). Anyone can verify: m = se mod n (using the public key). If the verification succeeds, the message must have been signed by someone who knows d. This provides authentication (the message came from the claimed sender), integrity (the message was not altered), and non-repudiation (the signer cannot deny signing).

In practice, you sign the hash of the message, not the message itself (messages can be arbitrarily large, but hashes are fixed-size). The signature is s = hash(m)d mod n. The verifier checks hash(m) = se mod n.

Error-Correcting Codes

Reed-Solomon codes — used in QR codes, CDs, DVDs, and deep-space communication — are built on finite field arithmetic (modular arithmetic over polynomials). When your phone scans a partially damaged QR code and still reads it, that is number theory at work. A QR code can lose up to 30% of its data and still be decoded correctly, thanks to redundancy encoded using polynomial evaluation over finite fields (Galois fields GF(28)).

The Voyager 1 and 2 space probes, now over 15 billion miles from Earth, transmit data at about 160 bits per second. The signal is so weak that individual bits are frequently corrupted. Reed-Solomon and convolutional codes (both built on modular arithmetic) enable scientists to reconstruct the original data with near-perfect accuracy from this noisy stream.

Blockchain and Proof of Work

Bitcoin's proof-of-work requires finding a nonce such that SHA-256(block || nonce) starts with a certain number of zero bits. This is essentially a modular arithmetic problem: find x such that f(x) mod 2k = 0. The difficulty is adjusted by changing k. As of 2025, Bitcoin miners perform roughly 600 × 1018 (600 exahashes) SHA-256 computations per second worldwide. Each SHA-256 internally uses modular addition of 32-bit integers as its core mixing operation.

Digital signatures on Bitcoin transactions use ECDSA (Elliptic Curve Digital Signature Algorithm) over the secp256k1 curve. Each signature requires computing a modular inverse mod the curve order — the extended Euclidean algorithm from Chapter 1, running inside every Bitcoin wallet in the world.

Ethereum recently transitioned to proof-of-stake, but its smart contracts still use extensive modular arithmetic. The BN254 pairing-friendly elliptic curve (used in zk-SNARKs for privacy-preserving transactions) performs arithmetic modulo a 254-bit prime. Zero-knowledge proofs — a hot topic in blockchain and privacy — are built entirely on number-theoretic foundations.

ApplicationNumber Theory UsedAlgorithm
TLS/HTTPSRSA or ECDH key exchangeModular exponentiation, Euclid
Digital signaturesRSA-PSS, ECDSAModular inverse, fast exp
Hash tablesDivision method, universal hashingModular arithmetic
Pseudorandom generatorsLCG, Blum-Blum-ShubModular multiplication, squaring
Error correctionReed-Solomon, BCH codesFinite field arithmetic
BlockchainProof of work, ECDSAHashing (modular), discrete log
FFT (Chapter 30)Roots of unityModular arithmetic over complex/finite fields
Post-quantum cryptography. Quantum computers can factor integers in polynomial time (Shor's algorithm), which would break RSA and Diffie-Hellman. The cryptography community is migrating to lattice-based and code-based schemes that resist quantum attacks. NIST standardized the first post-quantum algorithms (CRYSTALS-Kyber and CRYSTALS-Dilithium) in 2024. The number theory from this lesson remains relevant — understanding why RSA falls to quantum attacks requires understanding the number theory it is built on.

Timeline of Number Theory in Cryptography

YearEventImpact
~300 BCEuclid's algorithmStill used in every RSA implementation
1640Fermat's little theoremFoundation of primality testing
1763Euler's theorem (generalized Fermat)Why RSA decryption works
1801Gauss, Disquisitiones ArithmeticaeFormalized modular arithmetic
1976Diffie-Hellman key exchangeSolved the key distribution problem
1977RSA cryptosystemFirst practical public-key encryption
1980Miller-Rabin testEfficient probabilistic primality testing
1985Elliptic curve cryptography (Koblitz, Miller)Smaller keys, same security
1994Shor's algorithmQuantum threat to RSA/DH
2002AKS primality testPRIMES is in P (theoretical milestone)
2024NIST post-quantum standardsMigration from RSA/DH to lattice-based crypto
Concept check: Why does elliptic curve cryptography achieve the same security as RSA with much smaller keys?

Chapter 8: Interview Arsenal

Number theory problems appear regularly in coding interviews, competitive programming, and system design discussions. Here is your cheat sheet plus drills.

The Cheat Sheet

ToolWhen to UseComplexity
Euclid's GCDgcd(a,b), lcm(a,b) = ab/gcd(a,b), coprimality checkO(log min(a,b))
Extended EuclidModular inverse, Bézout coefficients, linear Diophantine equationsO(log min(a,b))
Fast mod expab mod m, RSA, Fermat test, any large exponentO(log b) mults
Miller-RabinPrimality test for large numbersO(k log2 n)
CRTCombine congruences with coprime moduliO(k log M)
Sieve of EratosthenesAll primes up to NO(N log log N)

Common Interview Patterns

Number theory problems in interviews fall into a few patterns. Recognizing the pattern is half the battle:

Pattern recognition. If the problem says "mod 109+7" → you need modular exponentiation. If it asks for GCD/LCM → Euclid. If it involves counting coprime pairs → Euler's totient. If it asks "is prime?" → Miller-Rabin or sieve. If it involves linear equations in integers → extended Euclid / Diophantine equations. If it says "modular inverse" → extended Euclid or Fermat (if mod is prime).

Drill 1: Modular Exponentiation

Problem: Compute ab mod m for very large b (up to 1018). The key insight: you cannot compute ab first then take mod — the intermediate result would have billions of digits. You must reduce modulo m at every step.

python
def mod_exp(base, exp, mod):
    """O(log exp) — the workhorse of number-theoretic algorithms."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:             # odd exponent: multiply in current base
            result = result * base % mod
        exp >>= 1              # divide exponent by 2
        base = base * base % mod  # square the base
    return result

# Python built-in: pow(base, exp, mod) does the same thing
assert mod_exp(7, 13, 11) == pow(7, 13, 11)

Drill 2: Extended Euclidean / Modular Inverse

Problem: Find x such that ax ≡ 1 (mod m). This has a solution if and only if gcd(a, m) = 1.

python
def mod_inverse(a, m):
    """Find x such that a*x ≡ 1 (mod m). Raises if no inverse."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError(f"No inverse: gcd({a},{m})={g}")
    return x % m

# Alt for prime m: Fermat's little theorem gives a^(-1) = a^(m-2) mod m
def mod_inverse_fermat(a, p):
    """Only works when p is prime."""
    return pow(a, p - 2, p)

Drill 3: Sieve of Eratosthenes

Problem: Find all primes up to N. The sieve marks composites by iterating through each prime and crossing off its multiples. This is one of the oldest algorithms — attributed to Eratosthenes of Cyrene (circa 200 BC). The sieve is also a fundamental building block: many number-theoretic algorithms precompute a list of small primes as a first step.

The key optimization: start crossing off at p2, not 2p. Why? All multiples of p smaller than p2 have already been marked by smaller primes. For example, when processing prime 5, we start at 25 (not 10, 15, 20 — those were already crossed off by 2 and 3).

For very large N (billions), the standard sieve does not fit in memory. The segmented sieve processes the range in blocks of size √N, using the small primes (up to √N) to mark each block. This reduces memory from O(N) to O(√N).

The simulation below visualizes the sieve. Watch as each prime "crosses off" its multiples.

Sieve of Eratosthenes — Animated

Set N and click "Run Sieve" to watch composites get crossed off. Orange = current prime being processed. Red = crossed off composite. Green = confirmed prime.

N 60
python
def sieve(n):
    """Return all primes up to n. O(n log log n)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n**0.5) + 1):
        if is_prime[i]:
            for j in range(i*i, n+1, i):
                is_prime[j] = False
    return [i for i in range(2, n+1) if is_prime[i]]

print(sieve(50))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Drill 4: Sieve of Eratosthenes (with Smallest Prime Factor)

An enhanced sieve stores the smallest prime factor (SPF) for each number. This enables O(log n) factorization of any number up to N by repeatedly dividing by the SPF.

python
def sieve_spf(n):
    """Sieve that stores smallest prime factor for each number.
    Enables O(log n) factorization."""
    spf = list(range(n + 1))  # spf[i] = i initially
    for i in range(2, int(n**0.5) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i*i, n+1, i):
                if spf[j] == j:
                    spf[j] = i
    return spf

def factorize(n, spf):
    """Factorize n using the SPF sieve. O(log n)."""
    factors = []
    while n > 1:
        factors.append(spf[n])
        n //= spf[n]
    return factors

spf = sieve_spf(100)
print(factorize(84, spf))  # [2, 2, 3, 7]
print(factorize(97, spf))  # [97] — prime!

Drill 5: GCD-Based Problems

python
# LCM: least common multiple
def lcm(a, b):
    return a * b // gcd(a, b)

# Check coprimality
def coprime(a, b):
    return gcd(a, b) == 1

# Euler's totient: count of integers in [1,n] coprime to n
def euler_totient(n):
    """O(√n) — factorize n, then use the product formula."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

print(euler_totient(3233))  # 3120 = (61-1)*(53-1)

Drill 6: Linear Diophantine Equations

Problem: Find all integer solutions to ax + by = c. By Bézout's identity, solutions exist if and only if gcd(a,b) divides c. If (x0, y0) is one solution, then all solutions are x = x0 + (b/g)t, y = y0 - (a/g)t for integer t, where g = gcd(a,b).

python
def solve_diophantine(a, b, c):
    """Solve ax + by = c. Returns (x0, y0, dx, dy) where
    all solutions are (x0 + dx*t, y0 + dy*t) for integer t.
    Returns None if no solution."""
    g, x, y = extended_gcd(abs(a), abs(b))
    if c % g != 0:
        return None  # no solution
    scale = c // g
    x0 = x * scale * (1 if a >= 0 else -1)
    y0 = y * scale * (1 if b >= 0 else -1)
    dx = b // g   # step in x direction
    dy = -a // g  # step in y direction
    return x0, y0, dx, dy

# Example: 3x + 5y = 4
# gcd(3,5)=1 divides 4, so solutions exist
x0, y0, dx, dy = solve_diophantine(3, 5, 4)
print(f"x={x0}+{dx}t, y={y0}+{dy}t")
# Verify: 3*x0 + 5*y0 should equal 4

Drill 7: Miller-Rabin in Contest Form

python
def is_prime(n):
    """Deterministic for n < 3.3 × 10^24 (using specific witnesses)."""
    if n < 2: return False
    if n < 4: return True
    if n % 2 == 0: return False
    s, d = 0, n - 1
    while d % 2 == 0: s += 1; d //= 2
    # These witnesses suffice for n < 3.3 × 10^24
    for a in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]:
        if a >= n: continue
        x = pow(a, d, n)
        if x == 1 or x == n - 1: continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1: break
        else: return False
    return True

Common Mistakes in Interviews

MistakeWhy It's WrongCorrect Approach
Computing ab then taking modab can be astronomically large — overflows any integer typeTake mod at every step (repeated squaring)
Assuming modular inverse always existsa-1 mod n exists only when gcd(a,n)=1Check gcd first; raise error if not coprime
Using Fermat for non-prime modulusam-1 ≡ 1 (mod m) only when m is primeUse extended Euclid for general modular inverse
Forgetting negative results from extended GCDx from extended_gcd can be negativeReturn (x % m + m) % m to ensure positive result
Integer overflow in a*b % mEven if a,b < m, the product a*b can overflow 64-bit intUse 128-bit multiplication or Python (arbitrary precision)
Concept check: You need to compute 31018 mod 109+7 in a coding contest. What approach works?

Chapter 9: Connections

Number theory is a thread that runs through many seemingly unrelated parts of algorithms and computer science. Here is how this chapter connects to the rest of CLRS and beyond.

Where We Came From

ChapterConnection
Ch 11: Hash TablesThe division method h(k) = k mod m is pure modular arithmetic. Universal hashing uses h(k) = ((ak+b) mod p) mod m where p is prime. The analysis of collision probability uses number-theoretic arguments about the distribution of residues.
Ch 30: FFTThe Fast Fourier Transform evaluates polynomials at the n-th roots of unity: e2πik/n. The Number Theoretic Transform (NTT) replaces complex roots with primitive roots modulo a prime, enabling exact polynomial multiplication over integers.
Ch 34: NP-CompletenessInteger factoring is believed (but not proven) to be neither in P nor NP-complete — it sits in a suspected "middle ground." If factoring were in P, RSA would be broken. If factoring were NP-complete, then P = NP would break RSA (and much more).

What Builds on This

Elliptic curve cryptography generalizes modular arithmetic from integers to points on curves. All the same primitives — GCD, modular inverse, fast exponentiation — apply, but over a different algebraic structure.

Lattice-based cryptography (the post-quantum frontier) works with vectors in high-dimensional lattices. The hardness assumption is different (shortest vector problem instead of factoring), but the algorithmic tools overlap: modular arithmetic, polynomial multiplication (via NTT), and probabilistic arguments.

Coding theory uses finite fields (modular arithmetic over polynomials) for error correction. Reed-Solomon codes, BCH codes, and LDPC codes all build on the algebra from this chapter.

Computational geometry surprisingly uses number theory for robust predicates. When determining if a point is left or right of a line, floating-point arithmetic can give wrong answers. Exact integer arithmetic with modular bounds avoids this — and the key insight is that determinant computations can be done in Zp for a sufficiently large prime p.

String algorithms use modular hashing extensively. The Rabin-Karp string matching algorithm computes h(s) = ∑ s[i] × bi mod p as a rolling hash, enabling O(n) expected-time pattern matching. The collision analysis directly invokes the number-theoretic properties of primes.

Randomized algorithms from Chapter 5 of CLRS use number theory for hash function construction. Universal hash families, the probabilistic method, and derandomization techniques all lean on properties of primes and modular arithmetic.

Further Reading

For readers who want to go deeper into computational number theory:

ResourceLevelFocus
CLRS Chapter 31UndergraduateAlgorithms perspective — GCD, modular exp, RSA, Miller-Rabin, CRT
Shoup, A Computational Introduction to Number Theory and AlgebraGraduateRigorous treatment with complexity analysis. Free online.
Katz & Lindell, Introduction to Modern CryptographyGraduateSecurity proofs, formal definitions, modern constructions
Crandall & Pomerance, Prime NumbersAdvancedState-of-the-art algorithms for primality and factoring
Project Euler (projecteuler.net)PracticeHundreds of problems requiring number-theoretic algorithms

The connection between number theory and cryptography is one of the most beautiful stories in mathematics: a field studied for 2300 years as pure intellectual curiosity turned out to be the foundation of modern digital security. Euclid could not have imagined that his algorithm for finding the GCD of two numbers would one day protect billions of financial transactions per day. Yet here we are — and every concept in this lesson is running right now, in your browser, keeping your data safe.

The Hierarchy of Hardness

Easy (P)
GCD, modular exp, primality testing (AKS is polynomial)
Hard (believed not in P)
Factoring, discrete log — the basis of RSA and Diffie-Hellman
NP-Complete
3SAT, TSP, graph coloring — factoring is probably NOT here
Undecidable
Halting problem — no algorithm at all
The big picture. Cryptography lives in the gap between P and NP-complete. It requires problems that are hard enough that attackers cannot solve them, but structured enough that legitimate users can work with them efficiently. Number theory provides exactly this: multiplication is easy, factoring is hard. Exponentiation is easy, discrete log is hard. This gap is the entire foundation of digital security.

What We Covered

AlgorithmInputOutputComplexityUsed In
Euclid's GCDa, bgcd(a,b)O(log min(a,b))RSA keygen, coprimality
Extended Euclida, bgcd, x, yO(log min(a,b))Modular inverse, RSA keygen
Modular exponentiationa, b, nab mod nO(log b)RSA encrypt/decrypt, DH, primality
Miller-Rabinn, kprime/compositeO(k log2 n)RSA prime generation
CRTcongruencessolution xO(k log M)RSA speedup, parallel computation

The Quantum Threat in Detail

Shor's algorithm (1994) factors an integer n in O(log3 n) time on a quantum computer. The key idea: factoring reduces to finding the period of the function f(x) = ax mod n, which quantum computers can do efficiently using the Quantum Fourier Transform. Once you know the period r of ax mod n, you can (with high probability) compute gcd(ar/2 ± 1, n) to find a factor.

Current quantum computers have about 1,000 noisy qubits. Breaking 2048-bit RSA requires roughly 4,000 logical (error-corrected) qubits, which translates to millions of physical qubits with current error rates. Estimates vary, but most experts expect this capability between 2030-2050. The cryptographic community is not waiting — NIST finalized the first post-quantum standards in 2024:

AlgorithmTypeHardness BasisKey Size
CRYSTALS-Kyber (ML-KEM)Key exchangeModule lattices (MLWE)~1.5 KB
CRYSTALS-Dilithium (ML-DSA)SignatureModule lattices (MLWE)~2.5 KB
SPHINCS+SignatureHash functions~8-49 KB

These algorithms are based on lattice problems (finding short vectors in high-dimensional lattices), not factoring. Shor's algorithm does not help with lattice problems, and no quantum algorithm is known that solves them efficiently. The transition from RSA/DH to lattice-based cryptography is the biggest change in internet security infrastructure since the adoption of TLS.

Open Problems

Some of the deepest unsolved questions in mathematics connect directly to the algorithms in this chapter:

Final check: Which of the following statements is true about the relationship between number theory and computer science?

"God made the integers; all else is the work of man." — Leopold Kronecker

"It is hard to think of any scientific discovery in the history of the world that has had a more immediate and dramatic impact on human society than public key cryptography." — Simon Singh, The Code Book