Number-Theoretic Algorithms (CLRS Ch 31)

Chapter 0: The Problem

You type your credit card number into a website. That number travels across the open internet — through dozens of routers, across undersea cables, past anyone who might be eavesdropping. Yet somehow, only the website can read it. How?

The answer is public-key cryptography, and its most famous incarnation is the RSA cryptosystem (named after Rivest, Shamir, and Adleman, who published it in 1977 — though Clifford Cocks at GCHQ discovered it secretly in 1973). RSA rests on a beautiful asymmetry: multiplying two large prime numbers together is trivially easy (a laptop can multiply two 1000-digit primes in milliseconds), but factoring the product back into those two primes is, as far as anyone knows, computationally infeasible. A 2048-bit RSA key would take all the computers on Earth longer than the age of the universe to crack by brute force.

Before RSA, secure communication required that both parties share a secret key in advance (a symmetric key). This was the chicken-and-egg problem of cryptography: to communicate securely you need a shared key, but to share a key securely you need secure communication. Governments used couriers with briefcases handcuffed to their wrists. Businesses used sealed envelopes. There was no way for two strangers on opposite sides of the internet to establish a shared secret.

Public-key cryptography solved this. Each person generates a key pair: a public key (which they publish to the world) and a private key (which they keep secret). Anyone can encrypt a message using the public key, but only the holder of the private key can decrypt it. This is like mailing someone a padlock — they can lock something in a box and send it back, but only you can open it.

This entire asymmetry — easy to multiply, hard to factor — is built on number theory: the study of integers and their properties. Euclid's algorithm (circa 300 BC) is still used in every RSA implementation today. Fermat's little theorem (1640) underpins every primality test your browser runs. Modular arithmetic — "clock arithmetic" — is the language these algorithms speak.

The core idea. Number theory gives us trapdoor functions: operations that are easy to compute in one direction but practically impossible to reverse without a secret key. Multiplying primes is easy. Factoring is hard. Modular exponentiation is easy. Discrete logarithms are hard. These one-way functions are the foundation of internet security. This lesson builds every piece from scratch.

The simulation below shows the fundamental asymmetry. Click "Multiply" to watch two primes combine instantly. Click "Factor" to watch a brute-force search struggle with even modest numbers.

The Factoring Asymmetry

Multiply two primes: instant. Factor the product: exponentially hard.

Prime size (digits) 2

With 2-digit primes, factoring is quick — maybe a few hundred trial divisions. With 3-digit primes, it takes thousands. With 5-digit primes, it takes millions. Real RSA uses primes with 300+ digits (1024 bits). The number of trial divisions needed is roughly √n, which for a 600-digit number is 10³⁰⁰. For reference, the observable universe contains about 10⁸⁰ atoms.

Let us put this concretely. Here are the best known factoring speeds for different key sizes:

RSA Key Size	Product n (digits)	Best Attack Time	Status
512 bits	~155 digits	~8 months (1999)	Broken
768 bits	~232 digits	~2 years (2009)	Broken
1024 bits	~309 digits	Feasible for nation-states	Deprecated
2048 bits	~617 digits	Beyond current capability	Standard
4096 bits	~1234 digits	Beyond foreseeable capability	High security

The best known classical algorithm for factoring is the General Number Field Sieve (GNFS), which runs in sub-exponential time: approximately exp(1.9 · (ln n)^1/3 · (ln ln n)^2/3). This is much faster than trial division's O(√n) = O(n^1/2), but still far too slow for 2048-bit numbers. The key insight: factoring is hard not because we lack clever algorithms, but because even the cleverest known algorithms scale exponentially in the number of digits.

Here is our roadmap. We start with the greatest common divisor and Euclid's elegant algorithm (Chapter 1). We build up modular arithmetic — the "clock math" that makes RSA work (Chapter 2). We construct the RSA cryptosystem from scratch (Chapter 3). We learn to test whether a number is prime without factoring it (Chapter 4). We combine everything into a full cryptographic showcase (Chapter 5). Then we cover the Chinese Remainder Theorem (Chapter 6), real-world applications (Chapter 7), interview problems (Chapter 8), and connections to the rest of the CLRS universe (Chapter 9).

Concept check: Why can't an eavesdropper who intercepts an RSA-encrypted message simply factor the public key to recover the private key?

The public key is encrypted too Factoring is mathematically impossible Factoring a product of two large primes is computationally infeasible — no known algorithm can do it in reasonable time for 1024+ bit numbers The private key is stored on a separate server

Chapter 1: GCD & Euclid's Algorithm

Before we can build RSA, we need one fundamental tool: computing the greatest common divisor (GCD) of two integers. The GCD of a and b, written gcd(a, b), is the largest integer that divides both a and b. For example, gcd(12, 8) = 4 because 4 is the largest number that divides both 12 and 8.

Why do we need GCD for cryptography? Because RSA key generation requires finding a number e that is coprime to φ(n) — meaning gcd(e, φ(n)) = 1. And computing the private key d requires the extended Euclidean algorithm, which finds x and y such that ax + by = gcd(a, b). This equation is called Bézout's identity, and it is the backbone of modular inverses.

Beyond cryptography, GCD appears everywhere: simplifying fractions (gcd(numerator, denominator)), computing least common multiples (lcm(a,b) = ab/gcd(a,b)), solving linear Diophantine equations (ax + by = c has solutions if and only if gcd(a,b) divides c), and even in the analysis of the Euclidean algorithm for computing continued fractions.

Euclid's algorithm is arguably the oldest non-trivial algorithm still in daily use — over 2300 years old and still optimal. Donald Knuth called it "the granddaddy of all algorithms," not just because of its age, but because it established a pattern: a simple recursive structure that provably converges, with tight analysis of its running time. The Fibonacci worst-case analysis (gcd(F_n, F_n-1) takes n-2 steps) connects number theory to one of the most famous sequences in mathematics.

The Naive Approach

You could find gcd(a, b) by listing all divisors of both numbers and picking the largest common one. For gcd(48, 18): divisors of 48 are {1,2,3,4,6,8,12,16,24,48}, divisors of 18 are {1,2,3,6,9,18}, common divisors are {1,2,3,6}, so gcd = 6. This works but takes O(min(a,b)) time — terrible for 300-digit numbers.

Euclid's Algorithm

Around 300 BC, Euclid discovered something beautiful. The GCD has a recursive property:

gcd(a, b) = gcd(b, a mod b)

Why does this work? If d divides both a and b, then d also divides a - qb = a mod b (where q = ⌊a/b⌋). So every common divisor of (a, b) is also a common divisor of (b, a mod b), and vice versa. The sets of common divisors are identical, so their maximums are identical.

The base case: gcd(a, 0) = a. When the second argument reaches 0, the first argument is the GCD.

Key insight: logarithmic convergence. Each step of Euclid's algorithm reduces the smaller number by at least half. After two steps, both numbers have shrunk. The total number of steps is at most 2 · log₂(min(a,b)) — that is O(log min(a,b)). For 300-digit numbers, that is about 2000 steps. Euclid's algorithm is astonishingly efficient.

Why does the smaller number shrink by at least half? Consider gcd(a, b) with a ≥ b. If b ≤ a/2, then after one step we have (b, a mod b) and the larger argument went from a to b ≤ a/2. If b > a/2, then a mod b = a - b < a/2, so after one step we have (b, a-b) where a-b < a/2, and the next step gives (a-b, b mod (a-b)) which again has shrunk. Either way, after two steps the max of the pair has halved. The worst case is consecutive Fibonacci numbers: gcd(F_n, F_n-1) takes exactly n-2 steps. This is because Fibonacci numbers are the slowest to converge under Euclid's algorithm — each quotient is exactly 1, so the remainder shrinks as slowly as possible.

Lamé's Theorem (1844). The number of steps in Euclid's algorithm on inputs (a, b) with a ≥ b ≥ 1 is at most 5 times the number of decimal digits of b. So for b with 100 digits, at most 500 division steps. This was one of the first results in computational complexity theory — decades before computers existed.

Worked Example: gcd(252, 105)

Step 1

gcd(252, 105): 252 = 2 × 105 + 42, so gcd(252,105) = gcd(105, 42)

↓

Step 2

gcd(105, 42): 105 = 2 × 42 + 21, so gcd(105,42) = gcd(42, 21)

↓

Step 3

gcd(42, 21): 42 = 2 × 21 + 0, so gcd(42,21) = gcd(21, 0)

↓

Done

gcd(21, 0) = 21. So gcd(252, 105) = 21

The simulation below animates each step. Enter two numbers and watch the algorithm unwind.

Euclid's Algorithm — Animated

Enter two positive integers and watch the division steps. The algorithm traces down the left, then the extended coefficients trace back up the right.

a: b:

The Extended Euclidean Algorithm

Euclid's algorithm tells us the GCD. The extended version also finds integers x and y such that:

ax + by = gcd(a, b)

This is Bézout's identity. It always has a solution, and the extended algorithm finds it by "unwinding" the recursion. At each step, we express the remainder as a linear combination of a and b.

Worked example: find x, y such that 252x + 105y = gcd(252, 105) = 21.

Step 3: 21 = 105 - 2 × 42 // from step 2 Step 2: 42 = 252 - 2 × 105 // from step 1 Substitute step 2 into step 3: 21 = 105 - 2 × (252 - 2 × 105) = 105 - 2 × 252 + 4 × 105 = 5 × 105 - 2 × 252 So x = -2, y = 5. Check: 252 × (-2) + 105 × 5 = -504 + 525 = 21 ✓

Why extended Euclid matters for RSA. To compute the RSA private key d, we need the modular inverse of e modulo φ(n) — that is, d such that e · d ≡ 1 (mod φ(n)). This is exactly Bézout's identity: e · d + φ(n) · k = 1 = gcd(e, φ(n)). The extended Euclidean algorithm gives us d directly.

Iterative Extended Euclid

The recursive version is elegant but uses O(log n) stack frames. For very large numbers (1000+ digits), stack overflow is a risk. Here is the iterative version that uses constant extra space:

python
def extended_gcd_iter(a, b):
    """Iterative extended Euclid. Returns (gcd, x, y).
    Constant space, no recursion."""
    old_r, r = a, b
    old_s, s = 1, 0    # x coefficients
    old_t, t = 0, 1    # y coefficients

    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t

    return old_r, old_s, old_t  # gcd, x, y

# Trace for gcd(252, 105):
# q=2: r=42,  s=-2, t=1
# q=2: r=21,  s=5,  t=-2
# q=2: r=0    → gcd=21, x=-2, y=5

Application: Solving Linear Congruences

The equation ax ≡ b (mod n) has solutions if and only if g = gcd(a, n) divides b. If it does, there are exactly g solutions modulo n. To find them: first solve ax ≡ g (mod n) using extended Euclid, then multiply by b/g.

Example: solve 14x ≡ 6 (mod 22). gcd(14, 22) = 2, and 2 divides 6, so solutions exist. First solve 14x ≡ 2 (mod 22): extended Euclid gives x₀ = -3 ≡ 19 (mod 22). Then 14 × 19 = 266 = 12 × 22 + 2. ✓. Multiply by 6/2 = 3: x = 19 × 3 = 57 ≡ 13 (mod 22). Check: 14 × 13 = 182 = 8 × 22 + 6. ✓. The two solutions mod 22 are x = 13 and x = 13 + 11 = 2 (since n/g = 22/2 = 11). Check: 14 × 2 = 28 = 1 × 22 + 6. ✓.

python
def gcd(a, b):
    """Euclid's algorithm: O(log min(a,b))"""
    while b != 0:
        a, b = b, a % b
    return a

def extended_gcd(a, b):
    """Returns (gcd, x, y) such that a*x + b*y = gcd(a,b)"""
    if b == 0:
        return a, 1, 0   # base case: a*1 + 0*0 = a
    else:
        g, x1, y1 = extended_gcd(b, a % b)
        # From: b*x1 + (a%b)*y1 = g
        # Since a%b = a - (a//b)*b:
        #   b*x1 + (a - (a//b)*b)*y1 = g
        #   a*y1 + b*(x1 - (a//b)*y1) = g
        return g, y1, x1 - (a // b) * y1

# Example
g, x, y = extended_gcd(252, 105)
print(f"gcd={g}, x={x}, y={y}")   # gcd=21, x=-2, y=5
print(f"Check: 252*{x} + 105*{y} = {252*x + 105*y}")  # 21

def mod_inverse(e, phi):
    """Compute d = e^(-1) mod phi using extended Euclid."""
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("No inverse: gcd(e, phi) != 1")
    return x % phi  # ensure positive

Concept check: What is gcd(48, 18)?

3 6 — because 48 = 2×18 + 12, gcd(18,12): 18 = 1×12 + 6, gcd(12,6): 12 = 2×6 + 0, so gcd = 6 9 12

Chapter 2: Modular Arithmetic

You already know modular arithmetic — you just call it "clock math." If it is 10 o'clock and you wait 5 hours, it is 3 o'clock, not 15 o'clock. Your clock wraps around at 12. That wrapping is modular arithmetic: 10 + 5 ≡ 3 (mod 12).

Formally: we say a ≡ b (mod n) (read "a is congruent to b modulo n") if n divides (a - b). Equivalently, a and b have the same remainder when divided by n. The number n is called the modulus.

Modular Operations

The beauty of modular arithmetic is that you can reduce at every step. You never need to work with numbers bigger than n²:

(a + b) mod n = ((a mod n) + (b mod n)) mod n
(a × b) mod n = ((a mod n) × (b mod n)) mod n

This means we can compute things like 7²⁵⁶ mod 13 without ever computing 7²⁵⁶ (a 217-digit number). We just reduce after each multiplication. This is the reason modular arithmetic is so powerful in computation — it keeps numbers bounded.

However, subtraction and division require care. Subtraction works the same way: (a - b) mod n = ((a mod n) - (b mod n) + n) mod n. The +n prevents negative results. But division is different: there is no "modular division" operator. Instead, we multiply by the modular inverse: a/b mod n = a × b^-1 mod n. The inverse b^-1 exists only when gcd(b, n) = 1 (more on this shortly).

Here is a concrete example showing why reducing at each step matters:

Task: compute 123 × 456 × 789 mod 1000 Without modular reduction: 123 × 456 = 56,088 56,088 × 789 = 44,253,432 44,253,432 mod 1000 = 432 With modular reduction at each step: 123 × 456 = 56,088 mod 1000 = 88 88 × 789 = 69,432 mod 1000 = 432 // same answer, smaller intermediates!

In the first approach, the intermediate value reaches 44 million. In the second, it never exceeds about 70,000. For real cryptographic computations with 2048-bit numbers, the intermediates without reduction would have millions of digits. With reduction, they never exceed 2 × 2048 = 4096 bits.

But even better, there is a trick called repeated squaring (also called fast modular exponentiation) that computes a^b mod n in only O(log b) multiplications.

Fast Modular Exponentiation (Repeated Squaring)

The idea: express the exponent b in binary. Then a^b is just a product of powers of the form a^{2^k}, which we compute by repeatedly squaring.

Worked example: compute 7¹³ mod 11.

13 in binary = 1101 // 8 + 4 + 1 So 7¹³ = 7⁸ × 7⁴ × 7¹ Start: result = 1, base = 7 Bit 0 (1): result = 1 × 7 = 7 mod 11 = 7. base = 7² = 49 mod 11 = 5 Bit 1 (0): skip. base = 5² = 25 mod 11 = 3 Bit 2 (1): result = 7 × 3 = 21 mod 11 = 10. base = 3² = 9 mod 11 = 9 Bit 3 (1): result = 10 × 9 = 90 mod 11 = 2. base = 9² = 81 mod 11 = 4 Answer: 7¹³ mod 11 = 2

Only 4 multiplications instead of 12. For a 2048-bit exponent, that is about 2048 squarings instead of 2²⁰⁴⁸ multiplications. This is how RSA encryption and decryption remain fast even with enormous exponents.

Fermat's Little Theorem. If p is prime and a is not divisible by p, then a^p-1 ≡ 1 (mod p). This is the foundation of primality testing (Chapter 4) and also explains why RSA decryption recovers the original message. Specifically, it guarantees that (m^e)^d ≡ m (mod n) when e and d are chosen correctly.

Modular Inverse

The modular inverse of a modulo n is a number a^-1 such that a · a^-1 ≡ 1 (mod n). Not every number has an inverse. The inverse of a mod n exists if and only if gcd(a, n) = 1 (they are coprime). When it exists, extended Euclid computes it.

Quick example: what is 3^-1 mod 7? We need x such that 3x ≡ 1 (mod 7). Try x = 5: 3 × 5 = 15 = 2 × 7 + 1. Yes! So 3^-1 ≡ 5 (mod 7). When the modulus is prime, there is a shortcut via Fermat: a^-1 ≡ a^p-2 (mod p). So 3^-1 ≡ 3⁵ ≡ 243 ≡ 5 (mod 7). Same answer.

The Group Z_n*

The set of integers in {1, 2, ..., n-1} that are coprime to n forms a multiplicative group under multiplication mod n, denoted Z_n*. This group has φ(n) elements (Euler's totient). When n is prime, every non-zero element has an inverse, so Z_p* = {1, 2, ..., p-1} and φ(p) = p-1.

This algebraic structure is the engine behind RSA. Euler's theorem generalizes Fermat: for any a with gcd(a, n) = 1, a^φ(n) ≡ 1 (mod n). When n = pq (product of two primes), φ(n) = (p-1)(q-1), and the RSA proof falls out directly from this group-theoretic fact.

Generators and Primitive Roots

A generator (or primitive root) of Z_p* is an element g whose powers generate every element of the group: {g¹, g², ..., g^p-1} = {1, 2, ..., p-1} (mod p). Not every element is a generator. For p = 7, the element g = 3 is a generator:

3¹ mod 7 = 3 3² mod 7 = 2 3³ mod 7 = 6 3⁴ mod 7 = 4 3⁵ mod 7 = 5 3⁶ mod 7 = 1 (back to start — full cycle!) Powers of 3 give {3, 2, 6, 4, 5, 1} = {1, 2, 3, 4, 5, 6} = Z₇* ✓

Generators are crucial for Diffie-Hellman (Chapter 5) — the protocol requires a generator g of Z_p* so that g^a and g^b can produce any element. By a theorem of Gauss, Z_p* always has primitive roots (for prime p), and there are exactly φ(p-1) of them. For p = 7, φ(6) = 2, so there are 2 generators: g = 3 and g = 5.

Euler's Totient Function

Euler's totient φ(n) counts the integers in {1, ..., n} that are coprime to n. It has a beautiful closed-form based on the prime factorization of n:

φ(n) = n × ∏_{p | n} (1 - 1/p)

Examples: φ(12) = 12 × (1 - 1/2) × (1 - 1/3) = 12 × 1/2 × 2/3 = 4. The four integers coprime to 12 in {1,...,12} are {1, 5, 7, 11}. For prime p: φ(p) = p - 1 (everything except 0 is coprime to p). For n = pq with distinct primes: φ(pq) = (p-1)(q-1) — this is exactly the formula RSA uses.

A beautiful identity: for any positive integer n, the sum of φ(d) over all divisors d of n equals n. For example, the divisors of 12 are {1, 2, 3, 4, 6, 12}, and φ(1) + φ(2) + φ(3) + φ(4) + φ(6) + φ(12) = 1 + 1 + 2 + 2 + 2 + 4 = 12. This identity has a beautiful combinatorial proof: partition the fractions {1/n, 2/n, ..., n/n} by their reduced denominators.

Order and Discrete Logarithms

The order of a modulo n, denoted ord_n(a), is the smallest positive integer k such that a^k ≡ 1 (mod n). By Lagrange's theorem, the order always divides φ(n). For example, ord₇(2) = 3 because 2¹=2, 2²=4, 2³=8≡1 (mod 7), and 3 divides φ(7)=6.

The discrete logarithm of b to base g modulo n is the unique integer x with 0 ≤ x < ord_n(g) such that g^x ≡ b (mod n). Finding discrete logarithms is believed to be computationally hard for well-chosen groups — this is the basis of Diffie-Hellman security. The best known algorithm (number field sieve for discrete logs) runs in sub-exponential time, similar to factoring.

Repeated Squaring: Step by Step Visualization

The visualization below shows the repeated squaring process. Enter base, exponent, and modulus, and watch each step: the binary decomposition of the exponent, each squaring, and each conditional multiplication.

Repeated Squaring — Step by Step

Enter base, exponent, and modulus. The algorithm decomposes the exponent into binary and processes each bit.

base: exp: mod:

The clock visualization below makes modular arithmetic tangible. Set the modulus and watch how addition and multiplication wrap around.

Modular Arithmetic Clock

Choose a modulus, then compute a op b mod n. Watch the result on the clock face.

a: b: mod

python
def mod_exp(base, exp, mod):
    """Compute base^exp mod mod using repeated squaring. O(log exp)."""
    result = 1
    base = base % mod
    while exp > 0:
        if exp % 2 == 1:        # current bit is 1
            result = (result * base) % mod
        exp = exp >> 1             # shift right (divide by 2)
        base = (base * base) % mod  # square the base
    return result

# Example: 7^13 mod 11
print(mod_exp(7, 13, 11))  # 2

# Verify Fermat's Little Theorem: a^(p-1) ≡ 1 (mod p) for prime p
p = 17
for a in range(1, p):
    assert mod_exp(a, p - 1, p) == 1, f"Fermat failed for a={a}"
print("Fermat's Little Theorem verified for p=17")

Concept check: How many multiplications does repeated squaring need to compute a¹⁰⁰⁰ mod n?

1000 multiplications About 10 — because 1000 in binary is 1111101000, which has 10 bits, so we need at most about 20 operations (10 squarings + up to 10 multiplications for set bits) 500 (half of the exponent) log(1000) = 3

Chapter 3: RSA Cryptosystem

Now we have all the tools: GCD, extended Euclid, modular arithmetic, fast exponentiation. Let us build the most influential cryptosystem ever invented.

RSA (Rivest-Shamir-Adleman, 1977) works like a padlock. Anyone can snap the padlock shut (encrypt with the public key), but only the person with the key can open it (decrypt with the private key). Here is the construction, step by step.

Step 1: Key Generation

Pick primes

Choose two large primes p and q (in practice, 512-1024 bits each)

↓

Compute n

n = p × q. This is the modulus. Its bit length is the "key size" (e.g. 2048 bits)

↓

Compute φ(n)

φ(n) = (p-1)(q-1) — Euler's totient: count of integers in [1,n] coprime to n

↓

Choose e

Pick e such that 1 < e < φ(n) and gcd(e, φ(n)) = 1. Common choice: e = 65537

↓

Compute d

d = e^-1 mod φ(n) — the modular inverse via extended Euclid

↓

Keys ready

Public key: (n, e). Private key: d. Destroy p, q, φ(n).

Step 2: Encryption

To encrypt a message m (a number with 0 ≤ m < n):

c = m^e mod n

This uses fast modular exponentiation — O(log e) multiplications mod n. Even with e = 65537 (a 17-bit number), that is only about 17 squarings.

Step 3: Decryption

To decrypt ciphertext c:

m = c^d mod n

Why does this recover m? Because c^d = (m^e)^d = m^ed. Since ed ≡ 1 (mod φ(n)), we have ed = 1 + kφ(n) for some integer k. By Euler's theorem (the generalization of Fermat's little theorem): m^φ(n) ≡ 1 (mod n) when gcd(m, n) = 1. Therefore:

m^ed = m^{1 + kφ(n)} = m × (m^φ(n))^k ≡ m × 1^k = m (mod n)

The RSA trapdoor. Anyone can compute c = m^e mod n (they know n and e — the public key). But to reverse this, you need d, and computing d requires φ(n) = (p-1)(q-1), which requires knowing p and q. To find p and q from n requires factoring n. For large n, this is computationally infeasible. That is the trapdoor.

Why Destroy p, q, and φ(n)?

After computing d, the key generator must erase p, q, and φ(n) from memory. Why? Because if an attacker obtains any one of these, RSA collapses:

Knowing p (or q) lets you compute q = n/p, then φ(n), then d.
Knowing φ(n) lets you compute d = e^-1 mod φ(n) directly.
Knowing d lets you decrypt any ciphertext. Game over.

In practice, the private key is stored in a hardware security module (HSM) or encrypted on disk. The primes are generated, used, and immediately wiped. This is why "key generation" is a ceremony in high-security settings — performed on air-gapped machines that are physically destroyed afterward.

Padding: Why Textbook RSA Is Insecure

The RSA we described is called "textbook RSA" and is not secure for direct use. The problem: it is deterministic. Encrypting the same message m twice gives the same ciphertext c. An attacker who guesses the message can verify their guess by encrypting it with the public key and comparing.

Real RSA uses padding schemes like OAEP (Optimal Asymmetric Encryption Padding) that add randomness before encryption. OAEP hashes the message with random data, making each encryption unique. The padded message is then encrypted with textbook RSA. Without padding, RSA is vulnerable to chosen-ciphertext attacks, small-message attacks (m^e < n), and more.

In practice. You almost never encrypt data directly with RSA. RSA is too slow for bulk encryption (about 1000x slower than AES). Instead, RSA encrypts a random session key, and the session key encrypts the actual data with a fast symmetric cipher like AES-256-GCM. This is called hybrid encryption and is what TLS actually does.

Worked Example with Small Primes

Key generation: p = 61, q = 53 n = 61 × 53 = 3233 φ(n) = 60 × 52 = 3120 Choose e = 17 (check: gcd(17, 3120) = 1 ✓) d = 17^-1 mod 3120 = 2753 (verify: 17 × 2753 = 46801 = 15 × 3120 + 1 ✓) Encryption: message m = 65 (the letter 'A') c = 65¹⁷ mod 3233 = 2790 Decryption: m = 2790²⁷⁵³ mod 3233 = 65 ✓

The interactive demo below lets you generate RSA keys with small primes, encrypt a number, and decrypt it. Watch every step of the math.

Interactive RSA — Key Generation, Encryption, Decryption

Pick two primes, generate keys, then encrypt and decrypt a message (number).

p: q:

Message (0 to n-1):

python
def rsa_keygen(p, q):
    """Generate RSA key pair from primes p, q."""
    n = p * q
    phi = (p - 1) * (q - 1)

    # Choose e coprime to phi. 65537 is standard if it works.
    e = 65537
    if gcd(e, phi) != 1:
        # Fallback: find smallest e > 1 coprime to phi
        e = 3
        while gcd(e, phi) != 1:
            e += 2

    d = mod_inverse(e, phi)   # from extended Euclid
    return (n, e), d           # public_key, private_key

def rsa_encrypt(m, public_key):
    n, e = public_key
    return mod_exp(m, e, n)    # c = m^e mod n

def rsa_decrypt(c, d, n):
    return mod_exp(c, d, n)    # m = c^d mod n

# Full example
pub, priv = rsa_keygen(61, 53)
print(f"Public: n={pub[0]}, e={pub[1]}")
print(f"Private: d={priv}")

c = rsa_encrypt(65, pub)
print(f"Encrypted: {c}")

m = rsa_decrypt(c, priv, pub[0])
print(f"Decrypted: {m}")  # 65

# Encrypt a string (character by character)
def rsa_encrypt_string(msg, public_key):
    return [rsa_encrypt(ord(ch), public_key) for ch in msg]

def rsa_decrypt_string(ciphertext, d, n):
    return ''.join(chr(rsa_decrypt(c, d, n)) for c in ciphertext)

ct = rsa_encrypt_string("HELLO", pub)
print(f"Ciphertext: {ct}")
print(f"Decrypted: {rsa_decrypt_string(ct, priv, pub[0])}")  # HELLO

Security parameter. The security of RSA is measured by the bit-length of n. Current recommendations: 2048 bits minimum (equivalent to about 112 bits of symmetric security), 3072 bits for medium-term (128-bit security), 4096 bits for long-term. The primes p and q should be roughly equal in bit-length (so each is about half the key size) and should not be too close together (|p - q| should be large) to resist Fermat's factoring method.

RSA Key Generation: Finding Large Primes

How does RSA actually find 512-bit primes? The process is deceptively simple:

Generate random

Pick a random 512-bit odd number r

↓

Trial division

Check if r is divisible by small primes (2, 3, 5, ..., up to ~1000). This eliminates ~88% of candidates cheaply.

↓

Miller-Rabin

Run 40 rounds of Miller-Rabin. If it passes, accept r as prime.

↻ If composite, try r + 2

By the Prime Number Theorem, the density of primes near N is approximately 1/ln(N). For 512-bit numbers, ln(2⁵¹²) ≈ 355, so roughly 1 in every 355 odd numbers is prime. On average, we test about 178 candidates before finding a prime. With efficient Miller-Rabin (milliseconds per test), the entire process takes under a second.

The trial division pre-filter is critical for efficiency. Without it, every candidate would require the full (expensive) Miller-Rabin test. With it, 88% of candidates are eliminated by a handful of cheap divisions. This is a pattern you see throughout algorithms: cheap approximate filters before expensive exact tests.

python
import random

def generate_prime(bits):
    """Generate a random prime with the specified number of bits."""
    small_primes = sieve(1000)  # precompute primes up to 1000
    attempts = 0
    while True:
        # Step 1: random odd number in range [2^(bits-1), 2^bits - 1]
        n = random.getrandbits(bits)
        n |= (1 << (bits - 1)) | 1  # set MSB and LSB
        attempts += 1

        # Step 2: trial division by small primes
        if any(n % p == 0 for p in small_primes if p < n):
            continue

        # Step 3: Miller-Rabin (40 rounds → error prob < 2^(-80))
        if miller_rabin(n, k=40):
            return n, attempts

# Example: find a 64-bit prime
p, tries = generate_prime(64)
print(f"Found {bits}-bit prime after {tries} attempts: {p}")
# Typically finds one in ~22 attempts (ln(2^64)/2 ≈ 22)

Concept check: In RSA, why is e = 65537 a popular choice for the public exponent?

It is the largest prime less than 2¹⁶ It makes the key harder to crack It is prime, coprime to most φ(n), and has only two 1-bits in binary (10000000000000001), so encryption needs just 17 squarings and 2 multiplications — fast and secure It is required by the RSA standard

Chapter 4: Primality Testing

RSA begins with "pick two large primes." But how do you find a 512-bit prime? You cannot check a list — there are about 2⁵¹²/ln(2⁵¹²) ≈ 10¹⁵¹ primes that large. Instead, you generate a random odd number and test whether it is prime. By the prime number theorem, roughly 1 in every 355 512-bit odd numbers is prime, so you will find one quickly.

But how do you test primality efficiently?

Trial Division: The Brute Force

The simplest test: check if any integer from 2 to √n divides n. If none do, n is prime. Why only up to √n? Because if n = a × b and a ≤ b, then a ≤ √n. So if there is a factor, we will find the smaller one before √n.

This is O(√n) — for a 512-bit number, that is 2²⁵⁶ ≈ 10⁷⁷ divisions. A computer doing 10⁹ divisions per second would take 10⁶⁸ seconds — roughly 10⁶⁰ times the age of the universe. Far too slow. We need something dramatically better.

A small optimization: only check odd numbers (after checking 2) and stop at √n. Better yet, only check primes up to √n (using a precomputed sieve). This reduces constant factors but not the fundamental O(√n) scaling. For practical purposes, trial division is fine for numbers up to about 10¹² (roughly 40 bits) — √(10¹²) = 10⁶, which takes microseconds. Beyond that, we need probabilistic methods.

Fermat Test: A Probabilistic Approach

Fermat's little theorem says: if p is prime and 1 ≤ a < p, then a^p-1 ≡ 1 (mod p). The contrapositive: if a^p-1 ≠ 1 (mod p), then p is definitely composite.

So to test if n is prime: pick a random a, compute a^n-1 mod n (using repeated squaring — fast!). If the result is not 1, n is composite. If it is 1, n is probably prime.

The problem: there exist Carmichael numbers (like 561 = 3 × 11 × 17) that pass the Fermat test for every base a coprime to n. These "pseudoprimes" fool the Fermat test completely. How common are they? There are infinitely many Carmichael numbers (proved in 1994 by Alford, Granville, and Pomerance). The first few are 561, 1105, 1729, 2465, 2821. They are rare relative to primes — up to 10¹⁵ there are about 105,212 Carmichael numbers versus ~29.8 trillion primes — but their existence means the Fermat test alone is insufficient for cryptographic use.

Why does 561 fool the Fermat test? Because 561 = 3 × 11 × 17, and by the Chinese Remainder Theorem, a⁵⁶⁰ ≡ 1 (mod 561) for all a coprime to 561. This happens because 560 is divisible by 2 (= 3-1), 10 (= 11-1), and 16 (= 17-1). The key property: n is Carmichael if and only if n is square-free and for every prime p dividing n, (p-1) divides (n-1). This is Korselt's criterion.

Miller-Rabin: The Industry Standard

The Miller-Rabin test strengthens the Fermat test by exploiting an additional property of primes. Write n - 1 = 2^s · d where d is odd (factor out all powers of 2). If n is prime and gcd(a, n) = 1, then either:

a^d ≡ 1 (mod n)
or a^{2^r·d} ≡ -1 (mod n) for some 0 ≤ r < s

If neither condition holds, n is definitely composite, and a is called a witness to n's compositeness. If a condition does hold, a is called a strong liar — it fooled us into thinking n might be prime.

Why does this work? The key insight is about non-trivial square roots of 1. If n is prime and x² ≡ 1 (mod n), then x ≡ ±1 (mod n) — primes have no other square roots of 1. But if n is composite, there can be non-trivial square roots (numbers other than ±1 whose square is 1 mod n). The Miller-Rabin test detects these: if we find x such that x² ≡ 1 (mod n) but x ≠ ±1 (mod n), we have proof that n is composite.

Let us see this concretely.

Worked example: test n = 561 (the smallest Carmichael number, 561 = 3 × 11 × 17). We write n - 1 = 560 = 2⁴ × 35 (so s = 4, d = 35). Pick witness a = 2:

2³⁵ mod 561 = 263 // not 1, not -1 (560). Keep squaring. 263² mod 561 = 166 // not -1. Keep squaring. 166² mod 561 = 67 // not -1. Keep squaring. 67² mod 561 = 1 // got 1 but previous was 67 ≠ ±1 // 67 is a non-trivial sqrt of 1! COMPOSITE — witness a=2 proves 561 is not prime.

The Fermat test would compute 2⁵⁶⁰ mod 561 = 1 and incorrectly say "probably prime." Miller-Rabin looks at the path to that 1 and catches the inconsistency. Specifically, it found that 67² ≡ 1 (mod 561) but 67 ≠ ±1 (mod 561). In a prime number, the only square roots of 1 are 1 and -1. The existence of other square roots (like 67) is proof of compositeness.

This is exactly the property that Carmichael numbers cannot hide. Even though a^n-1 ≡ 1 (mod n) for all a coprime to n (the Fermat condition), the squaring sequence from a^d to a^n-1 will reveal non-trivial square roots of 1 for at least 3/4 of all bases a. The Miller-Rabin test exploits this structural weakness.

Key guarantee. For any odd composite n, at least 3/4 of the possible bases a in [2, n-2] are witnesses. So each random test has at most a 1/4 chance of being fooled. After k independent tests, the probability of error is at most (1/4)^k. With k = 40 rounds, the error probability is about 10^-24 — less likely than a cosmic ray flipping a bit in your CPU.

The simulation below steps through the Miller-Rabin test. Enter a number, pick the number of rounds, and watch the algorithm check witnesses.

Miller-Rabin Primality Test — Step by Step

Enter a number and click "Test". Watch the algorithm pick random witnesses and check each condition.

Rounds: 5

python
import random

def miller_rabin(n, k=40):
    """Test if n is probably prime with k rounds.
    Returns False if definitely composite, True if probably prime.
    Error probability ≤ (1/4)^k."""
    if n < 2: return False
    if n < 4: return True
    if n % 2 == 0: return False

    # Write n-1 = 2^s * d with d odd
    s, d = 0, n - 1
    while d % 2 == 0:
        s += 1
        d //= 2

    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = mod_exp(a, d, n)        # a^d mod n

        if x == 1 or x == n - 1:
            continue               # passed this round

        for _ in range(s - 1):
            x = mod_exp(x, 2, n)   # square repeatedly
            if x == n - 1:
                break              # found -1, passed
        else:
            return False          # witness found: COMPOSITE

    return True                   # probably prime

# Test: 561 is a Carmichael number (fools Fermat test, NOT Miller-Rabin)
print(miller_rabin(561))   # False — correctly detected as composite!
print(miller_rabin(104729)) # True — this is actually prime

Test	Type	Complexity	Notes
Trial division	Deterministic	O(√n)	Simple but too slow for large n
Fermat test	Probabilistic	O(k log² n)	Fooled by Carmichael numbers
Miller-Rabin	Probabilistic	O(k log² n)	Industry standard. Error ≤ (1/4)^k
AKS	Deterministic	O(log⁶ n)	Polynomial! But impractical constants

The AKS Breakthrough

In 2002, Agrawal, Kayal, and Saxena (three Indian computer scientists, two of them undergraduates) proved that primality testing is in P — there exists a deterministic polynomial-time algorithm. Their paper "PRIMES is in P" resolved a question that had been open for decades.

The AKS algorithm is based on a generalization of Fermat's little theorem to polynomials: n is prime if and only if (x + a)ⁿ ≡ xⁿ + a (mod n) for all a coprime to n. Checking this directly requires expanding a polynomial of degree n, which is exponential. The AKS insight is that you only need to check this identity modulo x^r - 1 for a carefully chosen r = O(log² n), and for O(log n) values of a.

The original algorithm ran in O(log¹² n) time. Subsequent improvements brought it to O(log⁶ n). Despite being polynomial, the constant factors are enormous — for a 1024-bit number, AKS is millions of times slower than Miller-Rabin. So in practice, everyone uses Miller-Rabin with enough rounds. AKS was a theoretical triumph (proving PRIMES ∈ P), not a practical improvement.

Deterministic Miller-Rabin. Under the Generalized Riemann Hypothesis (GRH), it suffices to test bases a = 2, 3, ..., 2(ln n)². This makes Miller-Rabin deterministic in O(log⁴ n) time — if GRH is true. For n < 3.3 × 10²⁴, testing just the 12 bases {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37} gives a deterministic result. This is the approach used in competitive programming (see Chapter 8).

Concept check: Why does Miller-Rabin succeed on Carmichael number 561 while the Fermat test fails?

Miller-Rabin checks an additional condition: not just a^n-1 ≡ 1, but also the sequence of squarings from a^d to a^n-1 must pass through -1 (mod n). For composites, this sequence often reveals a "non-trivial square root of 1" that primes cannot have. Miller-Rabin uses more random witnesses Miller-Rabin uses larger bases Miller-Rabin runs Euclid's algorithm as a subroutine

Chapter 5: Cryptography Showcase

This is the payoff. Everything from Chapters 1-4 comes together in two interactive demonstrations: a full RSA message encryption pipeline, and a Diffie-Hellman key exchange where Alice and Bob agree on a secret over an insecure channel.

Full RSA Demo

In the real world, RSA does not encrypt just numbers — it encrypts messages. Each character is converted to a number (its ASCII code), encrypted separately, and the ciphertext is a sequence of numbers. (Real RSA uses OAEP padding and encrypts blocks, not individual characters — but character-by-character encryption shows the math clearly.)

The simulation below lets you type a short message, generate keys, and see every character encrypted and decrypted with full math shown. Try different bit sizes: 8-bit keys use tiny primes (fast but insecure — you could factor n by hand). 16-bit keys use primes up to ~256, which are trivial for a computer but illustrate the full RSA flow.

Watch the numbers. Notice how the ciphertext numbers bear no visible relationship to the plaintext ASCII codes. The letter 'H' (code 72) might encrypt to 1847 — completely scrambled. Yet decryption perfectly recovers 72 every time. This is the magic of modular exponentiation: it is a permutation of {0, 1, ..., n-1}, and the private key tells you the inverse permutation.

RSA Message Pipeline — Character by Character

Type a message. Pick a key size. Click "Encrypt" to see each character become a number, get raised to the e-th power mod n, then "Decrypt" to reverse it.

Message: Bits:

Diffie-Hellman Key Exchange

RSA lets Alice send an encrypted message to Bob. But what if Alice and Bob have never met and need to agree on a shared secret key for symmetric encryption (like AES)? They cannot exchange a key in the open — anyone could eavesdrop. Enter Diffie-Hellman (1976).

The setup: Alice and Bob publicly agree on a large prime p and a generator g. Then:

Alice picks secret a

Computes A = g^a mod p and sends A to Bob

↓

Bob picks secret b

Computes B = g^b mod p and sends B to Alice

↓

Shared secret

Alice: s = B^a mod p. Bob: s = A^b mod p. Both equal g^ab mod p!

An eavesdropper sees p, g, A, and B but not a or b. To find the shared secret, they would need to compute a from A = g^a mod p — the discrete logarithm problem, which is believed to be computationally infeasible for large p.

Diffie-Hellman Key Exchange — Alice & Bob

Watch Alice and Bob agree on a shared secret while Eve eavesdrops on the public channel. The green values are secret; the orange values are public.

The magic. Alice and Bob have never met. They communicated only over a public channel that Eve can read. Yet they now share a secret that Eve cannot compute. This was considered impossible before 1976. Diffie-Hellman changed the world.

The Paint Mixing Analogy

Here is the most intuitive way to understand Diffie-Hellman. Imagine Alice and Bob each have a can of paint. They publicly agree on a common base color (say, yellow). Alice mixes her secret color (red) into the yellow to get orange, and sends the orange paint to Bob. Bob mixes his secret color (blue) into the yellow to get green, and sends the green paint to Alice. Now Alice adds her secret red to Bob's green, and Bob adds his secret blue to Alice's orange. Both end up with the same final color — but Eve, who saw the yellow, orange, and green paints, cannot unmix them to recover the secrets.

In the math version: "mixing" is modular exponentiation (easy to compute), and "unmixing" is the discrete logarithm (hard to compute). The fact that A^b = B^a = g^ab (mod p) is the commutativity that makes the protocol work.

Security Considerations

Diffie-Hellman is vulnerable to a man-in-the-middle attack: if Eve can intercept and modify messages, she can run separate DH exchanges with Alice and Bob, giving each a different shared secret. Neither Alice nor Bob would know they are talking to Eve instead of each other. This is why DH is always combined with authentication (digital signatures, certificates) in real protocols like TLS.

The security of DH depends on the Decisional Diffie-Hellman (DDH) assumption: given g, g^a, g^b, it is computationally infeasible to distinguish g^ab from a random element. This is believed to hold for carefully chosen prime groups and elliptic curve groups, but not for all groups.

Real-World Parameters

In practice, Diffie-Hellman uses carefully chosen parameters for security:

Parameter	Size (classic DH)	Size (ECDH)
Prime p / Curve order	2048-4096 bits	256-521 bits
Generator g / Base point G	Standardized (RFC 3526)	Standardized (P-256, X25519)
Secret exponent a, b	256+ bits	Same as curve order
Security level	~112 bits (2048-bit DH)	~128 bits (P-256)

The most popular DH variant today is X25519 (Curve25519), designed by Daniel Bernstein. It uses an elliptic curve over a prime field, is faster than traditional DH by ~10x, resistant to timing attacks by design, and is the default key exchange in TLS 1.3, SSH, Signal, and WireGuard. When you visit any HTTPS website in 2025, X25519 is almost certainly the key exchange running under the hood.

Concept check: In the Diffie-Hellman key exchange, Eve sees the public values A = g^a mod p and B = g^b mod p. Why can't she compute the shared secret s = g^ab mod p?

She doesn't know g or p Computing a from A = g^a mod p requires solving the discrete logarithm problem, which is believed to be computationally infeasible for large p. Without a or b, she cannot compute g^ab mod p. The values are encrypted before transmission She would need to factor p

The Full TLS Handshake (Simplified)

Here is how Chapters 1-5 come together in a real TLS connection:

Client Hello

Browser sends supported ciphers, random nonce

↓

Server Hello

Server sends certificate (containing RSA/ECDSA public key), chosen cipher

↓

Key Exchange

ECDH (X25519): both sides exchange public values, derive shared secret via DH

↓

Authentication

Server signs handshake transcript with RSA/ECDSA private key (from certificate)

↓

Symmetric Encryption

Shared secret → derive AES-256-GCM keys → all further data encrypted symmetrically

Every step uses number theory: ECDH for key exchange (modular arithmetic on elliptic curves), RSA/ECDSA for authentication (modular exponentiation, modular inverse), key derivation via HKDF (built on HMAC, which uses modular addition internally). The symmetric encryption (AES) is the only part that does not directly use number theory — it uses substitution-permutation networks instead.

Chapter 6: Chinese Remainder Theorem

A puzzle from 3rd-century China (Sun Tzu's Sunzi Suanjing, circa 3rd century AD): "There are certain things whose number is unknown. Repeatedly divided by 3, the remainder is 2; by 5, the remainder is 3; by 7, the remainder is 2. What will be the number?"

This ancient puzzle encodes a deep mathematical principle. The Chinese Remainder Theorem tells us that if you know the remainders of a number when divided by several pairwise coprime moduli, you can uniquely reconstruct the number (up to a multiple of the product of the moduli). This is like identifying a point in a multi-dimensional space from its projections onto each axis.

In modern notation: find x such that x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7). The answer is x = 23 (and x = 23 + 105k for any integer k, since 105 = 3 × 5 × 7).

The Chinese Remainder Theorem (CRT) says: if the moduli m₁, m₂, ..., m_k are pairwise coprime (every pair has gcd = 1), then the system of congruences has a unique solution modulo M = m₁ × m₂ × ... × m_k.

The Construction

For each i, let M_i = M / m_i (the product of all moduli except the i-th). Since gcd(M_i, m_i) = 1 (because the moduli are pairwise coprime), we can compute y_i = M_i^-1 mod m_i using extended Euclid. Then:

x = ∑_i=1^k a_i · M_i · y_i (mod M)

Worked example: x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7).

M = 3 × 5 × 7 = 105 M₁ = 105/3 = 35, y₁ = 35^-1 mod 3 = 2 (since 35 × 2 = 70 ≡ 1 mod 3) M₂ = 105/5 = 21, y₂ = 21^-1 mod 5 = 1 (since 21 × 1 = 21 ≡ 1 mod 5) M₃ = 105/7 = 15, y₃ = 15^-1 mod 7 = 1 (since 15 × 1 = 15 ≡ 1 mod 7) x = 2 × 35 × 2 + 3 × 21 × 1 + 2 × 15 × 1 = 140 + 63 + 30 = 233 = 233 mod 105 = 23 Check: 23 mod 3 = 2 ✓, 23 mod 5 = 3 ✓, 23 mod 7 = 2 ✓

CRT for RSA speedup. To compute m = c^d mod n where n = pq, you can compute m_p = c^{d mod (p-1)} mod p and m_q = c^{d mod (q-1)} mod q separately, then combine via CRT. This is about 4x faster than computing c^d mod n directly, because exponentiating mod a number half the bit-length is 4x cheaper (quadratic in bit-length). Every practical RSA implementation uses this optimization.

Why CRT Works: Intuition

Think of CRT as a "coordinate system" for numbers. Just as a point in 2D space is uniquely identified by its (x, y) coordinates, a number in [0, M) is uniquely identified by its residues modulo the coprime factors of M. The number 23, when "projected" onto the axes mod 3, mod 5, and mod 7, gives coordinates (2, 3, 2). CRT tells us how to go back from coordinates to the original number.

This is not just an analogy — it is a precise algebraic isomorphism. The map x ↦ (x mod m₁, x mod m₂, ..., x mod m_k) is a ring isomorphism from Z_M to Z_m₁ × Z_m₂ × ... × Z_{m_k}. This means addition and multiplication are preserved: you can add, subtract, and multiply in the "coordinate" representation and convert back at the end.

CRT in Practice: Garner's Algorithm

The formula we showed above involves large intermediate products (a_i · M_i · y_i can overflow). In practice, Garner's algorithm computes the result incrementally without large intermediates. It builds the answer one modulus at a time using a mixed-radix representation. This is the algorithm OpenSSL uses for CRT-based RSA decryption.

CRT Solver — System of Congruences

Enter up to 4 congruences x ≡ a_i (mod m_i). The solver shows the CRT construction step by step.

x ≡ (mod ), ≡ (mod ), ≡ (mod )

python
def chinese_remainder(remainders, moduli):
    """Solve x ≡ a_i (mod m_i) for pairwise coprime moduli.
    Returns x mod M where M = product of all moduli."""
    M = 1
    for m in moduli:
        M *= m

    x = 0
    for a_i, m_i in zip(remainders, moduli):
        M_i = M // m_i
        _, y_i, _ = extended_gcd(M_i, m_i)  # M_i * y_i ≡ 1 (mod m_i)
        x += a_i * M_i * y_i

    return x % M

# Sun Tzu's puzzle
print(chinese_remainder([2, 3, 2], [3, 5, 7]))  # 23

CRT Worked Example: RSA Decryption Speedup

Let us see the RSA-CRT optimization concretely. Suppose n = 3233 = 61 × 53, d = 2753, and we want to decrypt c = 2790.

Standard RSA: m = 2790²⁷⁵³ mod 3233 // exponent is 2753, modulus is 3233 RSA-CRT: d_p = d mod (p-1) = 2753 mod 60 = 53 d_q = d mod (q-1) = 2753 mod 52 = 49 m_p = 2790⁵³ mod 61 = 65 mod 61 = 4 // exponent 53, modulus 61 (much smaller!) m_q = 2790⁴⁹ mod 53 = 65 mod 53 = 12 // exponent 49, modulus 53 // Combine via CRT: find m such that m ≡ 4 (mod 61) and m ≡ 12 (mod 53) q_inv = 53^-1 mod 61 = 38 // since 53 × 38 = 2014 ≡ 1 (mod 61) h = q_inv × (m_p - m_q) mod p = 38 × (4 - 12) mod 61 = 38 × (-8) mod 61 = 38 × 53 mod 61 = 2014 mod 61 = 1 m = m_q + h × q = 12 + 1 × 53 = 65 ✓

Notice: instead of raising to power 2753 mod 3233, we raised to power 53 mod 61 and power 49 mod 53. Both the exponents and moduli are roughly half the size. Since modular exponentiation cost is O(log e × (log n)²), halving both gives approximately a 4x speedup. For 2048-bit RSA, this optimization is essential — without it, decryption would be noticeably slow even on modern hardware.

Concept check: Why does CRT speed up RSA decryption by about 4x?

CRT reduces the number of multiplications Instead of one exponentiation mod n (a k-bit number), CRT does two exponentiations mod p and mod q (each k/2 bits). Since modular exponentiation cost scales as bit-length squared, two (k/2)² = 2 × k²/4 = k²/2, versus k² — about 4x speedup after accounting for overhead. CRT uses smaller exponents CRT avoids the need for modular inverse

Chapter 7: Applications

Number theory is not an abstract curiosity. It is running right now in your browser, your phone, your bank, and the infrastructure that keeps civilization connected. Here is where the algorithms from this lesson appear in practice.

Cryptography

RSA is still widely used for key exchange and digital signatures. When your browser shows a padlock icon, TLS uses RSA (or elliptic curve variants) to establish a secure session. Digital certificates that prove a website's identity are signed with RSA.

Diffie-Hellman and its elliptic curve variant ECDH handle key agreement in almost every secure connection. When you open WhatsApp, Signal, or any end-to-end encrypted messenger, Diffie-Hellman runs in the background.

Elliptic Curve Cryptography (ECC) achieves the same security as RSA with much smaller keys. A 256-bit ECC key offers comparable security to a 3072-bit RSA key. This matters for mobile devices and IoT where bandwidth and computation are limited. ECC uses the same modular arithmetic primitives but over a different algebraic structure (points on an elliptic curve rather than integers mod n).

Hashing

Hash functions (Chapter 11) use modular arithmetic extensively. The division method h(k) = k mod m maps keys to table slots. Why should m be prime? Because if m has many small factors, keys with patterns (like multiples of common values) will cluster in certain slots. A prime m distributes keys more uniformly because there are no common factors to create patterns.

Universal hashing uses h(k) = ((ak + b) mod p) mod m where p is a prime larger than the universe of keys. This guarantees that for any two distinct keys, the probability of collision is at most 1/m — matching the ideal random function. The proof relies on the fact that Z_p is a field (every non-zero element has an inverse), which ensures the linear map ak + b is a bijection for any a ≠ 0.

Cryptographic hash functions like SHA-256 use modular addition of 32-bit words (mod 2³²) as a core mixing operation, combined with bitwise rotations and XOR. The non-linearity of modular addition (carries propagate unpredictably) is essential for the avalanche effect: changing one input bit should flip about half the output bits.

Random Number Generation

Linear congruential generators (LCGs) produce pseudorandom sequences using the recurrence X_n+1 = (aX_n + c) mod m. The constants a, c, m must be chosen carefully (using number-theoretic criteria) to achieve a full period of m. For full period: gcd(c, m) = 1, every prime dividing m must also divide (a-1), and if 4 divides m then 4 must divide (a-1). These conditions are pure number theory.

The Blum-Blum-Shub generator uses x_n+1 = x_n² mod n (where n = pq for large Blum primes p ≡ q ≡ 3 mod 4) for cryptographically secure randomness. Its security is provably equivalent to the difficulty of factoring n.

Digital Signatures

RSA is not just for encryption — it works in reverse for digital signatures. To sign a message m, the signer computes s = m^d mod n (using their private key). Anyone can verify: m = s^e mod n (using the public key). If the verification succeeds, the message must have been signed by someone who knows d. This provides authentication (the message came from the claimed sender), integrity (the message was not altered), and non-repudiation (the signer cannot deny signing).

In practice, you sign the hash of the message, not the message itself (messages can be arbitrarily large, but hashes are fixed-size). The signature is s = hash(m)^d mod n. The verifier checks hash(m) = s^e mod n.

Error-Correcting Codes

Reed-Solomon codes — used in QR codes, CDs, DVDs, and deep-space communication — are built on finite field arithmetic (modular arithmetic over polynomials). When your phone scans a partially damaged QR code and still reads it, that is number theory at work. A QR code can lose up to 30% of its data and still be decoded correctly, thanks to redundancy encoded using polynomial evaluation over finite fields (Galois fields GF(2⁸)).

The Voyager 1 and 2 space probes, now over 15 billion miles from Earth, transmit data at about 160 bits per second. The signal is so weak that individual bits are frequently corrupted. Reed-Solomon and convolutional codes (both built on modular arithmetic) enable scientists to reconstruct the original data with near-perfect accuracy from this noisy stream.

Blockchain and Proof of Work

Bitcoin's proof-of-work requires finding a nonce such that SHA-256(block || nonce) starts with a certain number of zero bits. This is essentially a modular arithmetic problem: find x such that f(x) mod 2^k = 0. The difficulty is adjusted by changing k. As of 2025, Bitcoin miners perform roughly 600 × 10¹⁸ (600 exahashes) SHA-256 computations per second worldwide. Each SHA-256 internally uses modular addition of 32-bit integers as its core mixing operation.

Digital signatures on Bitcoin transactions use ECDSA (Elliptic Curve Digital Signature Algorithm) over the secp256k1 curve. Each signature requires computing a modular inverse mod the curve order — the extended Euclidean algorithm from Chapter 1, running inside every Bitcoin wallet in the world.

Ethereum recently transitioned to proof-of-stake, but its smart contracts still use extensive modular arithmetic. The BN254 pairing-friendly elliptic curve (used in zk-SNARKs for privacy-preserving transactions) performs arithmetic modulo a 254-bit prime. Zero-knowledge proofs — a hot topic in blockchain and privacy — are built entirely on number-theoretic foundations.

Application	Number Theory Used	Algorithm
TLS/HTTPS	RSA or ECDH key exchange	Modular exponentiation, Euclid
Digital signatures	RSA-PSS, ECDSA	Modular inverse, fast exp
Hash tables	Division method, universal hashing	Modular arithmetic
Pseudorandom generators	LCG, Blum-Blum-Shub	Modular multiplication, squaring
Error correction	Reed-Solomon, BCH codes	Finite field arithmetic
Blockchain	Proof of work, ECDSA	Hashing (modular), discrete log
FFT (Chapter 30)	Roots of unity	Modular arithmetic over complex/finite fields

Post-quantum cryptography. Quantum computers can factor integers in polynomial time (Shor's algorithm), which would break RSA and Diffie-Hellman. The cryptography community is migrating to lattice-based and code-based schemes that resist quantum attacks. NIST standardized the first post-quantum algorithms (CRYSTALS-Kyber and CRYSTALS-Dilithium) in 2024. The number theory from this lesson remains relevant — understanding why RSA falls to quantum attacks requires understanding the number theory it is built on.

Timeline of Number Theory in Cryptography

Year	Event	Impact
~300 BC	Euclid's algorithm	Still used in every RSA implementation
1640	Fermat's little theorem	Foundation of primality testing
1763	Euler's theorem (generalized Fermat)	Why RSA decryption works
1801	Gauss, Disquisitiones Arithmeticae	Formalized modular arithmetic
1976	Diffie-Hellman key exchange	Solved the key distribution problem
1977	RSA cryptosystem	First practical public-key encryption
1980	Miller-Rabin test	Efficient probabilistic primality testing
1985	Elliptic curve cryptography (Koblitz, Miller)	Smaller keys, same security
1994	Shor's algorithm	Quantum threat to RSA/DH
2002	AKS primality test	PRIMES is in P (theoretical milestone)
2024	NIST post-quantum standards	Migration from RSA/DH to lattice-based crypto

Concept check: Why does elliptic curve cryptography achieve the same security as RSA with much smaller keys?

Elliptic curves use stronger hash functions Elliptic curves avoid modular arithmetic entirely The best known algorithm for the elliptic curve discrete log problem is fully exponential (like brute force), while factoring has sub-exponential algorithms (like the number field sieve). So ECC needs fewer bits to achieve the same security level. Elliptic curves compress the key using lossy encoding

Chapter 8: Interview Arsenal

Number theory problems appear regularly in coding interviews, competitive programming, and system design discussions. Here is your cheat sheet plus drills.

The Cheat Sheet

Tool	When to Use	Complexity
Euclid's GCD	gcd(a,b), lcm(a,b) = ab/gcd(a,b), coprimality check	O(log min(a,b))
Extended Euclid	Modular inverse, Bézout coefficients, linear Diophantine equations	O(log min(a,b))
Fast mod exp	a^b mod m, RSA, Fermat test, any large exponent	O(log b) mults
Miller-Rabin	Primality test for large numbers	O(k log² n)
CRT	Combine congruences with coprime moduli	O(k log M)
Sieve of Eratosthenes	All primes up to N	O(N log log N)

Common Interview Patterns

Number theory problems in interviews fall into a few patterns. Recognizing the pattern is half the battle:

Pattern recognition. If the problem says "mod 10⁹+7" → you need modular exponentiation. If it asks for GCD/LCM → Euclid. If it involves counting coprime pairs → Euler's totient. If it asks "is prime?" → Miller-Rabin or sieve. If it involves linear equations in integers → extended Euclid / Diophantine equations. If it says "modular inverse" → extended Euclid or Fermat (if mod is prime).

Drill 1: Modular Exponentiation

Problem: Compute a^b mod m for very large b (up to 10¹⁸). The key insight: you cannot compute a^b first then take mod — the intermediate result would have billions of digits. You must reduce modulo m at every step.

python
def mod_exp(base, exp, mod):
    """O(log exp) — the workhorse of number-theoretic algorithms."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:             # odd exponent: multiply in current base
            result = result * base % mod
        exp >>= 1              # divide exponent by 2
        base = base * base % mod  # square the base
    return result

# Python built-in: pow(base, exp, mod) does the same thing
assert mod_exp(7, 13, 11) == pow(7, 13, 11)

Drill 2: Extended Euclidean / Modular Inverse

Problem: Find x such that ax ≡ 1 (mod m). This has a solution if and only if gcd(a, m) = 1.

python
def mod_inverse(a, m):
    """Find x such that a*x ≡ 1 (mod m). Raises if no inverse."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError(f"No inverse: gcd({a},{m})={g}")
    return x % m

# Alt for prime m: Fermat's little theorem gives a^(-1) = a^(m-2) mod m
def mod_inverse_fermat(a, p):
    """Only works when p is prime."""
    return pow(a, p - 2, p)

Drill 3: Sieve of Eratosthenes

Problem: Find all primes up to N. The sieve marks composites by iterating through each prime and crossing off its multiples. This is one of the oldest algorithms — attributed to Eratosthenes of Cyrene (circa 200 BC). The sieve is also a fundamental building block: many number-theoretic algorithms precompute a list of small primes as a first step.

The key optimization: start crossing off at p², not 2p. Why? All multiples of p smaller than p² have already been marked by smaller primes. For example, when processing prime 5, we start at 25 (not 10, 15, 20 — those were already crossed off by 2 and 3).

For very large N (billions), the standard sieve does not fit in memory. The segmented sieve processes the range in blocks of size √N, using the small primes (up to √N) to mark each block. This reduces memory from O(N) to O(√N).

The simulation below visualizes the sieve. Watch as each prime "crosses off" its multiples.

Sieve of Eratosthenes — Animated

Set N and click "Run Sieve" to watch composites get crossed off. Orange = current prime being processed. Red = crossed off composite. Green = confirmed prime.

N 60

python
def sieve(n):
    """Return all primes up to n. O(n log log n)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n**0.5) + 1):
        if is_prime[i]:
            for j in range(i*i, n+1, i):
                is_prime[j] = False
    return [i for i in range(2, n+1) if is_prime[i]]

print(sieve(50))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Drill 4: Sieve of Eratosthenes (with Smallest Prime Factor)

An enhanced sieve stores the smallest prime factor (SPF) for each number. This enables O(log n) factorization of any number up to N by repeatedly dividing by the SPF.

python
def sieve_spf(n):
    """Sieve that stores smallest prime factor for each number.
    Enables O(log n) factorization."""
    spf = list(range(n + 1))  # spf[i] = i initially
    for i in range(2, int(n**0.5) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i*i, n+1, i):
                if spf[j] == j:
                    spf[j] = i
    return spf

def factorize(n, spf):
    """Factorize n using the SPF sieve. O(log n)."""
    factors = []
    while n > 1:
        factors.append(spf[n])
        n //= spf[n]
    return factors

spf = sieve_spf(100)
print(factorize(84, spf))  # [2, 2, 3, 7]
print(factorize(97, spf))  # [97] — prime!

Drill 5: GCD-Based Problems

python
# LCM: least common multiple
def lcm(a, b):
    return a * b // gcd(a, b)

# Check coprimality
def coprime(a, b):
    return gcd(a, b) == 1

# Euler's totient: count of integers in [1,n] coprime to n
def euler_totient(n):
    """O(√n) — factorize n, then use the product formula."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

print(euler_totient(3233))  # 3120 = (61-1)*(53-1)

Drill 6: Linear Diophantine Equations

Problem: Find all integer solutions to ax + by = c. By Bézout's identity, solutions exist if and only if gcd(a,b) divides c. If (x₀, y₀) is one solution, then all solutions are x = x₀ + (b/g)t, y = y₀ - (a/g)t for integer t, where g = gcd(a,b).

python
def solve_diophantine(a, b, c):
    """Solve ax + by = c. Returns (x0, y0, dx, dy) where
    all solutions are (x0 + dx*t, y0 + dy*t) for integer t.
    Returns None if no solution."""
    g, x, y = extended_gcd(abs(a), abs(b))
    if c % g != 0:
        return None  # no solution
    scale = c // g
    x0 = x * scale * (1 if a >= 0 else -1)
    y0 = y * scale * (1 if b >= 0 else -1)
    dx = b // g   # step in x direction
    dy = -a // g  # step in y direction
    return x0, y0, dx, dy

# Example: 3x + 5y = 4
# gcd(3,5)=1 divides 4, so solutions exist
x0, y0, dx, dy = solve_diophantine(3, 5, 4)
print(f"x={x0}+{dx}t, y={y0}+{dy}t")
# Verify: 3*x0 + 5*y0 should equal 4

Drill 7: Miller-Rabin in Contest Form

python
def is_prime(n):
    """Deterministic for n < 3.3 × 10^24 (using specific witnesses)."""
    if n < 2: return False
    if n < 4: return True
    if n % 2 == 0: return False
    s, d = 0, n - 1
    while d % 2 == 0: s += 1; d //= 2
    # These witnesses suffice for n < 3.3 × 10^24
    for a in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]:
        if a >= n: continue
        x = pow(a, d, n)
        if x == 1 or x == n - 1: continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1: break
        else: return False
    return True

Common Mistakes in Interviews

Mistake	Why It's Wrong	Correct Approach
Computing a^b then taking mod	a^b can be astronomically large — overflows any integer type	Take mod at every step (repeated squaring)
Assuming modular inverse always exists	a^-1 mod n exists only when gcd(a,n)=1	Check gcd first; raise error if not coprime
Using Fermat for non-prime modulus	a^m-1 ≡ 1 (mod m) only when m is prime	Use extended Euclid for general modular inverse
Forgetting negative results from extended GCD	x from extended_gcd can be negative	Return (x % m + m) % m to ensure positive result
Integer overflow in a*b % m	Even if a,b < m, the product a*b can overflow 64-bit int	Use 128-bit multiplication or Python (arbitrary precision)

Concept check: You need to compute 3^10¹⁸ mod 10⁹+7 in a coding contest. What approach works?

Compute 3^10¹⁸ as a big integer, then take mod Use Python's pow(3, 10**18, 10**9+7) which does repeated squaring with modular reduction at each step — about 60 multiplications total Use Fermat's little theorem to reduce the exponent first, then brute force This cannot be computed in reasonable time

Chapter 9: Connections

Number theory is a thread that runs through many seemingly unrelated parts of algorithms and computer science. Here is how this chapter connects to the rest of CLRS and beyond.

Where We Came From

Chapter	Connection
Ch 11: Hash Tables	The division method h(k) = k mod m is pure modular arithmetic. Universal hashing uses h(k) = ((ak+b) mod p) mod m where p is prime. The analysis of collision probability uses number-theoretic arguments about the distribution of residues.
Ch 30: FFT	The Fast Fourier Transform evaluates polynomials at the n-th roots of unity: e^2πik/n. The Number Theoretic Transform (NTT) replaces complex roots with primitive roots modulo a prime, enabling exact polynomial multiplication over integers.
Ch 34: NP-Completeness	Integer factoring is believed (but not proven) to be neither in P nor NP-complete — it sits in a suspected "middle ground." If factoring were in P, RSA would be broken. If factoring were NP-complete, then P = NP would break RSA (and much more).

What Builds on This

Elliptic curve cryptography generalizes modular arithmetic from integers to points on curves. All the same primitives — GCD, modular inverse, fast exponentiation — apply, but over a different algebraic structure.

Lattice-based cryptography (the post-quantum frontier) works with vectors in high-dimensional lattices. The hardness assumption is different (shortest vector problem instead of factoring), but the algorithmic tools overlap: modular arithmetic, polynomial multiplication (via NTT), and probabilistic arguments.

Coding theory uses finite fields (modular arithmetic over polynomials) for error correction. Reed-Solomon codes, BCH codes, and LDPC codes all build on the algebra from this chapter.

Computational geometry surprisingly uses number theory for robust predicates. When determining if a point is left or right of a line, floating-point arithmetic can give wrong answers. Exact integer arithmetic with modular bounds avoids this — and the key insight is that determinant computations can be done in Z_p for a sufficiently large prime p.

String algorithms use modular hashing extensively. The Rabin-Karp string matching algorithm computes h(s) = ∑ s[i] × bⁱ mod p as a rolling hash, enabling O(n) expected-time pattern matching. The collision analysis directly invokes the number-theoretic properties of primes.

Randomized algorithms from Chapter 5 of CLRS use number theory for hash function construction. Universal hash families, the probabilistic method, and derandomization techniques all lean on properties of primes and modular arithmetic.

Resource	Level	Focus
CLRS Chapter 31	Undergraduate	Algorithms perspective — GCD, modular exp, RSA, Miller-Rabin, CRT
Shoup, A Computational Introduction to Number Theory and Algebra	Graduate	Rigorous treatment with complexity analysis. Free online.
Katz & Lindell, Introduction to Modern Cryptography	Graduate	Security proofs, formal definitions, modern constructions
Crandall & Pomerance, Prime Numbers	Advanced	State-of-the-art algorithms for primality and factoring
Project Euler (projecteuler.net)	Practice	Hundreds of problems requiring number-theoretic algorithms

The Hierarchy of Hardness

Easy (P)

GCD, modular exp, primality testing (AKS is polynomial)

↓

Hard (believed not in P)

Factoring, discrete log — the basis of RSA and Diffie-Hellman

↓

NP-Complete

3SAT, TSP, graph coloring — factoring is probably NOT here

↓

Undecidable

Halting problem — no algorithm at all

The big picture. Cryptography lives in the gap between P and NP-complete. It requires problems that are hard enough that attackers cannot solve them, but structured enough that legitimate users can work with them efficiently. Number theory provides exactly this: multiplication is easy, factoring is hard. Exponentiation is easy, discrete log is hard. This gap is the entire foundation of digital security.

What We Covered

Algorithm	Input	Output	Complexity	Used In
Euclid's GCD	a, b	gcd(a,b)	O(log min(a,b))	RSA keygen, coprimality
Extended Euclid	a, b	gcd, x, y	O(log min(a,b))	Modular inverse, RSA keygen
Modular exponentiation	a, b, n	a^b mod n	O(log b)	RSA encrypt/decrypt, DH, primality
Miller-Rabin	n, k	prime/composite	O(k log² n)	RSA prime generation
CRT	congruences	solution x	O(k log M)	RSA speedup, parallel computation

The Quantum Threat in Detail

Shor's algorithm (1994) factors an integer n in O(log³ n) time on a quantum computer. The key idea: factoring reduces to finding the period of the function f(x) = a^x mod n, which quantum computers can do efficiently using the Quantum Fourier Transform. Once you know the period r of a^x mod n, you can (with high probability) compute gcd(a^r/2 ± 1, n) to find a factor.

Current quantum computers have about 1,000 noisy qubits. Breaking 2048-bit RSA requires roughly 4,000 logical (error-corrected) qubits, which translates to millions of physical qubits with current error rates. Estimates vary, but most experts expect this capability between 2030-2050. The cryptographic community is not waiting — NIST finalized the first post-quantum standards in 2024:

Algorithm	Type	Hardness Basis	Key Size
CRYSTALS-Kyber (ML-KEM)	Key exchange	Module lattices (MLWE)	~1.5 KB
CRYSTALS-Dilithium (ML-DSA)	Signature	Module lattices (MLWE)	~2.5 KB
SPHINCS+	Signature	Hash functions	~8-49 KB

These algorithms are based on lattice problems (finding short vectors in high-dimensional lattices), not factoring. Shor's algorithm does not help with lattice problems, and no quantum algorithm is known that solves them efficiently. The transition from RSA/DH to lattice-based cryptography is the biggest change in internet security infrastructure since the adoption of TLS.

Open Problems

Some of the deepest unsolved questions in mathematics connect directly to the algorithms in this chapter:

Is factoring in P? If yes, RSA is broken. Most experts believe the answer is no, but no one has proven it.
P vs NP. If P = NP, all of public-key cryptography collapses — factoring, discrete log, and every other trapdoor function would be efficiently solvable.
Riemann Hypothesis. The distribution of primes (which governs how quickly we find large primes for RSA) is controlled by the zeros of the Riemann zeta function. RH is the most famous open problem in mathematics.
Quantum computing. Shor's algorithm factors integers in polynomial time on a quantum computer. When (if) large-scale quantum computers arrive, RSA and DH are dead. The migration to post-quantum cryptography is already underway.

Final check: Which of the following statements is true about the relationship between number theory and computer science?

Factoring has been proven to be NP-complete Shor's algorithm can break lattice-based cryptography Primality testing is in P (proven by AKS in 2002), but factoring is believed to be in neither P nor NP-complete — sitting in a suspected middle ground that makes it perfect for cryptographic trapdoor functions All public-key cryptography will become impossible once quantum computers are built

"God made the integers; all else is the work of man." — Leopold Kronecker

"It is hard to think of any scientific discovery in the history of the world that has had a more immediate and dramatic impact on human society than public key cryptography." — Simon Singh, The Code Book

Number-Theoretic Algorithms