EE269 Lecture 6 — Mert Pilanci, Stanford

The Discrete Fourier
Transform

Changing basis to reveal hidden structure — from shifted deltas to complex exponentials, and O(N log N) computation.

Prerequisites: Complex numbers (e = cosθ + j sinθ) + Matrix-vector multiplication. That's it.
8
Chapters
7+
Simulations
0
Assumed Knowledge

Chapter 0: Why Change Basis

You have a signal: 1024 audio samples, or 256 pixel values in a row of an image, or 512 sensor readings. In the time domain (or spatial domain), this is just a list of numbers. Each number tells you "what happened at this instant." Useful, but limited.

Here's the problem: in the time domain, many signals that look completely different share deep structure. A pure 440 Hz sine wave and a 440 Hz sine wave shifted by half a period look totally different sample-by-sample — one starts positive, the other negative. But they're "the same note." The structure (single frequency, 440 Hz) is invisible in the time representation.

The fix: represent the signal in a different basis. Instead of "how much of each shifted delta is in this signal," ask "how much of each frequency is in this signal." That's the Fourier basis.

Why change basis? Four reasons:
Compress: Most energy concentrates in few coefficients (JPEG, MP3)
Classify: Frequency content identifies signals (speech vs. music, A vs. B)
Separate: Filter out noise in frequency domain, reconstruct clean signal
Accelerate: Convolution in time = pointwise multiplication in frequency (O(N log N) vs. O(N2))

The Standard Basis

The standard basis for N-dimensional signal space is the set of shifted deltas:

e0 = [1, 0, 0, ..., 0]
e1 = [0, 1, 0, ..., 0]

eN−1 = [0, 0, ..., 0, 1]

Any signal x can be written as x = x[0]·e0 + x[1]·e1 + ... + x[N−1]·eN−1. The coefficients are just the signal samples themselves. This representation is trivial — it tells us nothing new.

The Fourier transform says: represent x in a different basis, where the basis vectors are complex exponentials (sinusoids). The coefficients in this new basis are the frequency components.

Same Signal, Two Representations

A signal made of two sinusoids. In the time domain, it's hard to see the components. In the frequency domain, they're obvious: two spikes.

Frequency 1 3
Frequency 2 7
Which of these is NOT a reason to change from the time-domain basis to the Fourier basis?

Chapter 1: Orthonormal Bases

Before we can define the Fourier basis, we need to be precise about what makes a "good" basis. The answer: orthonormality.

Inner Product for Complex Vectors

For two N-dimensional complex vectors u, v, the inner product is:

⟨u, v⟩ = ∑n=0N−1 u[n] · v[n]*

where v[n]* is the complex conjugate. The conjugate is essential for complex vectors — without it, ⟨v, v⟩ wouldn't give a real positive number.

Linear Independence

A set of vectors {v0, ..., vM−1} is linearly independent if no vector can be written as a combination of the others. Equivalently: the only solution to c0v0 + c1v1 + ... = 0 is all ck = 0.

If we have N linearly independent vectors in N-dimensional space, they form a basis — any vector can be uniquely decomposed into this basis.

Orthogonality and Orthonormality

A set of vectors is orthogonal if every pair has zero inner product:

⟨vk, vr⟩ = 0   for all   k ≠ r

It's orthonormal if additionally each vector has unit norm:

⟨vk, vr⟩ = δ[k − r] = { 1 if k=r, 0 otherwise }
Why orthonormality matters: If the basis is orthonormal, finding the coefficients is trivial. The coefficient for basis vector vk is just ⟨x, vk⟩ — a single inner product. No matrix inversion, no solving linear systems. Just project.

The Expansion Formula

Given an orthonormal basis {v0, ..., vN−1}, any vector x can be written:

x = ∑k=0N−1 ck · vk    where   ck = ⟨x, vk

This is analysis (finding coefficients) and synthesis (reconstructing from coefficients). The DFT is exactly this, with the Fourier vectors as the basis.

Why This Is Revolutionary

Before orthonormal bases, decomposing a signal required solving a system of N equations in N unknowns — O(N3) computation for Gaussian elimination. With an orthonormal basis, each coefficient is computed independently via a single inner product — O(N) per coefficient, O(N2) total for all N coefficients, or O(N log N) with the FFT.

Moreover, truncating the expansion (keeping only the largest coefficients) gives the best approximation in the least-squares sense. This is the foundation of transform coding: DFT/DCT the signal, keep the big coefficients, throw away the small ones. The orthonormality guarantees this is optimal.

Geometric Intuition

Think of the standard basis e0, e1 in 2D. They're orthonormal: perpendicular, unit length. To find the x-component of a point, you project onto e0. To find the y-component, project onto e1. The projections are inner products.

The Fourier basis does the same thing in N dimensions, but the basis vectors are complex exponentials (spinning phasors) instead of coordinate axes. Projecting onto each frequency basis vector gives you "how much of that frequency is present."

Why Orthogonality Is Non-Negotiable

Without orthogonality, decomposition becomes computationally expensive and numerically unstable. Consider a non-orthogonal basis {v0, v1}. To find the coefficients c0, c1 such that x = c0v0 + c1v1, you must solve a linear system — invert the Gram matrix G = [⟨vi, vj⟩]. For N dimensions, that's O(N3) for a general basis, vs. O(N) for an orthonormal one (just compute N inner products).

Moreover, with a non-orthogonal basis, small errors in the coefficients can cause large errors in the reconstruction (the condition number of G may be huge). Orthonormal bases have condition number exactly 1 — perfect numerical stability.

Parseval's Identity (Energy Preservation)

A crucial consequence of orthonormality: the "length" (energy) of a vector is the same whether measured in the original coordinates or the expansion coefficients:

||x||2 = ∑|x[n]|2 = ∑|ck|2

No energy is created or destroyed by changing to an orthonormal basis. This guarantees that quantizing in the transform domain causes the same total distortion as quantizing in the time domain (up to how coefficients are allocated). This is the mathematical foundation of transform coding (JPEG, MP3).

Orthogonal Projection in 2D

Drag the vector x. Its projections onto the two orthogonal basis vectors (orange, teal) are the coordinates.

Drag the blue dot to change x. Projections update automatically.
In an orthonormal basis, how do you find the coefficient ck for basis vector vk?

Chapter 2: The Fourier Basis

Now the main event. We define N special vectors that form an orthonormal basis for CN. These are the Fourier basis vectors (also called DFT basis vectors or frequency-domain basis vectors).

Definition

For m = 0, 1, ..., N−1, the m-th Fourier basis vector is:

wm[n] = (1/√N) · ej2πmn/N    for   n = 0, 1, ..., N−1

Each wm is a sampled complex exponential at frequency m cycles per N samples, normalized by 1/√N to have unit norm.

Expanding e = cosθ + j sinθ:

wm[n] = (1/√N)(cos(2πmn/N) + j sin(2πmn/N))

So wm is a complex sinusoid that completes exactly m full cycles over N samples.

The Twiddle Factor

Define WN = e−j2π/N (the "N-th root of unity"). Then all DFT computations use powers of this single complex number:

WNkn = e−j2πkn/N

For N = 8: W8 = e−jπ/4 = cos(45°) − j sin(45°) = (√2/2)(1 − j). All 64 entries of the 8×8 DFT matrix are powers of this single number. The FFT exploits the algebraic structure of these powers (periodicity, symmetry) to avoid redundant computation.

Key identities:

• WNN = 1 (periodicity)

• WNN/2 = −1 (half-period gives negation)

• WNk+N = WNk (periodicity in k)

• W2N2k = WNk (decimation property — the FFT uses this)

Proving Orthogonality

We need to show ⟨wk, wr⟩ = δ[k−r]. Compute:

⟨wk, wr⟩ = ∑n=0N−1 wk[n] · wr[n]* = (1/N) ∑n=0N−1 ej2πkn/N · e−j2πrn/N
= (1/N) ∑n=0N−1 ej2π(k−r)n/N

Let ω = ej2π(k−r)/N. Then the sum becomes (1/N) ∑n=0N−1 ωn.

Case 1: k = r. Then ω = e0 = 1. Sum = (1/N) · N = 1. Correct: the norm is 1.

Case 2: k ≠ r. Then ω ≠ 1, and we use the geometric series formula:

n=0N−1 ωn = (1 − ωN) / (1 − ω)

Now ωN = ej2π(k−r) = 1 (since k−r is an integer). So the numerator is 1 − 1 = 0. The sum is zero.

Orthogonality proven: ⟨wk, wr⟩ = δ[k−r]. The Fourier vectors are orthonormal. The geometric series formula + the periodicity of complex exponentials does all the work.

Visualizing the Basis Vectors

For N = 8, the basis vectors are 8 complex exponentials:

• w0: constant (DC) — all samples equal 1/√8

• w1: one full cycle over 8 samples

• w2: two full cycles over 8 samples

• w3: three full cycles...

• w4: four full cycles (Nyquist frequency — alternating ±1)

Fourier Basis Vectors (N=8)

Select a frequency index m to see the real and imaginary parts of wm. Note how m cycles fit in N samples.

Frequency m 1
In the orthogonality proof, why does ωN = 1 when k ≠ r?

Chapter 3: DFT & IDFT Formulas

Now we apply the orthonormal basis machinery. "Analysis" (find coefficients) gives us the DFT. "Synthesis" (reconstruct from coefficients) gives us the IDFT.

The DFT (Analysis)

Given a signal x[n] of length N, its DFT coefficients are:

X[k] = ∑n=0N−1 x[n] · e−j2πkn/N    for k = 0, 1, ..., N−1

Each X[k] is the inner product of x with the k-th Fourier basis vector (without the 1/√N normalization — by convention, the DFT absorbs both 1/√N factors into the IDFT as a single 1/N).

X[k] is complex. Its magnitude |X[k]| tells you the amplitude of frequency k. Its phase ∠X[k] tells you the phase offset of frequency k.

What Each DFT Bin Represents

Bin k = 0 is the DC component: X[0] = ∑ x[n] = N × (mean of x). It measures the average value of the signal.

Bin k = 1 represents one full cycle over N samples, i.e., frequency f1 = fs/N Hz. For N = 1024 and fs = 44100 Hz, this is 43.07 Hz.

Bin k represents frequency fk = k · fs/N Hz. The frequency resolution (spacing between bins) is Δf = fs/N. To resolve two frequencies separated by Δf Hz, you need at least N = fs/Δf samples.

Bins k = N/2 + 1 to N − 1 are the "negative frequencies" (or equivalently, frequencies above Nyquist). For real signals, they're redundant: X[N−k] = X[k]*. You only use bins 0 to N/2.

Frequency-to-bin mapping: Given a signal at frequency f Hz, it appears at bin k = f · N / fs. Given a bin index k, it represents frequency f = k · fs / N Hz. This conversion is essential for interpreting DFT output.

The IDFT (Synthesis)

To reconstruct x from its DFT coefficients:

x[n] = (1/N) ∑k=0N−1 X[k] · ej2πkn/N    for n = 0, 1, ..., N−1

This says: the signal is the sum of N complex exponentials, each weighted by its DFT coefficient X[k], normalized by 1/N.

Interpretation: The DFT decomposes any signal into a sum of sinusoids at frequencies 0, 1, 2, ..., N−1 cycles per N samples. The coefficients X[k] are the complex amplitudes. The IDFT adds them back up to recover the original signal perfectly — no information is lost.

The DFT Matrix

Both operations can be written as matrix-vector products. Define the DFT matrix F of size N×N:

F[k, n] = e−j2πkn/N

Then:

X = F · x    (DFT: analysis)
x = (1/N) · FH · X    (IDFT: synthesis)

where FH is the conjugate transpose (Hermitian) of F. Since F is N√N times a unitary matrix: FHF = N·I. Equivalently, (1/√N)F is unitary.

Worked Example: N = 4

Let W = e−j2π/4 = e−jπ/2 = −j. The DFT matrix for N=4 is:

F4 = [1, 1, 1, 1; 1, W, W2, W3; 1, W2, W4, W6; 1, W3, W6, W9]

Substituting W = −j, W2 = −1, W3 = j:

F4 = [1, 1, 1, 1; 1, −j, −1, j; 1, −1, 1, −1; 1, j, −1, −j]

You can verify: every row is orthogonal to every other row (inner product = 0). Each row has norm √4 = 2.

Verification: Row 0 vs. Row 1

Row 0 = [1, 1, 1, 1]. Row 1 = [1, −j, −1, j]. Inner product:

⟨row0, row1⟩ = 1·1* + 1·(−j)* + 1·(−1)* + 1·(j)* = 1 + j − 1 − j = 0 ✓

Row 0 vs. Row 2: ⟨[1,1,1,1], [1,−1,1,−1]⟩ = 1 − 1 + 1 − 1 = 0 ✓

Row 1 norm: |1|2 + |−j|2 + |−1|2 + |j|2 = 1 + 1 + 1 + 1 = 4 = N. So (1/√N)·row has unit norm.

DFT Matrix Visualization (N=8)

The DFT matrix: each row k is a complex exponential at frequency k. Color = phase, brightness = magnitude (all magnitudes are 1).

Each cell shows e-j2πkn/N. The pattern is the discrete analog of the continuous Fourier kernel.
For N = 4 with W = e-jπ/2, what is F4[2,1] (row 2, column 1)?

Chapter 4: DFT Properties

The DFT has elegant algebraic properties that make it far more than a decomposition tool. These properties are why Fourier methods dominate signal processing.

Linearity

DFT{a·x + b·y} = a·DFT{x} + b·DFT{y}

The DFT of a sum is the sum of DFTs. Obvious from linearity of summation, but powerful: analyze complex signals by analyzing their components separately.

Time Shift (Circular)

If x[n] has DFT X[k], then the circularly shifted signal x[(n−m) mod N] has DFT:

DFT{x[(n−m) mod N]} = X[k] · e−j2πkm/N

A time delay of m samples multiplies each frequency coefficient by a linear phase factor. The magnitude spectrum is unchanged — only the phase shifts. This is why |X[k]| represents the "frequency content" regardless of where the signal starts.

Frequency Shift (Modulation)

Multiplying the signal by a complex exponential shifts its spectrum:

DFT{x[n] · ej2πm0n/N} = X[(k − m0) mod N]

This is the dual of the time-shift property. Modulation in time = shift in frequency.

Parseval's Theorem (Energy Preservation)

n=0N−1 |x[n]|2 = (1/N) ∑k=0N−1 |X[k]|2

Total energy in time = total energy in frequency (up to the 1/N factor from our normalization convention). The DFT is an energy-preserving transformation — it rotates the vector, never stretches or shrinks it.

Convolution Theorem

DFT{x ⊛ y} = X · Y   (pointwise)

where ⊛ denotes circular convolution. Convolution in time becomes multiplication in frequency. This is the single most important DFT property for computation:

Direct convolution: O(N2) multiplications

Via FFT: DFT both (O(N log N)), multiply pointwise (O(N)), IDFT (O(N log N)). Total: O(N log N)

The convolution theorem enables: Fast filtering, fast correlation, fast polynomial multiplication, fast number-theoretic transforms. Every time you apply a filter (audio EQ, image blur, neural network conv layer), you're using this property.

The Convolution Theorem in Practice

Suppose you want to apply a 1000-tap FIR filter h[n] to a signal x[n] of length 100,000:

Direct convolution: 1000 × 100,000 = 108 multiplies

FFT method: Zero-pad both to N = 131,072 (next power of 2 ≥ 100,000 + 999). Three FFTs of size N: 3 × N·log2(N) ≈ 3 × 131,072 × 17 ≈ 6.7 × 106 operations. Plus N pointwise multiplies. Total ≈ 7 × 106.

Speedup: 14×. For longer filters, the speedup is even larger.

In audio, convolution reverb (applying a room impulse response of 1–3 seconds = 44,100–132,300 taps) is ONLY feasible via FFT. Direct convolution would require billions of operations per second.

Duality

The DFT properties exhibit a beautiful duality: whatever holds for the time domain also holds (with a sign flip or N factor) for the frequency domain:

• Convolution in time ↔ pointwise multiplication in frequency

• Pointwise multiplication in time ↔ circular convolution in frequency (divided by N)

• Shift in time ↔ phase ramp in frequency

• Phase ramp in time ↔ shift in frequency

This duality comes from the symmetric structure of the DFT and IDFT formulas (they differ only by the sign in the exponent and the 1/N factor).

PropertyTime DomainFrequency Domain
Linearityax + byaX + bY
Time shiftx[(n-m) mod N]X[k]·e-j2πkm/N
Freq shiftx[n]·ej2πm0n/NX[(k-m0) mod N]
Convolutionx ⊛ yX · Y (pointwise)
Parseval∑|x[n]|2(1/N)∑|X[k]|2
Convolution Theorem Demo

Two short signals are convolved. Compare the direct computation (blue) with the FFT approach (orange overlay). They match exactly.

Circular convolution in the time domain corresponds to what operation in the frequency domain?

Chapter 5: The FFT Algorithm

The DFT formula requires N multiplications for each of N coefficients: O(N2) total. For N = 106 (one second of audio at 1 MHz), that's 1012 operations. Impractical.

The Fast Fourier Transform (FFT) computes the same result in O(N log N). For N = 106, that's 2×107 — a speedup of 50,000×. The FFT is one of the most important algorithms in computational science.

The Cooley-Tukey Idea (1965)

Start with the DFT of an N-point signal (N = 2m, a power of 2):

X[k] = ∑n=0N−1 x[n] · WNkn   where   WN = e−j2π/N

Split the sum into even-indexed and odd-indexed terms:

X[k] = ∑r=0N/2−1 x[2r] · WNk·2r + ∑r=0N/2−1 x[2r+1] · WNk(2r+1)

Note that WN2 = e−j2π·2/N = e−j2π/(N/2) = WN/2. So:

X[k] = ∑r=0N/2−1 x[2r] · WN/2kr + WNkr=0N/2−1 x[2r+1] · WN/2kr

Define A[k] = DFT of even samples, B[k] = DFT of odd samples (both length N/2):

X[k] = A[k] + WNk · B[k]    for k = 0, ..., N/2−1
X[k + N/2] = A[k] − WNk · B[k]    (using WNk+N/2 = −WNk)
The butterfly operation: Each pair (X[k], X[k+N/2]) is computed from (A[k], B[k]) using one multiplication by WNk (the "twiddle factor") and one addition/subtraction. This is the butterfly — the fundamental building block of the FFT.

Recursion and Complexity

One N-point DFT becomes two N/2-point DFTs plus N/2 butterflies. Apply recursively:

T(N) = 2T(N/2) + O(N) → T(N) = O(N log2 N)

For N = 8: direct DFT = 64 multiplies. FFT = 8·log2(8) = 24 multiplies. For N = 1024: direct = 1,048,576. FFT = 10,240. Ratio: 102×.

The Butterfly Diagram

The FFT can be drawn as a network of butterflies, organized in log2(N) stages. Each stage has N/2 butterflies. The input order is bit-reversed.

FFT Butterfly Diagram (N=8)

Step through the FFT computation stage by stage. Watch how 8 inputs are combined through 3 stages of butterflies to produce 8 frequency coefficients.

Stage 0/3. Input: bit-reversed order. Press Next to advance.

Why Does the Split Work?

The key algebraic identity is WNk+N/2 = −WNk. This is because:

WNN/2 = e−j2π(N/2)/N = e−jπ = −1

So X[k + N/2] reuses the SAME products A[k] and WNk·B[k] computed for X[k], just with a sign flip. This "buy one, get one free" is the source of the O(N log N) savings — we compute N/2 multiplications per stage, and there are log2(N) stages.

Memory and In-Place Computation

The FFT can be computed in-place: using only O(N) memory (the input array itself). At each stage, each butterfly reads two values and writes two values back to the same locations. This is why the FFT is so practical — no auxiliary storage beyond the input buffer.

Bit-Reversal

The input to the FFT butterfly network is in bit-reversed order. For N = 8 (3 bits):

IndexBinaryReversedInput position
0000000x[0]
1001100x[4]
2010010x[2]
3011110x[6]
4100001x[1]
5101101x[5]
6110011x[3]
7111111x[7]

This reordering groups even-indexed and odd-indexed elements recursively, which is exactly what the divide-and-conquer decomposition needs.

FFT Variants

The radix-2 Cooley-Tukey FFT requires N to be a power of 2. Extensions exist:

Radix-4: Split into 4 sub-problems. Slightly fewer multiplies than radix-2 (saves ~25%).

Split-radix: Combines radix-2 and radix-4 optimally. The best known algorithm for power-of-2 sizes.

Mixed-radix: Handles any N = n1·n2·...·nk. FFTW library uses this.

Bluestein (chirp-z): Reduces arbitrary-size DFT to a convolution, then uses FFT. Works for ANY N.

In practice, libraries like FFTW ("Fastest Fourier Transform in the West") choose the optimal algorithm automatically based on N and hardware. You just call `fft(x)` and get O(N log N) performance regardless of N.

Computation Cost Table

NDirect DFT (N2)FFT (N log2 N)Speedup
864242.7×
644,09638410.7×
1,0241,048,57610,240102×
65,5364.3×1091.05×1064,096×
1,048,5761.1×10122.1×10752,429×

At audio sample rates (44.1 kHz), processing 1 second requires N = 44,100. The FFT makes real-time spectral analysis possible on even modest hardware.

Implementation in Python

python
import numpy as np

# Direct DFT (O(N^2) — for understanding only)
def dft_direct(x):
    N = len(x)
    X = np.zeros(N, dtype=complex)
    for k in range(N):
        for n in range(N):
            X[k] += x[n] * np.exp(-1j * 2 * np.pi * k * n / N)
    return X

# FFT via numpy (O(N log N) — use this)
X = np.fft.fft(x)          # DFT
x_back = np.fft.ifft(X)    # IDFT
mags = np.abs(X)            # magnitude spectrum
phases = np.angle(X)        # phase spectrum
The FFT computes an N-point DFT in O(?) operations:

Chapter 6: DFT Examples

Let's compute DFTs by hand for small N. This builds intuition for what the DFT actually does to simple signals.

Example 1: x = [1, 1, 1, 1] (N=4, constant signal)

X[k] = ∑n=03 1 · e−j2πkn/4 = ∑n=03 Wkn

X[0] = 1 + 1 + 1 + 1 = 4

X[1] = 1 + W + W2 + W3 = 1 + (−j) + (−1) + j = 0

X[2] = 1 + W2 + W4 + W6 = 1 + (−1) + 1 + (−1) = 0

X[3] = 1 + W3 + W6 + W9 = 1 + j + (−1) + (−j) = 0

Result: X = [4, 0, 0, 0]. A constant signal has all its energy at DC (frequency 0). Makes sense — no oscillation means no non-zero frequency components.

Example 2: x = [1, −1, 1, −1] (alternating)

X[0] = 1 + (−1) + 1 + (−1) = 0

X[1] = 1 + (−1)(−j) + 1(−1) + (−1)(j) = 1 + j − 1 − j = 0

X[2] = 1 + (−1)(−1) + 1(1) + (−1)(−1) = 1 + 1 + 1 + 1 = 4

X[3] = 1 + (−1)(j) + 1(−1) + (−1)(−j) = 1 − j − 1 + j = 0

Result: X = [0, 0, 4, 0]. All energy at k=2, the Nyquist frequency (N/2 = 2 cycles per N samples = maximum oscillation rate). An alternating signal IS the Nyquist frequency.

Example 3: x = [1, 0, −1, 0] (cosine at frequency 1)

This is cos(2πn/4) = cos(πn/2) = [1, 0, −1, 0].

X[0] = 1 + 0 + (−1) + 0 = 0

X[1] = 1 + 0·(−j) + (−1)(−1) + 0·j = 1 + 1 = 2

X[2] = 1 + 0·(−1) + (−1)·1 + 0·(−1) = 0

X[3] = 1 + 0·j + (−1)(−1) + 0·(−j) = 1 + 1 = 2

Result: X = [0, 2, 0, 2]. Energy at k=1 and k=3 (= N−1). This is the "mirror symmetry" for real signals: X[N−k] = X[k]* (conjugate). Since cos = (e + e−jθ)/2, it has both positive and negative frequency components.

Pattern for real signals: If x[n] is real, then X[N−k] = X[k]*. The spectrum is conjugate-symmetric around N/2. You only need to store the first N/2+1 coefficients.

Example 4: x = [1, 2, 3, 4] (a ramp)

X[0] = 1 + 2 + 3 + 4 = 10 (the sum — DC value)

X[1] = 1 + 2(−j) + 3(−1) + 4(j) = 1 − 2j − 3 + 4j = −2 + 2j

X[2] = 1 + 2(−1) + 3(1) + 4(−1) = 1 − 2 + 3 − 4 = −2

X[3] = 1 + 2(j) + 3(−1) + 4(−j) = 1 + 2j − 3 − 4j = −2 − 2j

Check conjugate symmetry: X[3] = X[4−1] should equal X[1]* = (−2 + 2j)* = −2 − 2j. Confirmed!

Magnitudes: |X| = [10, 2√2, 2, 2√2] ≈ [10, 2.83, 2, 2.83]. The DC component dominates because the ramp has a large mean value (2.5). The non-zero frequency components reflect the linear trend.

Parseval Check

Time-domain energy: 12 + 22 + 32 + 42 = 30.

Frequency-domain: (1/4)(102 + 8 + 4 + 8) = (1/4)(120) = 30. Checks out!

4-Point DFT Calculator

Enter 4 real values and see the DFT computed step by step.

x[0]1
x[1]0
x[2]-1
x[3]0
What is the DFT of x = [1, 1, 1, 1] (N=4)?

Chapter 7: Mastery

The DFT is the workhorse of digital signal processing. Let's consolidate the key ideas and connect forward.

Summary

ConceptFormulaComplexity
DFT (analysis)X[k] = ∑ x[n]e-j2πkn/NO(N2) direct
IDFT (synthesis)x[n] = (1/N)∑ X[k]ej2πkn/NO(N2) direct
FFTCooley-Tukey divide & conquerO(N log N)
Orthogonality∑ ej2π(k-r)n/N = Nδ[k-r]

Connections

Lecture 7 (Spectral Descriptors): We'll use DFT magnitudes to extract audio features like spectral centroid, spread, and entropy.

Lecture 8+ (Compression): The DFT (and its cousin the DCT) enable transform coding — JPEG, MP3, video codecs. Energy compaction in the frequency domain allows aggressive quantization of "unimportant" coefficients.

ML Connection: Self-attention in transformers can be viewed as computing pairwise correlations — related to Fourier-domain filtering. FNet (2021) replaced attention with FFT and achieved 92% of BERT's accuracy with 7× faster training.

Historical Perspective

The continuous Fourier Transform was discovered by Joseph Fourier in 1807 (to solve the heat equation). The DFT was known to Gauss in 1805, but remained computationally impractical until Cooley and Tukey published their FFT algorithm in 1965. Within a decade:

1966: First real-time spectrum analyzers using FFT

1969: FFT used in X-ray crystallography (determining protein structures)

1970s: FFT enables digital audio processing (synthesizers, effects)

1980s: FFT in image compression (precursor to JPEG)

1990s: MP3 compression, OFDM in wireless communications

2000s+: 4G/5G OFDM, WiFi, convolution in neural networks

Today, FFT operations are so fundamental that many processors have dedicated FFT hardware (DSP chips, neural accelerators with FFT units).

Key Limitations

Circular (not linear) convolution: The DFT computes circular convolution. For linear convolution, zero-pad both signals to length ≥ Nx + Nh − 1.

Spectral leakage: A finite-length signal appears to have energy at all frequencies (the DFT of a windowed sinusoid isn't a perfect delta). Windowing (Hann, Hamming) reduces leakage at the cost of frequency resolution.

Assumes periodicity: The DFT treats the signal as periodic with period N. Discontinuities at the boundaries (first sample ≠ last sample) create leakage. Windowing mitigates this.

Fixed resolution: Frequency bins are spaced fs/N Hz apart. To resolve two close frequencies, you need a long window (large N). This trades time resolution for frequency resolution — the uncertainty principle.

DFT in Machine Learning

The DFT appears surprisingly often in modern ML:

FNet (2021): Replaced attention in transformers with a 2D DFT. Achieved 92% of BERT's accuracy with 7× faster training. The DFT captures token mixing without learned parameters.

Spectral normalization: Used in GAN training to constrain the spectral norm (largest singular value) of weight matrices. The FFT speeds up certain norm computations.

Audio models: Whisper, WaveNet, and all speech models use STFT (Short-Time Fourier Transform) as the input representation. The model sees spectrograms, not raw waveforms.

Efficient convolutions: For very long sequences (>4096 tokens), some architectures replace attention with FFT-based convolution (e.g., Hyena, S4). This gives O(N log N) scaling vs. O(N2) for attention.

Positional encoding: The sinusoidal positional encoding in the original Transformer (Vaswani et al., 2017) uses the same complex exponential frequencies as the DFT basis. Each position gets a unique "frequency fingerprint."

The deeper connection: Attention is a learned, adaptive Fourier transform. The DFT uses fixed basis vectors (complex exponentials). Attention uses input-dependent basis vectors (queries/keys). Both decompose a sequence into components and remix them. This is why FFT can approximate attention for many tasks.

The Short-Time Fourier Transform (STFT)

The DFT of a full signal gives frequency content but loses ALL time information. Which frequencies were present when? The Short-Time Fourier Transform solves this by computing the DFT on short overlapping frames:

STFT{x}(t, f) = ∑n x[n] · w[n − t] · e−j2πfn/N

where w[n] is a window function (Hann, Hamming) centered at time t. The result is a 2D time-frequency representation called a spectrogram. This is the input to most modern audio ML models (Whisper, AudioLM, MusicGen).

Frame parameters:

• Frame length: 25 ms for speech, 23 ms for music (1024 samples at 44.1 kHz)

• Hop size: 10 ms (256 samples) — 75% overlap between consecutive frames

• FFT size: often zero-padded to 2048 or 4096 for finer frequency resolution

• Window: Hann is standard (good sidelobe suppression)

The spectrogram is the bridge between the DFT (this lecture) and spectral descriptors (next lecture). Each column of the spectrogram is one DFT frame; spectral descriptors summarize each column into a few numbers.

DFT vs. DCT vs. Wavelets

The DFT is one of many orthonormal transforms. How does it compare?

TransformBasis FunctionsBest ForUsed In
DFTComplex exponentialsStationary signals, convolutionAudio processing, communications
DCTCosines only (real-valued)Natural images, speechJPEG, MPEG, AAC
DWTScaled/shifted waveletsNon-stationary signals, edgesJPEG2000, denoising
KLTData-dependent (eigenvectors)Optimal for specific statisticsPCA, theoretical bound

The DCT is preferred over DFT for compression because: (1) it's real-valued (no phase to store), (2) it handles boundary conditions better (implicit even symmetry avoids edge discontinuities), and (3) it achieves energy compaction close to the optimal KLT for most natural signals. The DFT remains preferred for filtering and convolution (via the convolution theorem).

Pilanci's perspective: "The DFT is not just a signal processing tool. It's a change-of-basis that reveals structure. In machine learning, we're always looking for the right representation — the basis in which the data becomes simple. The Fourier basis is optimal for stationary signals. Learned representations (embeddings, features) play the same role for non-stationary data."
Why does the FFT achieve O(N log N) instead of O(N2)?
Final thought: "The Fast Fourier Transform is the most important numerical algorithm of our lifetime." — Gilbert Strang, MIT. Cooley and Tukey published their paper in 1965. Within a year, it had been adopted across every branch of engineering and science. It remains the foundation of every frequency-domain computation today.