EE269 Lecture 10 — Wavelet Transforms

Chapter 0: Why Wavelets

In Lecture 8, we built the STFT — a powerful tool that gives us time-frequency analysis. But it has one fundamental limitation: the window size is fixed.

Consider music. A bass drum hit lasts ~100 ms and has frequency ~60 Hz. A hi-hat click lasts ~5 ms and has frequency ~8000 Hz. The STFT forces you to choose ONE window size for both:

• Long window (good for bass drum's frequency): smears the hi-hat's timing

• Short window (good for hi-hat's timing): can't resolve the bass drum's pitch

What we really want: long windows at low frequencies (to resolve pitch) and short windows at high frequencies (to resolve timing). This is exactly what wavelets provide.

The STFT's fixed window is a design choice, not a law of nature. The uncertainty principle says Δt · Δf ≥ 1/(4π) — the AREA of each time-frequency tile is fixed. But nothing says the tiles must be the same shape. Wavelets use tall-narrow tiles at low frequencies and short-wide tiles at high frequencies. Same area, different aspect ratio.

Natural Signals Have Multi-Scale Structure

Nature itself operates at multiple scales:

• Speech: Pitch (100-300 Hz) changes slowly; fricatives (2000-8000 Hz) are brief bursts

• Music: Bass notes sustain for seconds; percussion is milliseconds

• Earthquakes: P-waves (high freq) arrive as sharp spikes; surface waves (low freq) are long undulations

• ECG: QRS complex is a quick spike (~100 ms); T-wave is slow and broad (~300 ms)

In all these cases, the "interesting" features at high frequency are SHORT, and the "interesting" features at low frequency are LONG. Wavelets match this natural structure.

STFT vs. Wavelet Tiling

Left: STFT uses uniform tiles. Right: Wavelet uses adaptive tiles. Both cover the same time-frequency plane with the same total area per tile.

A Brief History

Wavelets emerged from multiple fields simultaneously in the 1980s:

• Jean Morlet (1982): Geophysicist analyzing seismic signals, needed multi-scale analysis

• Alex Grossmann (1984): Physicist who formalized the continuous wavelet transform

• Stéphane Mallat (1989): Connected wavelets to filter banks and multi-resolution analysis

• Ingrid Daubechies (1988): Constructed compactly-supported orthogonal wavelets — the "db" family

What is the fundamental limitation of the STFT that wavelets solve?

The STFT can't handle complex signals The STFT uses a fixed window size, giving the same time-frequency tradeoff at ALL frequencies, when we need adaptive resolution The STFT is too slow to compute

Chapter 1: Mother Wavelet & Scaling

The STFT uses sinusoids (e^jωt) modulated by a window. Wavelets use a single prototype function — the mother wavelet ψ(t) — that gets scaled and shifted to analyze the signal at different frequencies and times.

The Mother Wavelet ψ(t)

A mother wavelet is any function ψ(t) that satisfies two conditions:

1. Zero mean: ∫ ψ(t) dt = 0 (it oscillates — has positive and negative parts)

2. Finite energy: ∫ |ψ(t)|² dt < ∞ (it's localized — decays to zero)

Why "wavelet"? The name means "small wave." Unlike a sinusoid (which oscillates forever), a wavelet is a brief oscillation that dies out. It's localized in BOTH time and frequency — the ideal time-frequency atom.

Scaling: Stretching and Compressing

From the single mother wavelet, we create a family of wavelets at different scales s:

ψ_s(t) = (1/√s) · ψ(t/s)

What does scaling do?

• s > 1 (stretch): Wavelet is wider in time → captures lower frequencies

• s < 1 (compress): Wavelet is narrower → captures higher frequencies

• s = 1: The mother wavelet itself

The factor 1/√s preserves energy: ||ψ_s||² = ||ψ||² for all s.

Shifting: Moving in Time

To analyze different time positions, we translate:

ψ_s,τ(t) = (1/√s) · ψ((t - τ)/s)

This is the complete wavelet family: parametrized by scale s (which frequency) and translation τ (which time).

Scale ↔ Frequency Relationship

If the mother wavelet ψ(t) has center frequency f₀, then the scaled wavelet ψ_s(t) has center frequency:

f = f₀ / s

Large scale = low frequency. Small scale = high frequency. This is the opposite convention from what you might expect (scale is inversely proportional to frequency).

Worked Example: Scaling a Wavelet

Mother wavelet: Mexican hat with center frequency f₀ = 1 Hz and effective duration 1 second.

At scale s = 2: Wavelet stretches to 2 seconds duration, center freq = 0.5 Hz. It's slower and more spread out — suitable for detecting low-frequency features.

At scale s = 0.25: Wavelet compresses to 0.25 seconds, center freq = 4 Hz. It's fast and localized — suitable for detecting sharp transients at 4 Hz.

The energy normalization 1/√s ensures that:

• At s=2: amplitude is 1/√2 ≈ 0.707 (wider but lower)

• At s=0.25: amplitude is 1/√0.25 = 2 (narrower but taller)

• Total area under |ψ_s|² is the same for all s

Example: Mexican Hat Wavelet

The Mexican hat wavelet (Ricker wavelet) is the negative second derivative of a Gaussian:

ψ(t) = (2/√3) π^-1/4 (1 - t²) e^-t²/2

It looks like a sombrero: one positive bump flanked by two negative bumps. Zero mean (positive area = negative area). Localized (Gaussian envelope decays quickly).

Other Common Mother Wavelets

Morlet wavelet: A Gaussian-windowed complex sinusoid: ψ(t) = e^jω₀t e^-t²/2. Combines the optimal time-frequency localization of the Gaussian with explicit frequency (via ω₀). Used extensively in geophysics and neuroscience (EEG analysis).

Haar wavelet: The simplest: +1 on the left half, -1 on the right half, 0 elsewhere. Discontinuous but perfectly localized in time. We'll study it in detail in Chapter 5.

Meyer wavelet: Infinitely smooth, compactly supported in frequency (but not in time). Theoretically beautiful but rarely used in practice due to infinite time support.

Mother Wavelet at Different Scales

The Mexican Hat wavelet stretched (low freq) and compressed (high freq). All have the same energy but different time/frequency spreads.

Scale s 1.0

A mother wavelet has center frequency f₀ = 1000 Hz. At scale s = 4, what frequency does it analyze?

4000 Hz 250 Hz (f₀/s = 1000/4) 1000 Hz (scale doesn't affect frequency)

Chapter 2: CWT Definition

Now we have our family of wavelets ψ_s,τ(t). The Continuous Wavelet Transform (CWT) asks: "How much does the signal f(t) look like the wavelet at scale s and position τ?"

Definition

W(s, τ) = ∫_-∞^∞ f(t) · ψ*_s,τ(t) dt = <f, ψ_s,τ>

Expanding the wavelet:

W(s, τ) = (1/√s) ∫_-∞^∞ f(t) · ψ*((t - τ)/s) dt

The CWT maps a 1D function f(t) to a 2D function W(s, τ). The input is a signal over time. The output is a surface over (scale, translation) — equivalent to (frequency, time). This is the wavelet analog of the spectrogram, but with adaptive resolution.

Interpretation

• W(s, τ) large: The signal at time τ matches the wavelet at scale s well (there's energy at frequency f₀/s near time τ)

• W(s, τ) ≈ 0: No energy at that scale/time combination

• W(s, τ) < 0: Anti-correlation (signal is "opposite" to wavelet at that point)

CWT as Correlation

The CWT at each scale is actually a cross-correlation between f(t) and the scaled wavelet ψ_s(t). We're sliding the wavelet across the signal and measuring similarity at each position. This is the same template-matching idea from Lecture 9, but with a frequency-adaptive template!

Worked Example

Signal: f(t) = sin(2π · 5t) for 0 < t < 0.5, then sin(2π · 20t) for 0.5 < t < 1.

CWT with Mexican hat wavelet:

• At scale s = f₀/5 (tuned to 5 Hz): W is large for τ < 0.5, near zero for τ > 0.5

• At scale s = f₀/20 (tuned to 20 Hz): W is near zero for τ < 0.5, large for τ > 0.5

• The CWT correctly localizes which frequency is present at which time!

In Code

python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) wavelet."""
    return (1 - t**2) * np.exp(-t**2 / 2)

def cwt(signal, scales, dt=1.0):
    """Continuous Wavelet Transform.
    Returns (len(scales), len(signal)) coefficient matrix."""
    N = len(signal)
    t = np.arange(N) * dt
    coeffs = np.zeros((len(scales), N))

    for i, s in enumerate(scales):
        # Create wavelet at this scale
        t_wavelet = np.arange(-4*s, 4*s, dt) / s
        wavelet = mexican_hat(t_wavelet) / np.sqrt(s)
        # Correlate with signal
        coeffs[i] = np.correlate(signal, wavelet, mode='same')

    return coeffs

CWT of a Two-Frequency Signal

Top: signal that switches frequency mid-way. Bottom: CWT magnitude (scale vs time). Notice how both the frequency change AND its timing are clearly resolved.

The CWT maps a 1D signal f(t) to a 2D function W(s, τ). What are the two axes?

Amplitude and phase Time and amplitude Scale (inversely related to frequency) and translation (time position)

Chapter 3: Time-Frequency Tiling

The key advantage of wavelets over the STFT becomes crystal clear when we compare their time-frequency tilings.

STFT: Uniform Grid

The STFT divides the time-frequency plane into identical rectangles. Every tile has the same width Δt and height Δf, regardless of frequency. This means:

• At 100 Hz, we have the same time resolution as at 10000 Hz

• At 10000 Hz, we have the same frequency resolution as at 100 Hz

This is wasteful. At 100 Hz, one period is 10 ms — we don't need sub-millisecond timing. But at 10000 Hz, one period is 0.1 ms — we DO need fine timing to resolve individual cycles.

Wavelet: Dyadic Grid

Wavelets use tiles whose width and height depend on scale:

• Low frequency (large scale s): Wide in time (Δt = s · Δt₀), narrow in freq (Δf = Δf₀/s)

• High frequency (small scale s): Narrow in time (Δt = s · Δt₀), wide in freq (Δf = Δf₀/s)

The area of each tile is constant: Δt · Δf = Δt₀ · Δf₀ (uncertainty principle satisfied).

The wavelet tiling matches how nature works: High-frequency events (clicks, onsets, edges) are brief — wavelets resolve their timing. Low-frequency events (bass notes, trends) are sustained — wavelets resolve their frequency. The STFT compromises everywhere; wavelets are optimal everywhere.

Constant-Q Property

The ratio of center frequency to bandwidth is constant for all wavelet scales:

Q = f_center / Δf = f₀/s ÷ (Δf₀/s) = f₀/Δf₀ = constant

This is called a constant-Q transform. It means the frequency resolution is always the same fraction of the center frequency — exactly matching musical pitch (octaves, semitones). The mel scale in audio ML is motivated by this same idea.

Comparison Table

Property	STFT	CWT
Tile shape	Fixed rectangles	Adaptive (scale-dependent)
Δf at low freq	Same as at high freq	Narrow (good resolution)
Δt at high freq	Same as at low freq	Narrow (good resolution)
Q factor	Varies (Q = f/Δf increases with f)	Constant
Basis functions	Windowed sinusoids	Scaled wavelets
Musical pitch	Linear frequency axis	Logarithmic (matches octaves)

STFT vs. Wavelet: Resolving a Multi-Scale Signal

A signal with a low-freq tone + a brief high-freq click. See how each representation handles the two components.

The "constant-Q" property of wavelets means:

The ratio f_center/Δf is the same at all scales — frequency resolution is always a fixed percentage of center frequency All scales have the same absolute frequency resolution The wavelet coefficients are constant

Chapter 4: DWT Filter Bank

The CWT is beautiful mathematically, but expensive computationally: it samples scale and translation continuously. The Discrete Wavelet Transform (DWT) samples only at dyadic scales (s = 2^j) and achieves the same information content with far less computation.

The Key Insight: Filter Banks

Stéphane Mallat (1989) showed that the DWT can be computed as a cascade of simple filter + downsample operations. No need to explicitly construct wavelets at each scale — just apply two filters repeatedly.

The DWT algorithm: At each level, split the signal into LOW frequencies (approximation) and HIGH frequencies (detail) using a low-pass filter H₀ and a high-pass filter H₁. Then downsample by 2. Repeat on the low-frequency part. This is called a filter bank or subband coding.

One Level of the DWT

Input x[n]

N samples at full resolution

↓ split into two branches

Low-pass H₀

Keep frequencies 0 to f_s/4 → downsample ↓2 → approximation a[n] (N/2 samples)

High-pass H₁

Keep frequencies f_s/4 to f_s/2 → downsample ↓2 → detail d[n] (N/2 samples)

After one level: N/2 approximation coefficients + N/2 detail coefficients = N total. No information lost!

Multi-Level Decomposition

Apply the same split to the approximation coefficients (the low-frequency part) again and again:

Level 1: x → a₁ (N/2) + d₁ (N/2)

Level 2: a₁ → a₂ (N/4) + d₂ (N/4)

Level 3: a₂ → a₃ (N/8) + d₃ (N/8)

After J levels: a_J (N/2^J) + d_J (N/2^J) + ... + d₂ (N/4) + d₁ (N/2) = N coefficients total.

The Filters

For the Haar wavelet: H₀ = [1/√2, 1/√2] (average), H₁ = [1/√2, -1/√2] (difference).

For Daubechies wavelets: longer filters (4, 6, 8... taps) with specific coefficients that guarantee orthogonality and vanishing moments.

Worked Example: One Level of Haar DWT

Signal: x = [4, 6, 10, 12, 8, 6, 5, 5] (8 samples).

Approximation (average pairs):

a = [(4+6)/√2, (10+12)/√2, (8+6)/√2, (5+5)/√2] = [7.07, 15.56, 9.90, 7.07]

Detail (difference of pairs):

d = [(4-6)/√2, (10-12)/√2, (8-6)/√2, (5-5)/√2] = [-1.41, -1.41, 1.41, 0]

Notice: a captures the trend (values going up then down). d captures the local variation (small differences between adjacent samples). The last detail coefficient is 0 because samples 7 and 8 are equal — no local change.

Total: 4 approximation + 4 detail = 8 coefficients = same as input. Perfect reconstruction: x[2k] = (a[k] + d[k])/√2, x[2k+1] = (a[k] - d[k])/√2.

In Code

python
import numpy as np

def dwt_level(x, h0, h1):
    """One level of DWT decomposition."""
    # Convolve and downsample by 2
    approx = np.convolve(x, h0, mode='same')[::2]
    detail = np.convolve(x, h1, mode='same')[::2]
    return approx, detail

def dwt_multilevel(x, h0, h1, levels=3):
    """Multi-level DWT. Returns [aJ, dJ, ..., d2, d1]."""
    details = []
    current = x
    for j in range(levels):
        current, d = dwt_level(current, h0, h1)
        details.append(d)
    return [current] + details[::-1]

# Haar filters
h0 = np.array([1, 1]) / np.sqrt(2)   # low-pass (average)
h1 = np.array([1, -1]) / np.sqrt(2)  # high-pass (difference)

DWT Filter Bank Decomposition

Watch a signal decompose level by level. Each level splits into approximation (low freq) and detail (high freq) with half the samples.

Levels 3

A signal has 1024 samples. After 3 levels of DWT, what are the coefficient counts?

256 + 256 + 256 + 256 = 1024 a₃(128) + d₃(128) + d₂(256) + d₁(512) = 1024 512 + 256 + 128 + 128 = 1024

Chapter 5: Haar & Daubechies Wavelets

Different wavelet families are optimized for different signal characteristics. The two most important: Haar (simplest) and Daubechies (most versatile).

Haar Wavelet (db1)

The Haar wavelet (1909, Alfred Haar) is the simplest possible wavelet:

ψ(t) = +1 for 0 ≤ t < 1/2, -1 for 1/2 ≤ t < 1, 0 otherwise

It's a step function: +1 on the first half, -1 on the second half. The filters are:

• H₀ = [1, 1]/√2 (local average of two samples)

• H₁ = [1, -1]/√2 (local difference of two samples)

The Haar DWT is just averaging and differencing! At each level, take pairs of samples. The average goes to the approximation (smooth trend). The difference goes to the detail (local change). This is the most intuitive wavelet — average removes noise, difference detects edges.

Haar Limitations

The Haar wavelet is discontinuous (step function). This means:

• Poor frequency localization (step has infinite bandwidth)

• Block artifacts in reconstruction (visible staircasing)

• Only 1 vanishing moment (can only represent constants exactly)

Daubechies Wavelets (dbN)

Ingrid Daubechies (1988) constructed a family of orthogonal wavelets with compact support (finite length) and N vanishing moments:

• db1 = Haar (2 filter taps, 1 vanishing moment)

• db2: 4 filter taps, 2 vanishing moments

• db4: 8 filter taps, 4 vanishing moments

• db8: 16 filter taps, 8 vanishing moments

Vanishing Moments

A wavelet with N vanishing moments satisfies:

∫ t^k ψ(t) dt = 0 for k = 0, 1, ..., N-1

This means it's "blind" to polynomials of degree < N. A wavelet with 4 vanishing moments won't respond to cubic trends — only to features sharper than a cubic. More vanishing moments = better at ignoring smooth background.

Wavelet	Filter Length	Vanishing Moments	Best For
Haar (db1)	2	1	Edge detection, binary signals
db2	4	2	Piecewise-linear signals
db4	8	4	General audio, EEG
db8	16	8	Smooth signals, compression

Wavelet Family Comparison

Compare Haar (blocky) vs. higher-order Daubechies wavelets (smooth). More taps = smoother but wider support.

A signal contains a quadratic trend plus sharp spikes. Which wavelet is best for detecting ONLY the spikes (ignoring the trend)?

Haar (1 vanishing moment — responds to all non-constant features including the trend) db2 (2 vanishing moments — ignores linear but not quadratic trends) db4 (4 vanishing moments — ignores polynomial up to cubic, revealing only the spikes)

Chapter 6: Multi-Resolution Analysis

The DWT's multi-level decomposition gives us something extraordinary: the ability to see a signal at different resolutions simultaneously. Each level peels off one layer of detail, revealing progressively coarser structure.

The MRA Framework

At level j, the signal is represented as:

f(t) = a_J(t) + d_J(t) + d_J-1(t) + ... + d₁(t)

Where:

• a_J: Coarsest approximation (overall shape, DC + very low frequencies)

• d_j: Detail at level j (captures frequencies in band [f_s/2^j+1, f_s/2^j])

This is perfect reconstruction: sum all components and you get back the original signal exactly.

Multi-resolution = zoom levels. Imagine Google Maps. Level 0 = street view (every detail). Level 1 = neighborhood (broad streets, no house numbers). Level 2 = city (only highways). Level 3 = country (only borders). Each "detail" level is what you LOSE going from one zoom to the next. Sum them all = original full-resolution view.

Applications of MRA

Denoising: Noise is typically broadband (present at all scales). Signal features are concentrated at specific scales. Strategy: decompose, threshold small detail coefficients (likely noise), reconstruct.

python
def wavelet_denoise(signal, threshold, levels=4):
    """Denoise via wavelet thresholding."""
    coeffs = dwt_multilevel(signal, h0, h1, levels)
    # Threshold detail coefficients (soft threshold)
    for i in range(1, len(coeffs)):
        coeffs[i] = np.sign(coeffs[i]) * np.maximum(np.abs(coeffs[i]) - threshold, 0)
    # Reconstruct
    return idwt_multilevel(coeffs, g0, g1)

Compression (JPEG2000): Most wavelet coefficients are near zero. Keep only the large ones (say, top 10%). This is why JPEG2000 uses the DWT instead of the DCT (used in JPEG). Wavelets give fewer artifacts at high compression because they handle edges better.

Edge detection: Edges (sharp transitions) produce large detail coefficients at fine scales. Find peaks in d₁ = edge locations.

Wavelet Thresholding: Soft vs. Hard

Hard thresholding: Zero out coefficients below threshold λ, keep the rest unchanged.

d̂[n] = d[n] if |d[n]| ≥ λ, 0 otherwise

Soft thresholding: Shrink all coefficients toward zero by λ. Smoother result.

d̂[n] = sign(d[n]) · max(|d[n]| - λ, 0)

Donoho & Johnstone (1994) proved that the universal threshold λ = σ √(2 log N) is nearly minimax-optimal for denoising, where σ is the noise standard deviation and N is the signal length. This remarkable result means wavelet denoising is provably near-optimal with almost no tuning.

Why JPEG2000 Uses Wavelets

JPEG (1992) uses the DCT (block-by-block cosine transform). At high compression, you see blocking artifacts at the 8×8 block boundaries because the DCT processes each block independently.

JPEG2000 (2000) uses the DWT on the entire image. No blocks → no blocking artifacts. At the same bitrate, JPEG2000 gives 2–3 dB better PSNR than JPEG. The advantage is even larger at very low bitrates where JPEG falls apart.

The specific wavelet used is the CDF 9/7 biorthogonal wavelet — chosen because it has good compression performance AND allows reversible (lossless) encoding with an integer variant.

Multi-Resolution Decomposition & Reconstruction

Decompose a signal into levels. Toggle each level on/off to see its contribution. Remove a level = lose those frequencies. This is how wavelet denoising and compression work.

Noise 0.30

In wavelet denoising, we threshold the detail coefficients. Why does this remove noise while preserving signal?

Noise only exists at high frequencies Noise produces many small coefficients at all levels, while signal features produce a few large coefficients — thresholding keeps the large (signal) and removes the small (noise) The wavelet transform amplifies noise to make it easier to remove

Chapter 7: Mastery

Let's consolidate the wavelet transform and see its place in the signal processing toolkit.

The Complete Wavelet Toolkit

Transform	Input	Output	Use
CWT	1D signal	2D (scale × time)	Analysis, visualization
DWT	1D signal (N samples)	N coefficients (multi-level)	Compression, denoising
IDWT	Wavelet coefficients	Reconstructed signal	Synthesis after modification

Key Formulas

CWT: W(s,τ) = (1/√s) ∫ f(t) ψ*((t-τ)/s) dt

Scaling: ψ_s,τ(t) = (1/√s) ψ((t-τ)/s)

DWT: a[n] = (x * h₀)[↓2], d[n] = (x * h₁)[↓2]

Reconstruction: f = a_J + d_J + d_J-1 + ... + d₁

Applications Summary

Application	How Wavelets Help	Standard Used
JPEG2000	DWT replaces DCT; better edges at high compression	CDF 9/7 wavelet
Denoising	Threshold small coefficients (noise) → reconstruct	db4-db8 + soft threshold
Edge detection	Large detail coefficients at fine scales = edges	Haar or db2
EEG/ECG	Multi-scale analysis of brain/heart signals	db4, Morlet
Seismology	CWT for multi-scale event detection	Morlet wavelet
Speech coding	Subband coding for efficient compression	Filter bank (QMF)

DFT vs. STFT vs. Wavelets

	DFT	STFT	Wavelet
Time info	None	Fixed resolution	Adaptive
Freq info	Perfect	Fixed resolution	Adaptive
Basis	Sinusoids	Windowed sinusoids	Scaled wavelets
Best for	Stationary signals	Slowly varying	Multi-scale / transients
Complexity	O(N log N)	O(N log N)	O(N)

Connections

• Lecture 8 (STFT): Wavelets overcome the fixed-window limitation

• Lecture 9 (Classification): Wavelet coefficients as features for signal classification

• Deep Learning: Modern architectures (WaveNet, WavLM) use dilated convolutions — effectively a learned wavelet transform

• Vision: Image pyramids in CNNs mirror multi-resolution wavelet analysis

The Bigger Picture: Transform Zoo

The DFT, STFT, and wavelet transform are all members of a larger family of time-frequency representations. Each makes different tradeoffs:

• Wigner-Ville distribution: Best possible time-frequency resolution (no uncertainty smearing!) but suffers from cross-term interference (phantom energy between two real components). Quadratic, not linear.

• Gabor transform: STFT with Gaussian window. Achieves the uncertainty minimum. The theoretical gold standard, but impractical because discrete Gaussians don't form an orthogonal basis.

• Synchrosqueezed wavelet transform: Post-processes the CWT to sharpen ridges. Gives better frequency estimates than raw CWT while maintaining invertibility.

• Scattering transform: Cascade of wavelet transforms + modulus. Provably invariant to translations and stable to deformations. A mathematically principled alternative to deep features.

When to Use What

• Use DFT when: signal is stationary, you need exact frequency values, speed matters

• Use STFT when: signal is slowly varying, you need time-frequency visualization, you want simple reconstruction

• Use DWT when: signal has multi-scale structure, you need compression or denoising, you want O(N) speed

• Use CWT when: you need detailed time-scale analysis, scale matters more than frequency, visualization is the goal

Ingrid Daubechies (1988): "I had this very strong feeling that wavelets could be really useful... I wanted to construct wavelets that would be as short as possible while still being smooth." — Her dbN family revolutionized signal processing and won her the Wolf Prize in Mathematics (2023).

Why does the DWT have O(N) complexity while the FFT has O(N log N)?

Each level processes half as many samples as the previous (N + N/2 + N/4 + ... = 2N = O(N)), while the FFT processes all N samples at each of log N stages The DWT uses simpler arithmetic The DWT only computes half the coefficients