Choosing the right wavelet, mastering the STFT, and applying both to real problems — denoising, compression, edge detection.
You have a recording of a violin playing a note. Unfortunately, the microphone was cheap and the recording is swamped in broadband noise — a constant hiss that buries the music. How do you clean it up?
Your first instinct: use a frequency-domain filter. Compute the DFT, zero out the noisy bins, inverse DFT. But the noise is everywhere in frequency — it overlaps the violin's harmonics. A simple frequency mask would kill the music along with the noise.
Here's the key insight: the violin's harmonics are structured — they persist over time and concentrate in specific frequency bands. The noise is unstructured — it spreads evenly across all times and frequencies. In the wavelet domain, structured signal produces large coefficients at specific scales, while noise produces small coefficients everywhere.
But to make this work, you need to choose the right wavelet. Not all wavelets are created equal. A Haar wavelet (blocky, discontinuous) would be terrible for a smooth violin signal. A Daubechies-20 (smooth, wide support) would be great for the violin but terrible for detecting a sharp click in the recording.
This lecture is about making those choices. We'll learn what makes a wavelet "good" for a given task, survey the major wavelet families, compare wavelets to Fourier methods, and then apply everything to real problems: denoising, compression, edge detection, and audio analysis.
A clean signal buried in noise. Compare naive frequency filtering (kills signal too) with wavelet thresholding (preserves signal structure).
In Lecture 10, we introduced the mother wavelet ψ(t) and showed how scaling and shifting it creates a basis for decomposing signals. But we didn't ask: what makes one mother wavelet better than another?
There are four properties that determine how useful a wavelet is for a given application. Let's examine each one.
A wavelet has compact time support if ψ(t) = 0 outside some finite interval [a, b]. The support length is b - a. Why does this matter?
Short support means each wavelet coefficient depends on only a few samples of the signal. This gives excellent time localization: if a spike appears at sample n = 100, only the coefficients whose wavelet overlaps n = 100 will be affected. Long-support wavelets spread the spike's influence across many coefficients.
Dually, we want Ψ(ω) — the Fourier transform of ψ(t) — to be concentrated in a narrow frequency band. This means each scale of the wavelet decomposition isolates a specific frequency range cleanly, with minimal leakage into other bands.
Here's the catch: by the uncertainty principle, you can't have both arbitrarily short time support AND arbitrarily narrow frequency support. Δt · Δω ≥ 1/2. So every wavelet is a compromise.
A wavelet has N vanishing moments if:
What does this buy you? If the signal locally looks like a polynomial of degree < N, the wavelet coefficients at that location will be zero (or very small). Since smooth signals can be locally approximated by polynomials, more vanishing moments means smoother signals produce sparser wavelet representations.
Worked example: Haar has N = 1 vanishing moment (∫ ψ(t) dt = 0). It zeroes out constant regions. Daubechies-4 (db4) has N = 4 vanishing moments — it zeroes out cubic trends. For a smooth sine wave, db4 gives much sparser coefficients than Haar.
An orthogonal wavelet basis satisfies 〈ψj,k, ψm,n〉 = δjmδkn. This means:
• No redundancy — each coefficient carries unique information
• Parseval's theorem holds — energy in signal domain equals energy in wavelet domain
• Perfect reconstruction is trivial — just sum the weighted basis functions
Not all useful wavelets are orthogonal. The Morlet wavelet (used in geophysics) is redundant but very smooth. Orthogonality is most important when you need to preserve energy or do compression.
| Property | Benefits | Tension With |
|---|---|---|
| Short time support | Good time localization, fast computation | Smoothness, freq. support |
| Narrow freq. support | Clean frequency separation | Time support (uncertainty) |
| Many vanishing moments | Sparser for smooth signals | Longer time support needed |
| Orthogonality | Energy preservation, no redundancy | Smoothness constraints |
Drag the slider to increase vanishing moments. Notice how the wavelet gets longer (more oscillations) but smoother.
Knowing what makes a good wavelet is one thing. Having actual wavelets you can use is another. Let's survey the major families. Each is a collection of wavelets indexed by a parameter (usually order N) that controls the tradeoff between time support and vanishing moments.
The simplest possible wavelet:
Support: 1. Vanishing moments: 1. Smoothness: None (discontinuous).
Haar is fast and simple. It's the only orthogonal wavelet that is symmetric AND compactly supported. Its discontinuity means it's excellent for detecting sharp edges but terrible for smooth signals — it produces large coefficients at every point where the signal isn't piecewise constant.
Ingrid Daubechies (1988) constructed a family of wavelets that are orthogonal, compactly supported, and have maximum vanishing moments for a given support length.
Naming: dbN has N vanishing moments and support length 2N - 1. So db1 = Haar (1 moment, length 1), db2 has 2 moments and length 3, db4 has 4 moments and length 7.
Filter coefficients for db2:
These are the lowpass filter coefficients. The highpass coefficients come from alternating signs: g[n] = (-1)n h[L-1-n]. This guarantees orthogonality between the scaling function and the wavelet.
Daubechies' construction was modified to produce wavelets that are as nearly symmetric as possible while retaining orthogonality and compact support. Symlets have the same number of vanishing moments as Daubechies wavelets of the same order, but with less phase distortion.
In practice: sym4 through sym8 are preferred over dbN when linear phase (no phase distortion) matters — especially for image processing.
Named after Ronald Coifman, who asked Daubechies to construct wavelets where both the wavelet AND the scaling function have vanishing moments. coifN has 2N vanishing moments for the wavelet and 2N-1 for the scaling function, with support length 6N-1.
When do you need scaling function vanishing moments? When you're using the wavelet transform for numerical approximation — representing functions by their scaling coefficients. Coiflets give better approximation of smooth functions at coarse scales.
| Family | Vanishing Moments | Support Length | Symmetry | Best For |
|---|---|---|---|---|
| Haar (db1) | 1 | 1 | Symmetric | Edge detection, binary signals |
| Daubechies (dbN) | N | 2N-1 | Asymmetric | General purpose, compression |
| Symlets (symN) | N | 2N-1 | Near-symmetric | Image processing |
| Coiflets (coifN) | 2N | 6N-1 | Near-symmetric | Numerical analysis |
View the shape of different wavelet families. Higher order = more vanishing moments = smoother but wider.
In MATLAB, the standard function for multilevel wavelet decomposition is wavedec:
matlab % Decompose signal x to level 4 using db4 [C, L] = wavedec(x, 4, 'db4'); % C = concatenated coefficients [a4 | d4 | d3 | d2 | d1] % L = bookkeeping vector of lengths % Extract detail coefficients at level 3 d3 = detcoef(C, L, 3); % Reconstruct from modified coefficients x_recon = waverec(C, L, 'db4');
The output C is a single vector that packs all coefficients end-to-end: approximation at the coarsest level, then details from coarse to fine. L tells you how many coefficients belong to each level.
In Python with PyWavelets:
python import pywt import numpy as np # Decompose to level 4 using db4 coeffs = pywt.wavedec(x, 'db4', level=4) # coeffs = [a4, d4, d3, d2, d1] (list of arrays) # Reconstruct x_recon = pywt.waverec(coeffs, 'db4')
Both Fourier and wavelet transforms decompose signals into basis functions. But they have fundamentally different strengths. Understanding when to use each is one of the most important skills in signal processing.
The Convolution Theorem: The DFT turns convolution into multiplication. If you need to filter a signal, the DFT route is often the fastest: DFT both signals, multiply, inverse DFT. Wavelets have no simple analogue — convolution in the wavelet domain is complicated.
Uniform spectral resolution: Every frequency bin has the same bandwidth: Δf = fs/N. If your signal has energy evenly distributed across frequency (like white noise analysis or spectral estimation), uniform resolution is exactly what you want.
Phase information: The DFT naturally gives you magnitude AND phase. Phase encodes timing information. For applications like communications (where carrier phase matters), Fourier is the natural choice.
Adaptive resolution: This is Pilanci's key point: "100 Hz resolution at 400 Hz and at 4000 Hz are not the same." At 400 Hz, 100 Hz resolution means you can distinguish 300 Hz from 500 Hz — a 25% difference. At 4000 Hz, 100 Hz resolution means you can distinguish 3900 Hz from 4100 Hz — a 2.5% difference.
Sparsity for piecewise-smooth signals: Natural images and audio are piecewise-smooth: smooth regions punctuated by edges or transients. Wavelets represent these signals with very few large coefficients (at the edges) and many near-zero coefficients (in smooth regions). The DFT spreads each edge across ALL frequency bins — it's the opposite of sparse.
Time localization: A wavelet coefficient at scale j, position k tells you "there's energy at frequency ~2j near time k." A Fourier coefficient X[k] tells you "there's energy at frequency k across ALL time." For nonstationary signals, wavelet wins.
| Feature | Fourier (DFT) | Wavelet (DWT) |
|---|---|---|
| Basis functions | Sinusoids (infinite extent) | Localized oscillations (finite extent) |
| Frequency resolution | Uniform (Δf = fs/N) | Logarithmic (Δf/f = const) |
| Time localization | None | Yes (adaptive) |
| Convolution theorem | Yes — fast filtering | No simple analogue |
| Best for stationary signals | Yes | Overkill |
| Best for transients/edges | Poor | Excellent |
| Compression of natural signals | Moderate | Excellent (JPEG 2000 uses wavelets) |
| Computational cost | O(N log N) via FFT | O(N) via filter bank |
A piecewise-smooth signal. Compare the coefficient magnitudes: Fourier spreads energy everywhere; wavelet concentrates it at the discontinuities.
Use Fourier when: The signal is stationary, you need the convolution theorem, you care about exact frequency values, or you're doing spectral estimation.
Use wavelets when: The signal is nonstationary, you need time-frequency analysis, you're doing compression or denoising of natural signals, or you need to detect transients/edges.
Use STFT when: You need a middle ground — some time localization but with the interpretability of frequency bins (Hz on the axis rather than "scale").
Let's revisit the STFT with fresh eyes. In Lecture 8, we defined it and explored the time-frequency tradeoff. Now we'll focus on the practical details that matter for applications: the exact formula, window choice, and what the output actually looks like as data.
Given a signal x[n] and a window function w[m] of length M, the STFT at time frame n and frequency bin k is:
Note the convention: x[n + m] means we start the window at sample n and look forward M samples. The window w[m] tapers the segment. The complex exponential e-j(2π/N)km extracts frequency bin k.
The STFT produces X[n, k] — a complex number for each (time frame n, frequency bin k) pair. From it we derive:
• Magnitude spectrogram: |X[n, k]| — most common visualization
• Power spectrogram: |X[n, k]|2 — used in MFCC computation (Lecture 13)
• Phase spectrogram: ∠X[n, k] — important for synthesis/reconstruction
The window w[m] controls the tradeoff between main lobe width (frequency resolution) and side lobe level (spectral leakage). Common choices:
| Window | Main Lobe Width | Side Lobe Level | Use Case |
|---|---|---|---|
| Rectangular | Narrowest (4π/M) | Worst (-13 dB) | When you need max frequency resolution |
| Hann | Moderate (8π/M) | Good (-31 dB) | General audio analysis |
| Hamming | Moderate (8π/M) | Better (-43 dB) | Speech processing |
| Blackman | Wide (12π/M) | Best (-58 dB) | When leakage must be minimal |
The Hann window is the most popular default. It's defined as:
It tapers smoothly to zero at the edges, which prevents the sharp cutoff artifacts that a rectangular window produces.
A signal with a low tone followed by a high tone. Adjust window length to see the time-frequency tradeoff in action.
python import numpy as np def stft(x, win_len, hop, n_fft=None): """Compute STFT of x with given window length and hop size.""" if n_fft is None: n_fft = win_len w = np.hanning(win_len) # Hann window n_frames = (len(x) - win_len) // hop + 1 X = np.zeros((n_frames, n_fft), dtype=complex) for n in range(n_frames): start = n * hop frame = x[start:start+win_len] * w # window the segment X[n] = np.fft.fft(frame, n_fft) # DFT of windowed frame return X # Power spectrogram (what you usually plot) S = np.abs(stft(x, 512, 128))**2
This is the payoff. Wavelet denoising is one of the most elegant applications of everything we've learned: choose a wavelet, decompose, threshold, reconstruct. Let's build it step by step, then play with it interactively.
The insight is statistical. If the signal is piecewise-smooth and the noise is white Gaussian:
• Signal produces a few LARGE wavelet coefficients (at edges and features)
• Noise produces MANY SMALL wavelet coefficients (spread evenly)
A threshold separates them. Coefficients above the threshold are mostly signal. Coefficients below are mostly noise.
David Donoho and Iain Johnstone (1994) proved that the universal threshold is near-optimal:
Where σ is the noise standard deviation and N is the signal length. This threshold guarantees that, with high probability, all pure-noise coefficients fall below it.
Hard thresholding: Keep coefficients above λ unchanged, zero out the rest.
Soft thresholding: Shrink coefficients above λ toward zero by λ, zero out the rest.
Hard thresholding preserves amplitudes but can create artifacts at the threshold boundary. Soft thresholding is smoother but introduces bias (it shrinks everything). In practice, soft thresholding usually gives better visual results.
Signal: x = [4, 4, 4, 4, 0, 0, 0, 0] (step function). Noise: σ = 0.5. Observed: y = [4.3, 3.8, 4.6, 4.1, 0.2, -0.4, 0.3, -0.1].
Haar DWT of y at level 1 gives approximation a1 and detail d1. The detail coefficient at the step edge is large (~4), while detail coefficients in smooth regions are small (~0.3–0.5). Threshold λ = 0.5 · √(2 ln 8) ≈ 1.02. Only the edge coefficient survives. Reconstruction: a clean step function.
A signal buried in noise. Choose wavelet, adjust noise level and threshold, toggle hard/soft thresholding. Watch the denoised output change in real time.
python import numpy as np import pywt def wavelet_denoise(y, wavelet='db4', level=4, mode='soft'): """Denoise signal y using wavelet thresholding.""" coeffs = pywt.wavedec(y, wavelet, level=level) # Estimate noise from finest detail sigma = np.median(np.abs(coeffs[-1])) / 0.6745 threshold = sigma * np.sqrt(2 * np.log(len(y))) # Threshold detail coefficients (skip approx) for i in range(1, len(coeffs)): coeffs[i] = pywt.threshold(coeffs[i], threshold, mode=mode) return pywt.waverec(coeffs, wavelet) # Usage x_clean = wavelet_denoise(y_noisy)
Everything we've done for 1D signals extends naturally to 2D images. The key idea: apply the wavelet transform separately along rows and columns. This creates a 2D DWT that decomposes an image into four sub-bands at each level.
For an image I[x, y], one level of the 2D DWT produces four sub-images:
| Sub-band | Row Filter | Col Filter | Content |
|---|---|---|---|
| LL (Approximation) | Lowpass | Lowpass | Blurred, downsampled version |
| LH (Horizontal detail) | Lowpass | Highpass | Horizontal edges |
| HL (Vertical detail) | Highpass | Lowpass | Vertical edges |
| HH (Diagonal detail) | Highpass | Highpass | Diagonal edges |
The LL sub-band is then recursively decomposed for multi-level analysis. This is exactly the 2D analogue of the 1D filter bank from Lecture 10.
The 1D denoising pipeline extends directly: decompose image into 2D wavelet sub-bands, threshold the detail sub-bands (LH, HL, HH at each level), reconstruct. The universal threshold still applies, estimated from the finest HH sub-band.
The LH, HL, and HH sub-bands at the finest scale contain the edges. A wavelet-based edge detector is simply:
1. Compute one level of the 2D DWT
2. Take the magnitude of the detail sub-bands: E[x,y] = √(LH2 + HL2 + HH2)
3. Threshold E to get binary edge map
This is more robust than simple gradient-based edge detection because the wavelet's vanishing moments suppress noise while preserving true edges. Higher-order wavelets (more vanishing moments) give cleaner edge maps but with some loss of localization.
A synthetic image decomposed into LL, LH, HL, HH sub-bands. Toggle to see what each sub-band captures.
Wavelet compression is conceptually simple:
1. Compute the 2D DWT (5-7 levels)
2. Sort all coefficients by magnitude
3. Keep only the top k% largest coefficients, zero out the rest
4. Inverse DWT to reconstruct
For a typical natural image, keeping just 5-10% of wavelet coefficients produces a visually acceptable reconstruction. The DFT would need 20-40% to achieve similar quality. This is why JPEG 2000 (wavelets) beats JPEG (DCT) at low bit rates.
python import pywt import numpy as np def wavelet_compress(img, wavelet='db4', keep_pct=10): """Compress image by keeping top keep_pct% of wavelet coefficients.""" coeffs = pywt.wavedec2(img, wavelet, level=5) # Flatten all coefficients arr, slices = pywt.coeffs_to_array(coeffs) thresh = np.percentile(np.abs(arr), 100 - keep_pct) arr[np.abs(arr) < thresh] = 0 # Reconstruct coeffs_t = pywt.array_to_coeffs(arr, slices, output_format='wavedec2') return pywt.waverec2(coeffs_t, wavelet)
Let's consolidate. This lecture connected the theoretical wavelet machinery from Lecture 10 to practical tools and applications.
| If Your Signal Is... | Use... | Why |
|---|---|---|
| Stationary, need spectral analysis | DFT/FFT | Uniform resolution, convolution theorem |
| Nonstationary, need time + freq | STFT | Familiar Hz axis, tunable resolution |
| Multi-scale, transients + smooth | DWT | Adaptive resolution, sparse representation |
| Audio, need onset detection | STFT (spectrogram) | Standard in audio, well-understood |
| Image compression | 2D DWT | JPEG 2000, excellent low-bitrate quality |
| Denoising any 1D/2D signal | DWT + thresholding | Near-optimal for piecewise-smooth signals |
| Task | Recommended Wavelet | Why |
|---|---|---|
| Sharp edge detection | Haar (db1) | Shortest support, catches discontinuities |
| General 1D denoising | db4 – db8 | Good smoothness/support balance |
| Image processing | sym4 – sym8 | Near-symmetry avoids phase artifacts |
| Smooth signal analysis | db8+ or coif4+ | Many vanishing moments for sparsity |
| JPEG 2000 compression | CDF 9/7 (lossy), Le Gall 5/3 (lossless) | Biorthogonal, standard compliant |
Looking back: Lecture 10 (Wavelets) built the CWT, DWT, and filter bank. Lecture 8 (STFT) established time-frequency analysis. This lecture brought them together for applications.
Looking forward: Lecture 12 (Linear Systems) formalizes the filter interpretation — what it means for a wavelet filter bank to be an LTI system. Lecture 13 (Cepstrum & MFCC) takes the STFT output and builds speech features from it.