Short-time Fourier transform
A spectrogram is a visual representation of how the frequency content of a signal changes over time, usually computed with the Short-Time Fourier Transform (STFT). The STFT breaks the signal into short, usually overlapping, time segments (windows) and applies the Fourier transform to each. The spectrogram displays the resulting magnitudes as a heatmap, with time on the x-axis, frequency on the y-axis, and color intensity representing amplitude. A longer window gives finer frequency resolution but coarser time resolution; a shorter window gives the opposite.
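The windowing procedure can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it uses a naive DFT from the standard library for clarity (real code would normally call a library FFT, for example NumPy's `numpy.fft.rfft`), and the names `hann`, `stft_magnitude`, `peak_hz`, the Hann window choice, and the test signal are all assumptions made for the example.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, win_len, hop):
    """Return one magnitude spectrum (bins 0..win_len//2) per time frame."""
    window = hann(win_len)
    frames = []
    # Slide the window along the signal in steps of `hop` samples.
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(win_len)]
        spectrum = []
        for k in range(win_len // 2 + 1):
            # Naive DFT of the windowed segment at bin k.
            acc = sum(
                segment[i] * cmath.exp(-2j * math.pi * k * i / win_len)
                for i in range(win_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: 100 Hz in the first half, 300 Hz in the second.
fs = 1600  # sampling rate in Hz (assumed)
n = 1024
signal = [
    math.sin(2 * math.pi * (100 if i < n // 2 else 300) * i / fs)
    for i in range(n)
]

win_len, hop = 64, 32
spec = stft_magnitude(signal, win_len, hop)
bin_hz = fs / win_len  # frequency spacing between bins: 25 Hz here


def peak_hz(frame):
    """Frequency of the strongest bin in one frame."""
    return frame.index(max(frame)) * bin_hz


print(peak_hz(spec[0]), peak_hz(spec[-1]))  # prints: 100.0 300.0
```

Each inner list in `spec` is one column of the spectrogram. With `win_len = 64` the bins are 25 Hz apart; doubling the window to 128 samples would halve the bin spacing to 12.5 Hz, but each frame would then cover twice as much time.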
The ear and brain perform an analysis similar to a spectrogram. The cochlea in the inner ear acts like a frequency analyzer, separating incoming sound into its frequency components, much like the STFT. These frequency-specific signals are then sent to the brain, which interprets them as speech, music, or other sounds.
Related:
- Gabor transform: an STFT with a Gaussian window. The Gaussian attains the lower bound of the time-frequency uncertainty principle, so it gives the best possible joint localization in time and frequency.
- Wavelet transform: the window size depends on the frequency:
  - High frequencies → very short windows (good time resolution)
  - Low frequencies → long windows (good frequency resolution)
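The difference between the two window strategies can be sketched with a single Gaussian-windowed sinusoid whose width is either fixed (Gabor-style) or scaled as 1/frequency (a Morlet-style wavelet). This is a rough, unnormalized illustration: the function names, the `n_cycles = 6` default, and the 20 ms Gabor width are assumptions chosen for the example, not values from any particular library.

```python
import cmath
import math


def gaussian_atom_response(signal, fs, freq_hz, center_s, sigma_s):
    """Magnitude of the inner product between the signal and a
    Gaussian-windowed complex sinusoid centered at center_s seconds."""
    half = int(4 * sigma_s * fs)  # truncate the Gaussian at +/- 4 sigma
    c = int(round(center_s * fs))
    acc = 0j
    for i in range(max(0, c - half), min(len(signal), c + half + 1)):
        t = (i - c) / fs
        envelope = math.exp(-0.5 * (t / sigma_s) ** 2)
        acc += signal[i] * envelope * cmath.exp(-2j * math.pi * freq_hz * t)
    return abs(acc)


def wavelet_sigma(freq_hz, n_cycles=6.0):
    """Window width in seconds that shrinks as 1/frequency,
    so every analysis frequency sees the same number of cycles."""
    return n_cycles / (2 * math.pi * freq_hz)


GABOR_SIGMA = 0.02  # seconds; the same for every frequency (assumed value)

# Window widths: fixed for Gabor, frequency-dependent for the wavelet.
for f in (50, 200, 800):
    print(f, "Hz  gabor:", GABOR_SIGMA, "s  wavelet:", wavelet_sigma(f), "s")

# Hypothetical test signal: a 200 Hz sine, 0.5 s long, sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * 200 * i / fs) for i in range(fs // 2)]

# The response is large at the matching frequency and small elsewhere.
r200 = gaussian_atom_response(signal, fs, 200, 0.25, GABOR_SIGMA)
r400 = gaussian_atom_response(signal, fs, 400, 0.25, GABOR_SIGMA)
print(r200 > r400)
```

Passing `wavelet_sigma(freq_hz)` instead of `GABOR_SIGMA` as the last argument turns the same function into the wavelet-style analysis: at 800 Hz the window is 16 times shorter than at 50 Hz.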
Related: how the sound is produced by software.