A spectrogram is a two-dimensional representation of sound. The horizontal axis shows time advancing from left to right. The vertical axis shows frequency, measured in hertz (Hz) or kilohertz (kHz), increasing upward. A third dimension — the brightness or darkness of each point — represents amplitude, meaning how loud a particular frequency is at a given moment.
For bird identification, spectrograms are useful because they make the acoustic structure of a call visible. Two calls that sound similar to untrained ears can look quite different on a spectrogram, and vice versa. Once you know what to look for, the shape of a mark on a spectrogram tells you a great deal about how a bird produces its sound and what species it is likely to be.
Spectrogram of melodious warbler (Hippolais polyglotta) song — a complex, rapidly modulated vocalization. The dense vertical lines indicate rapid note changes. Source: Wikimedia Commons / CC BY-SA.
The Two Axes
Time (horizontal)
The horizontal axis is straightforward: each millimeter of width corresponds to a fixed duration of time, typically set when the spectrogram is generated. For bird calls, the relevant timescales are usually milliseconds to a few seconds. A call lasting 0.3 seconds will appear as a short mark; a continuous song phrase lasting 4 seconds will span most of the horizontal field.
Frequency (vertical)
The vertical axis shows pitch. Most bird vocalizations fall between 1 kHz and 10 kHz, though some species produce sounds outside this range. The lower portion of the spectrogram (closer to 0 Hz) represents bass frequencies; the upper portion represents high-pitched sounds. A rising whistle appears as a diagonal line that starts low and moves upward. A flat whistle at constant pitch appears as a horizontal line.
Most passerine songs in Canada fall within the 2–8 kHz range. Low-frequency hums or drumming from woodpeckers appear closer to the bottom of a standard spectrogram window.
What Different Mark Shapes Mean
The shape of each mark in a spectrogram corresponds directly to a perceptual quality of the sound:
| Spectrogram Shape | What It Sounds Like | Example Behaviour |
|---|---|---|
| Horizontal line (flat) | Pure, sustained tone at constant pitch | Introductory whistle of White-throated Sparrow |
| Rising diagonal | Upward whistle or "wheep" sound | Eastern Wood-Pewee call |
| Falling diagonal | Downward slur | Veery's descending spiral song |
| Rapid vertical marks (pulses) | Buzzy or rough texture | Common Yellowthroat "witchety" phrase |
| Stacked horizontal bands | Harmonic chord or buzzy whistle | American Robin's caroling note |
| Rapid repetition of identical marks | Trill — a rapid series of similar notes | Dark-eyed Junco's musical trill |
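The simpler shape categories in the table can be approximated in code once a single note has been reduced to a sequence of peak frequencies over time. The following is a minimal Python sketch only. The function name `classify_contour`, the 200 Hz tolerance, and the half-span rule used to separate slurs from curved notes are illustrative assumptions, not part of any standard tool. It handles flat, rising, falling, and curved contours, and does not attempt pulses, harmonic stacks, or trills.

```python
def classify_contour(freqs_hz, tolerance_hz=200.0):
    """Label a single note's frequency contour (hypothetical helper).

    freqs_hz: peak frequency of the note at successive time steps, in Hz.
    tolerance_hz: assumed threshold below which pitch change counts as flat.
    """
    if len(freqs_hz) < 2:
        raise ValueError("need at least two frequency samples")

    span = max(freqs_hz) - min(freqs_hz)  # total pitch range covered
    net = freqs_hz[-1] - freqs_hz[0]      # end pitch minus start pitch

    if span <= tolerance_hz:
        return "flat"    # horizontal line on the spectrogram
    if abs(net) < 0.5 * span:
        return "curved"  # rises and falls back, or the reverse
    return "rising" if net > 0 else "falling"


print(classify_contour([3000, 3050, 2980, 3020]))  # flat
print(classify_contour([2000, 2500, 3000, 3500]))  # rising
print(classify_contour([5000, 4200, 3500]))        # falling
```

A real recording would need a noise-tolerant way of extracting the frequency sequence first; the thresholds here would have to be tuned by ear and eye against known calls.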
Amplitude and Mark Darkness
In most spectrogram displays, louder frequencies appear darker or more saturated. Faint marks near the edges of a call represent harmonics or overtones that carry less energy. The darkest, most prominent band in a bird's vocalization is usually the fundamental frequency — the pitch you perceive most clearly when listening.
When a bird sings at full volume, the marks on a spectrogram are bold and well-defined. Soft calls or distance-attenuated recordings produce fainter, sometimes fragmentary marks that are harder to interpret.
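As a rough illustration of how loudness can become darkness, the sketch below maps a spectral magnitude to a gray level. It assumes a decibel scale measured relative to the loudest value in the recording and a 60 dB display range. Both are common conventions but are assumptions here, and the function name `magnitude_to_gray` is hypothetical. Individual programs differ in their colour maps and ranges.

```python
import math


def magnitude_to_gray(magnitude, peak, dynamic_range_db=60.0):
    """Map a spectral magnitude to an 8-bit gray level (0 = black, 255 = white).

    Assumes decibels relative to the loudest value (peak) and an
    illustrative 60 dB display range; louder values come out darker.
    """
    if peak <= 0:
        raise ValueError("peak must be positive")
    if magnitude <= 0:
        return 255  # silence is drawn as white background

    level_db = 20.0 * math.log10(magnitude / peak)          # 0 dB at the peak
    level_db = max(-dynamic_range_db, min(0.0, level_db))   # clamp to display range
    darkness = 1.0 + level_db / dynamic_range_db            # 1.0 = loudest, 0.0 = floor
    return round(255 * (1.0 - darkness))


print(magnitude_to_gray(1.0, 1.0))    # 0   (loudest point, black)
print(magnitude_to_gray(0.1, 1.0))    # 85  (20 dB quieter, mid-gray)
print(magnitude_to_gray(0.001, 1.0))  # 255 (60 dB quieter, white)
```

With a mapping like this, a distant or soft call sits closer to the floor of the display range, which is why its marks look faint and can break up into fragments.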
Frequency Range as a Field Clue
The vertical position of a spectrogram mark tells you roughly what pitch register a bird is using. This is practically useful even without precise frequency measurements:
- Songs concentrated below 3 kHz tend to sound low and flute-like (thrushes, some sparrows).
- Songs between 4 and 7 kHz are in the range most humans hear clearly and include many warblers and vireos.
- Songs above 8 kHz are high-pitched and often sound thin or insect-like to older listeners who have experienced high-frequency hearing loss.
Song Sparrow (Melospiza melodia) in full song. Song Sparrows produce complex, individually variable songs that are recognizable on a spectrogram by a characteristic introductory series of repeated notes followed by trills. Source: Wikimedia Commons / CC BY 2.0.
How Spectrogram Software Works
Spectrogram analysis tools (such as Macaulay Library playback tools, Raven Lite from Cornell, or Audacity's built-in spectrogram view) apply a mathematical operation called the Short-Time Fourier Transform (STFT) to an audio recording. The audio is divided into short overlapping windows, and the frequency content of each window is computed and plotted.
The result is that each vertical column of pixels in the spectrogram represents the frequency distribution of a brief moment in time. How fine or coarse that time slice is — called the window size or FFT size — determines a trade-off between time resolution and frequency resolution. A larger window gives sharper frequency detail but blurs short events in time; a smaller window captures rapid changes but smears frequency detail.
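To make the windowing step concrete, here is a minimal pure-Python sketch of a magnitude STFT. It uses a direct (slow) discrete Fourier transform for readability, whereas real tools use a fast Fourier transform. The Hann window, the function names, and the test-tone parameters are assumptions chosen for illustration; they are not taken from any of the programs named above.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (an assumed, commonly used taper)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_magnitudes(samples, window_size, hop):
    """Return one list of bin magnitudes per overlapping window.

    Each inner list is one vertical column of the spectrogram,
    covering bins 0 .. window_size // 2.
    """
    window = hann(window_size)
    n_bins = window_size // 2 + 1
    columns = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(window_size)]
        column = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at bin k.
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                for n in range(window_size)
            )
            column.append(abs(total))
        columns.append(column)
    return columns


# Synthetic 2 kHz tone sampled at 8 kHz (illustrative values only).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 2000 * n / SAMPLE_RATE) for n in range(256)]
columns = stft_magnitudes(tone, window_size=64, hop=32)

peak_bin = columns[0].index(max(columns[0]))
print(len(columns), peak_bin * SAMPLE_RATE / 64)  # 7 2000.0
```

A hop of half the window size gives 50 percent overlap between neighbouring windows; a smaller hop would produce more columns for the same recording without changing the frequency detail in each one.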
For passerine songs with many rapid note changes, a smaller FFT window (e.g. 512 samples at 44.1 kHz) often gives clearer results. For long sustained tones or slower calls, a larger window (e.g. 2048 samples) resolves frequency more precisely.
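The trade-off can be worked out for the two example window sizes. The short sketch below computes the duration of one window and the spacing between frequency bins at a 44.1 kHz sample rate. The function name `stft_resolution` is hypothetical, and the figures are nominal: the window function and the overlap between windows also affect how sharp a display looks in practice.

```python
def stft_resolution(window_size, sample_rate_hz):
    """Return (window duration in ms, frequency bin spacing in Hz)."""
    duration_ms = 1000.0 * window_size / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / window_size
    return duration_ms, bin_spacing_hz


for size in (512, 2048):
    duration_ms, bin_hz = stft_resolution(size, 44100)
    print(f"{size} samples: {duration_ms:.1f} ms window, {bin_hz:.1f} Hz per bin")
# 512 samples: 11.6 ms window, 86.1 Hz per bin
# 2048 samples: 46.4 ms window, 21.5 Hz per bin
```

Quadrupling the window makes the frequency bins four times narrower but averages each column over four times as much audio, so notes shorter than a few tens of milliseconds start to blur together.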
Practical Steps for Reading a Spectrogram
- Orient yourself. Locate the frequency scale (vertical axis) and confirm the range shown — typically 0–10 kHz for passerines.
- Find the dominant band. Identify the darkest, most prominent marks. These usually trace the fundamental frequency of the main call.
- Trace the shape. Follow each mark to see whether it is rising, falling, flat, pulsed, or curved. Each shape corresponds to a perceptual quality.
- Note the duration. A short burst of marks indicates a brief call note. A series of marks spread over 2–4 seconds indicates a song phrase.
- Look for repetition patterns. Does the same shape repeat at regular intervals? This points toward trills, repeated calls, or paired phrases.
- Cross-reference. Compare the spectrogram shape with reference recordings from databases such as Xeno-canto or the Cornell Lab of Ornithology.
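Several of the steps above (finding the dominant band, tracing its shape, and noting the duration) can be sketched in code for a spectrogram that is already available as numbers. The example below is a hedged illustration: the function name `trace_dominant_band`, the list-of-columns layout, and the loudness threshold are all assumptions, and the tiny hand-made spectrogram stands in for real data.

```python
def trace_dominant_band(columns, sample_rate_hz, window_size, hop, min_magnitude):
    """Return (time_s, freq_hz) for each column whose strongest bin is loud enough.

    columns: spectrogram as a list of time slices, each a list of magnitudes
             for bins 0 .. window_size // 2 (an assumed layout).
    min_magnitude: assumed threshold below which a column counts as silence.
    """
    contour = []
    for index, column in enumerate(columns):
        peak = max(column)
        if peak < min_magnitude:
            continue  # treat as a gap between notes
        peak_bin = column.index(peak)
        time_s = index * hop / sample_rate_hz
        freq_hz = peak_bin * sample_rate_hz / window_size
        contour.append((time_s, freq_hz))
    return contour


# Tiny hand-made spectrogram: one silent column, then a rising note.
columns = [
    [0, 0, 0, 0, 0],
    [0, 1, 5, 1, 0],
    [0, 0, 1, 6, 1],
    [0, 0, 0, 1, 7],
]
contour = trace_dominant_band(
    columns, sample_rate_hz=8000, window_size=8, hop=4, min_magnitude=2
)
freqs = [freq for _, freq in contour]
print(freqs)  # [2000.0, 3000.0, 4000.0]
```

The steadily increasing frequencies correspond to a rising diagonal on the display, and the number of loud columns multiplied by the hop duration gives an estimate of the note's length. Checking for repetition and comparing against reference recordings remain tasks for the eye and ear.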