The short-time Fourier transform (STFT) based spectrogram is commonly used to analyze the time-frequency content of a signal. Depending on the window size, the STFT trades time resolution against frequency resolution: a short window resolves events in time but blurs frequency, and a long window does the opposite. This paper presents a novel method that achieves high resolution in both time and frequency simultaneously. We extend Probabilistic Latent Component Analysis (PLCA) to jointly decompose two spectrograms, one with high time resolution and one with high frequency resolution. From this decomposition we construct a new spectrogram that keeps high resolution in both time and frequency. We call it the “super-resolution spectrogram”. It can be particularly useful for speech, because it can resolve both glottal pulses and individual harmonics at the same time.
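The trade-off can be made concrete with a little arithmetic. A window of N samples at sample rate fs spans N/fs seconds, and its DFT bins are fs/N Hz apart, so the product of the two is always 1. The 16 kHz rate and the two window lengths below are assumptions chosen for illustration. Bin spacing is only a proxy for frequency resolution, because the window shape also matters.

```python
def stft_resolution(window_len, sample_rate):
    """Return (window duration in s, frequency-bin spacing in Hz) for an STFT window."""
    duration = window_len / sample_rate      # time span covered by one frame
    bin_spacing = sample_rate / window_len   # distance between adjacent DFT bins
    return duration, bin_spacing

# Assumed values: 16 kHz audio, one short and one long window.
for n in (256, 4096):
    dt, df = stft_resolution(n, 16000)
    print(f"N={n}: window {dt * 1000:.1f} ms, bin spacing {df:.2f} Hz, product {dt * df:.1f}")
```

Shrinking the window improves time resolution, and the bin spacing grows by the same factor.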
Here is a toy example: a sinusoid mixed with an impulse. Its regular STFT spectrograms are shown below. The left one, computed with a short window, has high time resolution, and the right one, computed with a long window, has high frequency resolution.
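A signal like this toy example and its two regular spectrograms can be produced with the minimal NumPy sketch below. The sample rate, the 1 kHz tone, the impulse position, the Hann windows of 64 and 1024 samples, and the hop size are all assumptions for illustration. They are not the parameters used for the figures.

```python
import numpy as np

def magnitude_stft(x, win_len, hop):
    """Magnitude STFT with a Hann window; rows are frequency bins, columns are frames."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))

fs = 8000                                  # assumed sample rate (Hz)
t = np.arange(fs) / fs                     # 1 second of signal
x = 0.5 * np.sin(2 * np.pi * 1000 * t)     # assumed 1 kHz sinusoid
x[fs // 2] += 5.0                          # impulse at t = 0.5 s

hop = 32                                   # same hop, so both have the same frame rate
X_time = magnitude_stft(x, 64, hop)        # short window: sharp in time, coarse in frequency
X_freq = magnitude_stft(x, 1024, hop)      # long window: sharp in frequency, coarse in time
print(X_time.shape, X_freq.shape)
```

The frame counts differ because the long window fits into the signal fewer times. Padding or centering the frames would align the two time axes, but this sketch does not do that.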
Applying coupled PLCA, which takes both STFTs as inputs, yields a spectrogram with high resolution in both time and frequency.
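The idea of coupling the two inputs can be illustrated with the following minimal NumPy sketch. It is not the code in super_spec.zip, and it is not necessarily the paper's exact model. It uses KL-divergence NMF with multiplicative updates, which matches PLCA up to normalization. It also assumes that both spectrograms sit on a shared fine grid. The long-window spectrogram is modeled as the fine-grid spectrogram W @ H averaged over groups of r_t frames, and the short-window one as W @ H averaged over groups of r_f bins. The averaging operators (the hypothetical helper `pool`, with matrices A and B) are an idealization, since real STFTs with different windows are not related by simple averaging.

```python
import numpy as np

def pool(n_coarse, r):
    """Hypothetical averaging operator mapping n_coarse * r fine cells to n_coarse coarse cells."""
    return np.kron(np.eye(n_coarse), np.ones((1, r)) / r)

def kl(V, L, eps=1e-12):
    """Generalized KL divergence D(V || L) used as the fitting cost."""
    return np.sum(V * np.log((V + eps) / (L + eps)) - V + L)

def coupled_kl_nmf(V_long, V_short, r_f, r_t, n_comp=8, n_iter=200, seed=0):
    """Jointly factorize two spectrograms under an assumed coupling.

    V_long  (F x T/r_t)   ~= W @ (H @ A.T)   long window: fine frequency, coarse time
    V_short (F/r_f x T)   ~= (B @ W) @ H     short window: coarse frequency, fine time
    """
    eps = 1e-12
    F, T_l = V_long.shape
    F_s, T = V_short.shape
    A = pool(T_l, r_t)                     # (T_l, T): averages r_t fine frames
    B = pool(F_s, r_f)                     # (F_s, F): averages r_f fine bins
    assert A.shape[1] == T and B.shape[1] == F, "grids must divide evenly"
    rng = np.random.default_rng(seed)
    W = rng.random((F, n_comp)) + eps      # spectral templates on the fine frequency grid
    H = rng.random((n_comp, T)) + eps      # activations on the fine time grid
    ones_l, ones_s = np.ones_like(V_long), np.ones_like(V_short)
    for _ in range(n_iter):
        # Multiplicative update of W; both inputs contribute to numerator and denominator.
        H_l = H @ A.T
        R_l = V_long / (W @ H_l + eps)
        R_s = V_short / (B @ W @ H + eps)
        W *= (R_l @ H_l.T + B.T @ R_s @ H.T) / (ones_l @ H_l.T + B.T @ ones_s @ H.T + eps)
        # Multiplicative update of H using the freshly updated W.
        W_s = B @ W
        R_l = V_long / (W @ H @ A.T + eps)
        R_s = V_short / (W_s @ H + eps)
        H *= (W.T @ R_l @ A + W_s.T @ R_s) / (W.T @ ones_l @ A + W_s.T @ ones_s + eps)
    return W, H

# Usage on synthetic data generated from the assumed model itself.
rng = np.random.default_rng(1)
W0, H0 = rng.random((32, 4)), rng.random((4, 48))
V_long = W0 @ H0 @ pool(12, 4).T           # 32 x 12
V_short = pool(8, 4) @ W0 @ H0             # 8 x 48
W, H = coupled_kl_nmf(V_long, V_short, r_f=4, r_t=4)
S = W @ H                                  # 32 x 48 estimate on the fine grid
```

W is shared, so the templates must explain the fine frequency detail of the long-window input. H is shared, so the activations must explain the fine temporal detail of the short-window input. That is the intuition this sketch captures.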
One might expect a similar result from simply multiplying the two STFT spectrograms elementwise. However, the product is not as clean as the super-resolution spectrogram above. In particular, it smears where the sinusoid and the impulse cross.
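The smearing of the elementwise product can be reproduced in an idealized setting with the sketch below. The inputs first have to be brought to a common grid. Here this is done with nearest-neighbour repetition, and the geometric mean keeps the magnitude scale. Both the upsampling choice and the hand-built 16 x 16 inputs are assumptions for illustration. In the inputs, the long window smears the impulse over four frames, and the short window smears the sinusoid over four bins.

```python
import numpy as np

def product_spectrogram(X_short, X_long, r_f, r_t):
    """Naive baseline: upsample both spectrograms to a common grid and multiply.

    X_short: (F/r_f, T) magnitudes from a short window (coarse frequency).
    X_long:  (F, T/r_t) magnitudes from a long window (coarse time).
    """
    up_short = np.repeat(X_short, r_f, axis=0)   # fill in frequency by repetition
    up_long = np.repeat(X_long, r_t, axis=1)     # fill in time by repetition
    if up_short.shape != up_long.shape:
        raise ValueError(f"grid mismatch: {up_short.shape} vs {up_long.shape}")
    # Geometric mean keeps the result on the same magnitude scale as the inputs.
    return np.sqrt(up_short * up_long)

# Idealized example (assumed): sinusoid in fine row 5, impulse in fine column 9.
X_long = np.zeros((16, 4))
X_long[5, :] = 1.0        # sinusoid is sharp in frequency
X_long[:, 2] += 1.0       # impulse smeared over fine frames 8-11
X_short = np.zeros((4, 16))
X_short[1, :] = 1.0       # sinusoid smeared over fine bins 4-7
X_short[:, 9] += 1.0      # impulse is sharp in time
P = product_spectrogram(X_short, X_long, r_f=4, r_t=4)
print(np.count_nonzero(P))  # 40 nonzero cells; the ideal sharp cross has 31
```

Away from the crossing, the product is sharp. Around it, a 4 x 4 block lights up, because each input is blurry exactly where the other one has energy.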
Here are two more toy examples: linear and logarithmic chirps. In the super-resolution spectrograms, the frequency track of each chirp stays consistently narrow as it sweeps.
Linear Chirp
Logarithmic Chirp
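Chirp test signals like these can be generated by integrating the instantaneous frequency to obtain the phase. A linear chirp has f(t) = f0 + (f1 - f0) t / T. A logarithmic chirp has f(t) = f0 (f1 / f0)^(t / T). The sweep range, duration, and sample rate below are assumptions and are not taken from the figures.

```python
import numpy as np

def chirp_phase(t, f0, f1, duration, kind="linear"):
    """Phase (radians) of a chirp sweeping from f0 to f1 Hz over `duration` seconds."""
    if kind == "linear":
        k = (f1 - f0) / duration                      # sweep rate in Hz/s
        return 2 * np.pi * (f0 * t + 0.5 * k * t ** 2)
    if kind == "logarithmic":
        ratio = f1 / f0
        # f(t) = f0 * ratio**(t / duration); the phase is 2*pi times its integral.
        return 2 * np.pi * f0 * duration / np.log(ratio) * (ratio ** (t / duration) - 1)
    raise ValueError(f"unknown chirp kind: {kind}")

fs = 16000                                  # assumed sample rate (Hz)
t = np.arange(2 * fs) / fs                  # 2 seconds
lin_chirp = np.cos(chirp_phase(t, 100, 4000, 2.0, "linear"))
log_chirp = np.cos(chirp_phase(t, 100, 4000, 2.0, "logarithmic"))
```

SciPy users can get equivalent signals from `scipy.signal.chirp` with `method="linear"` or `method="logarithmic"`.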
Here are examples from real-world signals.
Male Speech
Glockenspiel
Piano
MATLAB files containing the coupled PLCA code and these examples are available here: super_spec.zip