Super-Resolution Spectrogram using coupled PLCA


Introduction

The short-time Fourier transform (STFT) based spectrogram is commonly used to analyze the time-frequency content of a signal. Depending on window size, the STFT provides a trade-off between time and frequency resolutions. This paper presents a novel method that achieves high resolution simultaneously in both time and frequency. We extend Probabilistic Latent Component Analysis (PLCA) to jointly decompose two spectrograms, one with a high time resolution and one with a high frequency resolution. Using this decomposition, a new spectrogram, maintaining high resolution in both time and frequency, is constructed. Termed the “super-resolution spectrogram”, it can be particularly useful for speech as it can simultaneously resolve both glottal pulses and individual harmonics.


Examples

Here is a toy example: a sinusoid mixed with an impulse. The regular STFT spectrograms look like this. The left one has high time resolution (short window) and the right one has high frequency resolution (long window).
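As a rough illustration of the setup above, the following Python sketch builds a sinusoid-plus-impulse signal and computes two magnitude STFTs, one with a short window and one with a long window. The sample rate, sinusoid frequency, impulse position, and window lengths are all assumed values chosen for illustration; they are not taken from the distributed Matlab code.

```python
import numpy as np
from scipy.signal import stft

fs = 8000                                 # sample rate in Hz (assumed)
n = fs                                    # one second of signal
t = np.arange(n) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)    # 1 kHz sinusoid (assumed frequency)
x[n // 2] += 1.0                          # single-sample impulse at 0.5 s (assumed)

def magnitude_stft(signal, win_len):
    """Return bin frequencies, frame times and |STFT| for a Hann window of win_len samples."""
    f, frame_times, Z = stft(signal, fs=fs, window="hann",
                             nperseg=win_len, noverlap=win_len // 2)
    return f, frame_times, np.abs(Z)

# Short window: fine time resolution, coarse frequency resolution.
f_short, t_short, S_short = magnitude_stft(x, 64)
# Long window: fine frequency resolution, coarse time resolution.
f_long, t_long, S_long = magnitude_stft(x, 1024)

print("short window:", S_short.shape, "bin spacing", f_short[1] - f_short[0], "Hz")
print("long window: ", S_long.shape, "bin spacing", f_long[1] - f_long[0], "Hz")
```

The short-window spectrogram has many frames but few frequency bins, so the impulse is sharp and the sinusoid is a wide band; the long-window spectrogram shows the opposite.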

Using the coupled PLCA, which takes the two STFT spectrograms as inputs, we can achieve high resolution in both time and frequency.
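For readers unfamiliar with PLCA, the sketch below shows the ordinary single-spectrogram model, which treats a magnitude spectrogram as a distribution P(f,t) = sum_z P(z) P(f|z) P(t|z) and fits it with EM. This is background only: it is not the coupled variant used here, and the coupling between the two input spectrograms is not reproduced. The function names `plca` and `reconstruct` are hypothetical and do not come from super_spec.zip.

```python
import numpy as np

def plca(V, n_components, n_iter=100, seed=0):
    """Plain (uncoupled) 2-D PLCA fitted with EM on a non-negative matrix V[f, t]."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    eps = 1e-12
    Pz = np.full(n_components, 1.0 / n_components)
    Pf_z = rng.random((n_freq, n_components))
    Pf_z /= Pf_z.sum(axis=0, keepdims=True)
    Pt_z = rng.random((n_time, n_components))
    Pt_z /= Pt_z.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | f, t), proportional to P(z) P(f|z) P(t|z).
        joint = Pf_z[:, None, :] * Pt_z[None, :, :] * Pz[None, None, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)
        # M-step: re-estimate each distribution from the posterior-weighted data.
        weighted = V[:, :, None] * post
        mass = weighted.sum(axis=(0, 1))
        Pf_z = weighted.sum(axis=1) / (mass + eps)
        Pt_z = weighted.sum(axis=0) / (mass + eps)
        Pz = mass / (mass.sum() + eps)
    return Pz, Pf_z, Pt_z

def reconstruct(V, Pz, Pf_z, Pt_z):
    """Model approximation of V, rescaled to the total energy of V."""
    return V.sum() * (Pf_z * Pz) @ Pt_z.T

# Tiny demonstration on a rank-1 matrix, which one component can model exactly.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.5, 1.0, 1.5])
V_demo = np.outer(a, b)
Pz_d, Pf_d, Pt_d = plca(V_demo, n_components=1, n_iter=20)
V_hat = reconstruct(V_demo, Pz_d, Pf_d, Pt_d)
```

The full posterior tensor is held in memory here for clarity, which is only practical for small inputs.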

You might think that multiplying the two STFT spectrograms elementwise would give a similar result. However, the product is not as clean as the result above; in particular, it shows smearing where the sinusoid and the impulse cross.
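The elementwise-product baseline can be sketched as follows. Multiplying two spectrograms requires them to share one time-frequency grid, so this sketch assumes a common hop size and a common FFT length (the short window is zero-padded); how the original comparison aligned the two grids is not stated, and all signal and window parameters are assumed values.

```python
import numpy as np
from scipy.signal import stft

fs = 8000                                 # sample rate in Hz (assumed)
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)    # 1 kHz sinusoid (assumed)
x[fs // 2] += 1.0                         # impulse at 0.5 s (assumed)

hop = 32                                  # shared hop so the frame times line up
nfft = 1024                               # shared FFT length so the bins line up

def aligned_stft(signal, win_len):
    """|STFT| with a Hann window of win_len samples on the shared hop/nfft grid."""
    _, _, Z = stft(signal, fs=fs, window="hann", nperseg=win_len,
                   noverlap=win_len - hop, nfft=nfft)
    return np.abs(Z)

S_short = aligned_stft(x, 64)             # high time resolution
S_long = aligned_stft(x, 1024)            # high frequency resolution

# Guard against a frame-count mismatch before multiplying.
n_frames = min(S_short.shape[1], S_long.shape[1])
S_short = S_short[:, :n_frames]
S_long = S_long[:, :n_frames]

product = S_short * S_long                # elementwise product baseline
```

Plotting `product` on a dB scale next to the two inputs makes the behavior at the crossing point easy to inspect.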

Here are two more toy examples: linear and logarithmic chirps. You can see that the track of the sweeping sinusoid stays consistently narrow in the super-resolution spectrograms.
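Test signals of this kind can be generated with `scipy.signal.chirp`, as in the sketch below. The sample rate, duration, and start and end frequencies are assumed values, not those used for the figures. The two helper functions, which are illustrative names only, give the instantaneous frequency each sweep is expected to follow.

```python
import numpy as np
from scipy.signal import chirp

fs = 8000                      # sample rate in Hz (assumed)
dur = 1.0                      # duration in seconds (assumed)
f0, f1 = 100.0, 3500.0         # start and end frequencies in Hz (assumed)
t = np.arange(int(fs * dur)) / fs

# Frequency rises linearly from f0 at t = 0 to f1 at t = dur.
x_lin = chirp(t, f0=f0, t1=dur, f1=f1, method="linear")
# Frequency rises geometrically; f0 and f1 must be nonzero with the same sign.
x_log = chirp(t, f0=f0, t1=dur, f1=f1, method="logarithmic")

def linear_freq(time):
    """Instantaneous frequency of the linear sweep at the given time."""
    return f0 + (f1 - f0) * time / dur

def log_freq(time):
    """Instantaneous frequency of the logarithmic sweep at the given time."""
    return f0 * (f1 / f0) ** (time / dur)

print("mid-sweep frequency, linear:     ", linear_freq(dur / 2), "Hz")
print("mid-sweep frequency, logarithmic:", log_freq(dur / 2), "Hz")
```

Either signal can then be passed through short-window and long-window STFTs to compare how wide the sweep appears in each.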


Linear Chirp


Logarithmic Chirp

Here are examples from real-world signals.


Male Speech


Glockenspiel


Piano


Code

Matlab files containing the coupled PLCA code and examples: super_spec.zip


Publications