Speech Denoising with Deep Feature Losses (arXiv, GitHub page)

François G. Germain, Qifeng Chen and Vladlen Koltun

Please use any browser other than Internet Explorer to play the audio files.

Algorithms:
- Noisy: Input speech file degraded by background noise.
- Our approach: Speech file processed with our fully convolutional context aggregation stack trained with a deep feature loss.
- Wiener: Speech file processed with Wiener filtering with a priori signal-to-noise ratio estimation (Hu and Loizou, 2006).
- Wavenet: Speech file processed with the Wavenet-like speech enhancement deep network (Rethage et al., 2018).
- SEGAN: Speech file processed with the SEGAN speech enhancement deep network (Pascual et al., 2017).
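
As a rough illustration of the Wiener baseline listed above, the following is a minimal sketch of a Wiener gain driven by an a priori signal-to-noise ratio estimate, for a single frequency bin. It assumes the widely used decision-directed estimator and a known, constant noise power; it is not the implementation evaluated on this page. The function names and the smoothing weight `alpha` are hypothetical, chosen only for this example.

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi (linear scale, not dB)."""
    return xi / (1.0 + xi)


def decision_directed_gains(noisy_power, noise_power, alpha=0.98):
    """Per-frame Wiener gains for one frequency bin.

    noisy_power: list of noisy-speech power values |Y(t)|^2, one per frame.
    noise_power: noise power for this bin (assumed known and constant here).
    alpha: smoothing weight (hypothetical default for this sketch).
    """
    gains = []
    prev_clean_power = 0.0  # estimated clean power of the previous frame
    for y2 in noisy_power:
        gamma = y2 / noise_power  # a posteriori SNR
        # Decision-directed a priori SNR: blend the previous clean estimate
        # with the current (floored) instantaneous SNR.
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        g = wiener_gain(xi)
        gains.append(g)
        prev_clean_power = (g ** 2) * y2  # power of the enhanced frame
    return gains
```

Each gain lies between 0 and 1 and would multiply the noisy spectrum in that bin; bins dominated by noise are attenuated strongly, while bins dominated by speech pass nearly unchanged.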


Select BAK tranche:

Tranche 1 | Tranche 2 | Tranche 3 | Tranche 4 | Tranche 5 | Tranche 6 | Tranche 7 | Tranche 8
(Hardest/Noisiest) ============================> (Easiest/Cleanest)

Audio samples from tranche 6

Filename | Noisy | Our approach | Wiener | Wavenet | SEGAN
p257_364.wav
p257_245.wav
p232_022.wav
p232_152.wav
p257_157.wav


Data source: Test dataset from University of Edinburgh noisy dataset (Valentini-Botinhao et al., 2016).

References

Y. Hu and P. C. Loizou, "Subjective comparison of speech enhancement algorithms," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006.
S. Pascual, A. Bonafonte, and J. Serrà, "SEGAN: Speech enhancement generative adversarial network," in Interspeech, 2017.
D. Rethage, J. Pons, and X. Serra, "A wavenet for speech denoising," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
C. Valentini-Botinhao, X. Wang, S. Takaki, and J. Yamagishi, "Investigating RNN-based speech enhancement methods for noise-robust text-to-speech," in ISCA Speech Synthesis Workshop, 2016.