For my PhD thesis, I explored unsupervised feature learning and deep learning for musical data and applied them to music classification, annotation and transcription.
This work presents an effective data processing
pipeline that learns feature representations from musical signals
using unsupervised learning algorithms. These learned feature representations
were used in music genre classification, annotation (a.k.a. tagging) and text-based retrieval tasks.
I also built a real-time music tagging demo that shows data/feature representations as well as estimated tags
that describe music at a semantic level (e.g. genre, emotion, instrument, voice quality and other semantic words).
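As a rough illustration of this kind of pipeline, here is a minimal sketch of one common unsupervised feature-learning recipe: learn a k-means dictionary from spectrogram frames, encode each frame against it with a soft "triangle" activation, and mean-pool over time into a clip-level feature. The frame dimensions, dictionary size, and the random stand-in data are illustrative assumptions, not the exact configuration from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_dictionary(frames, k=16, iters=20):
    """Plain k-means: learn k centroid 'features' from spectrogram frames."""
    centroids = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest centroid
        d = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids

def encode(frames, centroids):
    """Soft 'triangle' encoding: activation = max(0, mean_dist - dist)."""
    d = np.sqrt(((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(-1))
    return np.maximum(0.0, d.mean(1, keepdims=True) - d)

# toy stand-in for mel-spectrogram frames: (n_frames, n_bins)
frames = rng.random((500, 40))
D = learn_dictionary(frames, k=16)
codes = encode(frames, D)        # (500, 16) frame-level features
song_feature = codes.mean(0)     # mean-pool over time -> one clip-level vector
```

The pooled vector can then feed any off-the-shelf classifier for the genre or tagging tasks above.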
We applied deep learning to polyphonic piano transcription. Based on a classification approach
(i.e. a binary classifier determines the presence of each note), we pre-trained a neural
network in an unsupervised way and then fine-tuned it using errors that are
simultaneously back-propagated from multiple note classifiers.
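The pre-train-then-fine-tune scheme can be sketched in plain NumPy. The layer sizes, learning rate, and synthetic data below are all hypothetical; the point is that the 88 per-note sigmoid classifiers share one hidden layer, so their error signals combine when back-propagated into it.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_in, n_hid, n_notes, lr = 64, 32, 88, 0.1
X = rng.random((256, n_in))                            # toy spectrogram frames
Y = (rng.random((256, n_notes)) < 0.05).astype(float)  # toy note-activity labels

# 1) unsupervised pre-training: train the hidden layer as an autoencoder
W_enc = rng.normal(0.0, 0.1, (n_in, n_hid))
W_dec = rng.normal(0.0, 0.1, (n_hid, n_in))
for _ in range(100):
    H = sigmoid(X @ W_enc)
    err = H @ W_dec - X                    # reconstruction error
    dH = err @ W_dec.T * H * (1 - H)
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X.T @ dH / len(X)

# 2) supervised fine-tuning: 88 sigmoid note classifiers share the hidden
#    layer, so their errors sum when back-propagated into W_enc
W_out = rng.normal(0.0, 0.1, (n_hid, n_notes))

def bce(P, Y):
    return -(Y * np.log(P + 1e-9) + (1 - Y) * np.log(1 - P + 1e-9)).mean()

loss_before = bce(sigmoid(sigmoid(X @ W_enc) @ W_out), Y)
for _ in range(300):
    H = sigmoid(X @ W_enc)
    P = sigmoid(H @ W_out)                 # per-note probabilities
    dZ = P - Y                             # gradient of BCE w.r.t. logits
    dH = dZ @ W_out.T * H * (1 - H)        # errors from all notes combine here
    W_out -= lr * H.T @ dZ / len(X)
    W_enc -= lr * X.T @ dH / len(X)
loss_after = bce(sigmoid(sigmoid(X @ W_enc) @ W_out), Y)
```

On real data the inputs would be spectrogram frames of piano recordings and the labels the 88 aligned note activities.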
I spent some time working on new sound representation and recognition methods using probabilistic latent component analysis (PLCA), which is a probabilistic version of non-negative matrix factorization (NMF).
This presents a method for recognizing sound sources in a mixture.
The method is based on a source separation idea using PLCA, but it estimates the relative proportions
of the sources in the mixture without actually separating them.
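A minimal sketch of the proportion-estimation idea, assuming the source spectra are known and held fixed: EM updates only the mixture weights, so the sources are never separated, yet their shares of the mixture are recovered. The two Gaussian-bump templates are made-up stand-ins for real source dictionaries.

```python
import numpy as np

def estimate_proportions(v, W, n_iter=50):
    """EM updates for PLCA mixture weights p[z], with source spectra W fixed:
    v[f] ~ sum_z W[f, z] * p[z]."""
    n_z = W.shape[1]
    p = np.full(n_z, 1.0 / n_z)
    v = v / v.sum()
    for _ in range(n_iter):
        post = W * p                                # E-step: joint P(f, z)
        post /= post.sum(1, keepdims=True) + 1e-12  # -> posterior P(z | f)
        p = (v[:, None] * post).sum(0)              # M-step: reweight by spectrum
        p /= p.sum()
    return p

# two hypothetical source templates: smooth spectral bumps at different bins
f = np.arange(64, dtype=float)
W = np.stack([np.exp(-(f - 16) ** 2 / 50), np.exp(-(f - 48) ** 2 / 50)], axis=1)
W /= W.sum(0)
v = 0.7 * W[:, 0] + 0.3 * W[:, 1]    # mixture with known 70/30 proportions
p = estimate_proportions(v, W)       # recovers approximately [0.7, 0.3]
```

Because only the weights p[z] are free, this is much cheaper than full separation followed by energy measurement.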
The short-time Fourier transform (STFT) spectrogram intrinsically trades off time resolution
against frequency resolution when displayed as a 2-D time-frequency representation.
We propose an approach that achieves high resolution in both time and frequency using PLCA.
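The trade-off itself is easy to demonstrate numerically (the sample rate, window lengths, and test tones below are arbitrary choices, not values from this work): a short analysis window localizes events in time but merges two closely spaced tones into one spectral peak, while a long window resolves them.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                                  # 1 s of audio
x = np.sin(2*np.pi*1000*t) + np.sin(2*np.pi*1010*t)     # tones 10 Hz apart

def peak_count(x, n):
    """Count spectral peaks near 1 kHz for an n-point analysis window."""
    mag = np.abs(np.fft.rfft(x[:n] * np.hanning(n)))
    band = mag[int(900 * n / fs):int(1100 * n / fs)]
    peaks = [i for i in range(1, len(band) - 1)
             if band[i] > band[i - 1] and band[i] > band[i + 1]
             and band[i] > 0.5 * band.max()]
    return len(peaks)

# short window: fine time resolution, but bin spacing fs/256 ~ 31 Hz
# merges the two tones; long window: fs/4096 ~ 2 Hz resolves them
short_peaks = peak_count(x, 256)
long_peaks = peak_count(x, 4096)
```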
Sound synthesis is a longtime favorite of mine. I love learning and exploring various sound synthesis techniques and using them for sound design. This introduces research that I worked on for quite a while.
With Vesa Välimäki and
Jussi Pekonen at Aalto University,
I came up with several algorithms to efficiently generate bandlimited (anti-aliasing) oscillators,
which are used in virtual analog synthesizers. This briefly introduces one class of these algorithms.
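One well-known member of this family of techniques is the differentiated parabolic waveform (DPW) sawtooth: square a trivial sawtooth, take the first difference, and rescale, which strongly suppresses the aliasing that lands at low frequencies. This is a sketch of the basic recipe only, not the specific algorithms from the work above; the frequency values are arbitrary.

```python
import numpy as np

def dpw_saw(f0, fs, n):
    """Alias-suppressed sawtooth via the differentiated parabolic waveform:
    square a trivial saw, first-difference it, and rescale to unit amplitude."""
    phase = (f0 / fs * np.arange(n + 1)) % 1.0
    saw = 2.0 * phase - 1.0            # trivial (aliased) sawtooth in [-1, 1]
    c = fs / (4.0 * f0)                # standard DPW amplitude scaling
    return c * np.diff(saw * saw)      # first difference of the parabola

fs, f0 = 44100, 1661.0                 # a high note, where trivial saws alias badly
y = dpw_saw(f0, fs, 4096)
```

Compared with the trivial sawtooth, the DPW output keeps the harmonic series but pushes down the aliased partials in the audible low/mid range.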
This shows a 3-D sound processing demo working in real-time.
It uses head-related transfer functions (HRTFs) to synthesize binaural sounds
from monaural sources. We measured the HRTFs from human subjects and
efficiently modeled them by exploiting their minimum-phase characteristics.
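Minimum-phase modeling of a measured response can be sketched with the standard homomorphic (real-cepstrum) method: keep the magnitude spectrum, discard the measured phase, and rebuild the minimum-phase counterpart by folding the cepstrum onto its causal part. The 4-tap filter below is a toy stand-in for a measured HRTF, not real data.

```python
import numpy as np

def minimum_phase(h, nfft=1024):
    """Rebuild a filter with the same magnitude response as h but minimum
    phase, via the homomorphic (real-cepstrum folding) method."""
    mag = np.abs(np.fft.fft(h, nfft))
    cep = np.fft.ifft(np.log(mag + 1e-12)).real   # real cepstrum
    fold = np.zeros(nfft)
    fold[0] = 1.0
    fold[nfft // 2] = 1.0
    fold[1:nfft // 2] = 2.0                       # double positive quefrencies
    return np.fft.ifft(np.exp(np.fft.fft(cep * fold))).real

# toy non-minimum-phase FIR standing in for a measured HRTF
h = np.array([0.5, 0.0, 0.0, 1.0])
h_min = minimum_phase(h)   # same magnitude response, energy packed at the start
```

Because the minimum-phase version concentrates its energy at the start, it can be truncated to a short, cheap filter, with the interaural delay reintroduced separately as a pure delay.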