Audio Style Transformations using Deep Neural Networks by Prateek Verma
Deep Neural Networks (DNNs) have been wildly successful at many tasks, but none is as wondrous as the success DNNs have had at transferring the style of one painting to another painter's art. This magical trick is accomplished by mixing and matching the low-level feature-analysis layers between different styles of painting, so that a painting in one style is rendered in another, with different kinds of brushwork.
But can we do this for audio? Prateek Verma has been experimenting with this, and will talk about his work and results. What does audio style mean, and how does one capture it?
Who: Prateek Verma (Stanford CCRMA)
What: Audio Style Transformations using Deep Neural Networks
When: 10:30AM on Friday, January 12, 2018
Where: CCRMA Seminar Room
Why: Style transfer is cool and mysterious
This is the first Hearing Seminar of the new quarter. Bring your own style, and we’ll talk about how DNNs can change it.
- Malcolm
Audio Style Transformations using Deep Neural Networks
Prateek Verma and Julius Smith - Stanford CCRMA
There has been fascinating work by Gatys et al. on creating artistic transformations of images. It was revolutionary in showing how we can, in some sense, alter the “style” of an image while generally preserving its “content”. In our work, we present a method for creating new sounds using a similar approach, treating it as a style-transfer problem: starting from a random-noise input signal, we iteratively use back-propagation to optimize the sound to conform to filter outputs from a pre-trained neural architecture of interest.
For demonstration, we investigate two different tasks: bandwidth expansion/compression, and timbral transfer from singing voice to musical instruments. A feature of our method is that a single architecture, with a single set of parameters, can generate these different audio-style-transfer types, which would otherwise require diverse, complex, hand-tuned signal-processing pipelines. We will also discuss closely related work published by Google Research on style transfer for speech, where the content is the words being spoken and the style is the speaker. Finally, we will motivate the plethora of applications possible within this framework via simple tweaks of the loss functions. :)
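A minimal sketch of the iterative optimization described in the abstract, using NumPy with a single fixed random filterbank standing in for the pre-trained network (all names, sizes, and loss weights here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained" layer: a fixed random filterbank applied to
# signal frames. A real system would use a trained audio network here.
n_frames, frame_len, n_filters = 8, 64, 16
W = rng.standard_normal((frame_len, n_filters)) / np.sqrt(frame_len)

def features(x):
    return x @ W          # (n_frames, frame_len) -> (n_frames, n_filters)

def gram(F):
    # Gram matrix of filter activations: the "style" statistics (Gatys et al.)
    return F.T @ F / F.shape[0]

content = rng.standard_normal((n_frames, frame_len))  # placeholder content sound
style = rng.standard_normal((n_frames, frame_len))    # placeholder style sound
F_c, G_s = features(content), gram(features(style))

alpha, beta, lr = 1.0, 0.1, 0.05  # content weight, style weight, step size
x = rng.standard_normal((n_frames, frame_len))  # start from random noise

losses = []
for step in range(200):
    F = features(x)
    dC = F - F_c                           # grad of content loss wrt features
    dS = F @ (gram(F) - G_s) / F.shape[0]  # grad of style loss wrt features
    losses.append(alpha * 0.5 * np.sum(dC**2)
                  + beta * 0.25 * np.sum((gram(F) - G_s)**2))
    x -= lr * (alpha * dC + beta * dS) @ W.T  # back-propagate to the signal

# x now matches the content's features and the style's Gram statistics
# more closely than the initial noise did.
```

Swapping which statistics appear in the loss (e.g. matching Gram matrices of a bandlimited signal's activations) is what lets one framework cover tasks as different as bandwidth expansion and timbral transfer.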
Biography
Prateek Verma is a CCRMA MA/MST graduate interested in audio processing, generation, and analysis. Before coming to Stanford, he graduated from IIT Bombay in Electrical Engineering. He has held research positions in the Artificial Intelligence Lab of Stanford's Computer Science Department.