Evolutionary Music

From CCRMA Wiki

Revision as of 16:55, 6 May 2021

Week 2 Update

This week was mostly about settling on a feasible and well-defined project. I'm interested in both evolutionary models and birdsong, so I've been leaning towards something that can combine both. The tough thing when it comes to designing an evolutionary model of music production is coming up with a fitness function. A genetic algorithm needs access to some way to map a genotype to a fitness value, and it's not clear how this can be done when a genotype is a piece of music. What makes good music? Between two snippets, how can you decide which is better?
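To make the genotype-to-fitness mapping concrete, here is a minimal genetic algorithm sketch in Python. Everything in it (the genome length, mutation rate, and especially the toy fitness function) is made up for illustration; the open question in this post is exactly what `fitness` should be when a genotype is a piece of music.

```python
import random

random.seed(0)  # for reproducibility of this sketch

def evolve(fitness, genome_len=8, pop_size=30, generations=50, mut_rate=0.1):
    """Minimal real-valued GA: truncation selection, single-point
    crossover, Gaussian mutation. Higher fitness is better."""
    pop = [[random.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, genome_len)
            child = a[:cut] + b[cut:]          # single-point crossover
            child = [g + random.gauss(0, 0.1) if random.random() < mut_rate
                     else g for g in child]    # occasional Gaussian nudge
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: push every gene toward 0.5 (a stand-in for "sounds good").
best = evolve(lambda g: -sum((x - 0.5) ** 2 for x in g))
```

The whole difficulty of the project lives in that one lambda: swap in a musically meaningful fitness function and the rest of the loop stays the same.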

Well, one thing you can do is ask how similar the piece of music is to pieces of music you know are good! There are more quantitative metrics of similarity than there are of quality. But rather than measure similarity directly, you could also get at it by asking how easy it is to tell the generated piece from one of the good pieces. This is the logic behind GANs, or generative adversarial networks. A GAN consists of two systems: a generator and a discriminator. The job of the generator is to produce new examples that look like they came from a given dataset. For instance, you might feed a GAN a bunch of Picasso paintings and ask it to generate more. The job of the discriminator is to take one of the generator's outputs alongside an example from the original dataset and try to determine which is which. The two systems push each other to get better, and eventually the goal is to have the performance of the discriminator fall to just random chance. At that point, there's no way to tell (from the computer's perspective, at least) the generated outputs from the original dataset. The canonical GAN is used on images, but Stanford's very own Chris Donahue helped develop WaveGAN, which works over raw waveforms.
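As a minimal sketch of that adversarial setup, here are the standard GAN objectives in numpy. The function names and array shapes are my own for illustration, not WaveGAN's API; the point is just that when the discriminator is reduced to guessing, its outputs sit at 0.5 and neither side can improve:

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy on discriminator probabilities in (0, 1)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def discriminator_loss(d_real, d_fake):
    # The discriminator wants real examples -> 1 and generated ones -> 0.
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    # The generator wants the discriminator to call its output real.
    return bce(d_fake, 1.0)

# At equilibrium the discriminator outputs 0.5 everywhere (chance level).
guessing = np.full(16, 0.5)
```

At that chance-level equilibrium, the discriminator loss is exactly 2·log 2 and no gradient step helps either player.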

I'm interested in seeing if I can get WaveGAN to work for birdsong. I'm also interested in seeing whether evolving the weights of WaveGAN (using the discriminator error rate as the fitness function) can achieve results comparable to the canonical learning algorithm, but given the finickiness of GAN training, that might need to be a project for another day.

In the next week I'm hoping to lock down a dataset of birdsong (there seem to be multiple options!) and start digging into the WaveGAN paper. I haven't ever worked with GANs before, so I'm looking forward to it!

Week 3 Update

Just starting off the week with a bunch of links related to birdsong that I'm following.

-A paper on an evolutionary model of birdsong

-A paper modeling syringeal vibrations in songbirds

-An old MUSIC220A song on birdsong

-A pitch tracking application (for cleaning up birdsong field recordings)

-Bill Schottstaedt's page on music generation (including birdsong)

-A Csound synthesizer for birdsong

-Csound python examples

-Matlab implementation of birdsong simulation

-Snd homepage

-Synthetic bird sounds dataset

As should be clear from the mountain of links immediately above, I spent the majority of Week 3 doing research. After reading through the WaveGAN paper, I realized that it wouldn't be the best fit for the project. WaveGAN outputs actual waveforms, so in order to remain computationally feasible it's capped at 1 second of audio, which isn't enough to simulate some of the more interesting bird calls. Not to mention, the authors already tested their algorithm on birdsong!

So I started looking for some kind of synthesizer that would (ideally) have a small number of parameters. This took an unfortunately large amount of time. I kept bouncing back and forth between a paper I'd found on evolving birdsong that lacked any sort of documentation except for a pointer to an outdated Csound synthesizer, an implementation in Matlab using differential equations that I couldn't really parse, and the work of CCRMA's own Bill Schottstaedt from a number of years ago. It took some doing to get Scheme successfully installed onto my laptop, but ultimately I got Bill's configuration to work! Now comes the hard part: digging into the code to understand which parameters do what, and how each species of bird could be rendered as some kind of computational genotype.

Also of note, my dad pointed me towards a paper on latent variable evolution which seems super relevant to my project and was written by my soon-to-be-mentor at NYU. It's a small world!

Week 4 Update

This week was spent familiarizing myself with Scheme (in general) and Bill Schottstaedt's code (in particular). Unfortunately there isn't much to show or write about this, but I do want to shout out to Chris Chafe for walking me through some code line by line!

In terms of how to render the birdsong as a genotype, I'm again running into a bit of a snag. Bill's bird synthesizers, while amazingly high-fidelity, are extremely heterogeneous. Very little code is re-used from one synthesizer to the next, making it very difficult to decompose the code enough to use genetic programming. So at the moment I'm leaning a bit towards using a simple genetic algorithm to output parameters to replace the defaults in the synthesizer. But this runs into another problem! The set of parameters that make a good fit is extremely dependent on the specific synthesizer into which they're fed: a set of parameters that sounds great as a loon could sound awful as a robin. This makes evaluating the fitness of any given genotype very challenging. To combat this, I'm considering using a clustering approach to first group the 150-or-so bird recordings into a few manageable clusters. That way, hopefully the genetic algorithm can be reasonably certain that its parameters will sound good for any synthesizer in the cluster. But to be honest, I have no idea if it will work!
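A sketch of what that clustering step might look like, using plain k-means over per-recording feature vectors. The features below are random stand-ins rather than real audio features, and the farthest-point initialization is my own choice for the sketch, not anything from the project:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means with farthest-point initialization."""
    centroids = [X[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all current centroids.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each recording's feature vector to the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Two fake "species" of 12-dimensional chroma-like feature vectors.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0, 0.1, (10, 12)),
                      rng.normal(1, 0.1, (10, 12))])
labels = kmeans(features, k=2)
```

With real recordings, each row of `features` would be something like a summary of the recording's chromagram, and `k` would be the number of synthesizer buckets.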


Week 5 Update

Some good progress this week! I continued a bit down the path of birdsong clustering and wrote my first piece of real Scheme code (woot woot woot!), which collected each of the birdsong recordings into its own .wav file. After that, I used the tools from pyAudioAnalysis to extract chromagrams and used that information to cluster the songs. This technically worked, in the sense that it produced valid output, but it certainly didn't produce any coherent clusters.

After starting to think about how to improve the clustering (perhaps using other features like MFCCs, or some kind of temporal convolution?), I realized that this might be approaching the problem the wrong way. Instead of trying to cluster 150 bird recordings into 20 groups and then extracting some kind of common synthesizer architecture from each cluster, why not simply fix the synthesizer architecture and leave only the parameters up to the GA? This struck me as a much more efficient way to go about things, so I changed tactics.

Now, instead of explicitly using the GA to produce bird-like calls that don't align exactly with existing calls, the challenge for the algorithm is to reproduce an existing bird call using only a fixed and rather limited audio setup (namely, a single polywave oscillator, with the amplitude envelope, frequency envelope, and partials left up to the GA). I wrote some more Scheme code to get this working (well, mostly I just pared down one of Bill's existing synthesizers), as well as a simple Python script to fill in the aforementioned parameters. At this point, I could generate a random genotype (read: a list of 91 floating point numbers) and convert it into a "bird call." Given the random nature of the input, the output was basically just a warbled screech. But it's a start!
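To show what "genotype in, bird call out" means here, the sketch below renders a 91-float genotype with numpy. The gene layout (30 amplitude-envelope genes, 30 frequency-envelope genes, 31 partial weights), the sample rate, and the frequency range are all my own guesses for illustration; the real rendering happens in Bill Schottstaedt's Scheme synthesizer, not in Python.

```python
import numpy as np

SR = 22050  # sample rate; an assumption, not the project's actual setting

def render(genotype, dur=0.5):
    """Toy polywave-style renderer: amplitude envelope * sum of harmonics."""
    g = np.clip(np.asarray(genotype, dtype=float), 0.0, 1.0)
    n = int(SR * dur)
    t = np.arange(n) / SR
    knots = np.linspace(0, dur, 30)
    amp = np.interp(t, knots, g[:30])                    # amplitude envelope
    freq = np.interp(t, knots, 1000 + 4000 * g[30:60])   # 1-5 kHz fundamental
    phase = 2 * np.pi * np.cumsum(freq) / SR
    partials = g[60:]                                    # 31 harmonic weights
    wave = sum(w * np.sin((i + 1) * phase) for i, w in enumerate(partials))
    return amp * wave / max(partials.sum(), 1e-9)        # normalize to [-1, 1]

# A random genotype yields the "warbled screech" described above.
call = render(np.random.default_rng(0).uniform(0, 1, 91))
```

The GA's job is then to find the 91 numbers that make this fixed architecture sound like a particular bird.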
I also had some code for evolutionary optimization lying around from a previous project, so all I needed to get that working was a fitness function -- a way to evaluate how "good" a given genotype is. I whipped one together by reusing the same set of audio features I used for clustering and defining fitness as the inverse of the summed distances between the output and a target bird call across the features. This... so far has not produced good outputs. I've only let the algorithm run for about an hour, so it's possible something better might emerge with more time, but I also think the fitness function is a little sketchy, so I'm looking for a better way to compare two audio files. I've attached some links from my searching below:

-Phonological Corpus Tools (maybe better suited for speech)

-Paper on comparing speech, cited by PCT

-Comparing songs with Siamese Neural Networks

-Dynamic time warping

-Magenta's Spectral Loss
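For reference, here is roughly what the "inverse of summed feature distances" fitness looks like in Python. A plain magnitude spectrogram stands in for the pyAudioAnalysis feature set actually used, so this is a simplified sketch rather than the project's exact scoring code:

```python
import numpy as np

def spectrogram(x, win=256, hop=128):
    """Magnitude STFT via a plain loop -- a stand-in for richer features."""
    frames = [np.abs(np.fft.rfft(x[i:i + win] * np.hanning(win)))
              for i in range(0, len(x) - win, hop)]
    return np.array(frames)

def fitness(candidate, target):
    """Inverse of the feature distance: identical audio scores highest (1.0)."""
    d = np.linalg.norm(spectrogram(candidate) - spectrogram(target))
    return 1.0 / (1.0 + d)

# A pure tone as a fake "target call", plus near and far candidates.
rng = np.random.default_rng(0)
target = np.sin(2 * np.pi * 1000 * np.arange(4096) / 22050)
close = target + rng.normal(0, 0.01, 4096)
far = rng.normal(0, 1.0, 4096)
```

A frame-by-frame spectral distance like this is rigid about timing, which is part of why alternatives like dynamic time warping and multi-scale spectral losses (the links above) look appealing.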