I focused on starting to expand the tool beyond the celebrity face model I've been using for demo purposes. So far I've extended the model loading to cover the other models included in the pytorch_GAN_zoo GitHub repo. That adds four models of different resolutions, and the output window now adjusts its resolution as needed to account for this. The goal is to eventually support a large variety of the models available on PyTorch Hub, and to abstract things so that different models get their own classes and can do different things (like being able to use the style vectors in StyleGANs).
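For reference, loading these pretrained models through torch.hub looks roughly like this (the model names follow the pytorch_GAN_zoo hub entry points; treat it as a sketch rather than the tool's exact code):

```python
import torch

# Sketch: load one of the pretrained progressive GANs exposed by
# pytorch_GAN_zoo via torch.hub. Other model_name values include
# 'celebAHQ-256', 'celeba', and 'DTD', each at a different resolution.
use_gpu = torch.cuda.is_available()
model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'PGAN',
                       model_name='celebAHQ-512', pretrained=True,
                       useGPU=use_gpu)

# The generator's output shape tells the display window what resolution
# it needs to run at.
noise, _ = model.buildNoiseData(1)
with torch.no_grad():
    image = model.test(noise)      # tensor of shape (1, 3, H, W)
print(image.shape[-2:])            # e.g. torch.Size([512, 512])
```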
This week I added sinusoidal modulation to the latent space. This is an LFO, like you would use for a tremolo or filter sweep. As seen above, this effectively adds a sort of ornamentation to whatever base motion you're doing.
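At its core the modulation is just a sine wave applied along some direction in latent space. A minimal sketch of the idea (the function and parameter names here are illustrative, not the tool's actual API):

```python
import numpy as np

def lfo_offset(t, direction, freq=0.5, amplitude=1.0):
    """Sinusoidal offset along a fixed direction in latent space.

    t         -- time in seconds
    direction -- unit vector in latent space (what the LFO 'wiggles' along)
    freq      -- oscillations per second
    amplitude -- how far the motion swings from the base point
    """
    return amplitude * np.sin(2.0 * np.pi * freq * t) * direction

# Layered on top of whatever base motion is currently running:
# z(t) = z_base(t) + lfo_offset(t, direction)
```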
What an LFO is in higher-dimensional space is kind of a tricky thing to translate into math so that it's actually usable (which makes it a perfect thing to abstract away with the tool). In the 1D case (i.e. sound) it's a motion up and down; in the 2D case it's a rotation around a circle; in the 3D case it's some sort of motion on a plane, I guess. Either way, the problem is that each point in latent space is, well, a point. In order to oscillate around that point you need some concept of a direction to oscillate along (and there are a lot of directions to choose from in high-dimensional space). And when you're already moving (like during an interpolation) you don't necessarily want to be oscillating parallel to the direction of the interpolation.
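One simple way to handle that (just one option, a sketch of the idea rather than necessarily what the tool does) is to pick a random latent direction and project out its component along the current direction of travel, so the oscillation ends up perpendicular to the interpolation:

```python
import numpy as np

def orthogonal_lfo_direction(rng, travel_direction, dim=512):
    """Pick a random unit direction in latent space that is perpendicular
    to the current direction of travel (a basic Gram-Schmidt step)."""
    d = rng.standard_normal(dim)
    travel = travel_direction / np.linalg.norm(travel_direction)
    d -= np.dot(d, travel) * travel        # remove the parallel component
    return d / np.linalg.norm(d)
```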
Here's an updated example with the new interpolation (move smoothly from face A to B) feature.
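Under the hood, interpolation between two latent points can be as simple as a straight line between them. A sketch of that (easing curves or spherical interpolation could be swapped in without changing the interface):

```python
import numpy as np

def interpolate(z_a, z_b, t):
    """Move smoothly from latent point A to latent point B as t goes 0 -> 1."""
    t = np.clip(t, 0.0, 1.0)
    return (1.0 - t) * z_a + t * z_b
```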
I've got usable code! The above video was recorded running in real time. Last week I had a really janky setup for displaying the generated image - it could only manage about 1 frame every 3-5 seconds and didn't exit very gracefully (I basically had to kill it with Task Manager every time). Turns out matplotlib is not the best solution for playing video generated in real time, who would have thought?
After a whole lot of trial and error, scouring codebases, and watching videos, I was able to migrate the video display to OpenGL using the PyOpenGL bindings. I still don't have a particularly firm understanding of how OpenGL code should be structured, but it works. And not only does it work, it works at 60 fps! There are probably a lot of ways to improve performance further (perhaps the biggest being that the image is generated on the GPU, gets copied back to the CPU to turn into a numpy array, and then gets copied back to the GPU to display), but performance is solidly in good-enough territory right now, so I can focus on adding features.
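The core of the display loop is just pushing each numpy frame to the screen every tick. Here's a stripped-down sketch of that idea; I'm using pygame for the window purely for illustration, and the drawing call is the old-school glDrawPixels rather than whatever texture setup the real code ends up with:

```python
import numpy as np
import pygame
from OpenGL.GL import (glClear, glDrawPixels, glRasterPos2f,
                       GL_COLOR_BUFFER_BIT, GL_RGB, GL_UNSIGNED_BYTE)

WIDTH, HEIGHT = 512, 512

pygame.init()
pygame.display.set_mode((WIDTH, HEIGHT), pygame.OPENGL | pygame.DOUBLEBUF)
clock = pygame.time.Clock()

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Stand-in for a frame from the generator: uint8 array of shape (H, W, 3).
    frame = (np.random.rand(HEIGHT, WIDTH, 3) * 255).astype(np.uint8)

    glClear(GL_COLOR_BUFFER_BIT)
    glRasterPos2f(-1.0, -1.0)          # start drawing from the bottom-left corner
    # glDrawPixels uses a bottom-left origin, so flip the row order first.
    glDrawPixels(WIDTH, HEIGHT, GL_RGB, GL_UNSIGNED_BYTE,
                 np.flipud(frame).tobytes())
    pygame.display.flip()
    clock.tick(60)                     # cap at 60 fps

pygame.quit()
```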
One interesting thing I noticed was that the performance of the GAN improved dramatically (almost two orders of magnitude) when generating frames at 60 fps, as opposed to once every few seconds. I suppose this is probably due to some caching mechanism on the GPU that I don't know about/understand.
So far I've mostly been focused on thinking about the architecture and interface of this project - right now it seems like the two big focuses should be having the project distinguish between absolute points in latent space and the relative motions that need to happen for animation to occur, and having animations be constructed as compositions of simpler actions. A rough sketch of what that distinction could look like in code is below.
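These class names are hypothetical - just to make the absolute-vs-relative split and the idea of composition concrete:

```python
import numpy as np

class LatentPoint:
    """An absolute position in latent space."""
    def __init__(self, z):
        self.z = np.asarray(z, dtype=np.float32)

class Motion:
    """A relative offset from the animation's current position,
    expressed as a function of time."""
    def offset(self, t):
        raise NotImplementedError

class CompositeMotion(Motion):
    """Compose simpler motions by summing their offsets."""
    def __init__(self, *motions):
        self.motions = motions

    def offset(self, t):
        return sum(m.offset(t) for m in self.motions)

# A frame at time t would then be something like:
# z_t = base_point.z + CompositeMotion(interpolation, lfo).offset(t)
```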
I think I'm at the point now where I want to start doing some quick prototyping to evaluate whether my current approach is working. As far as code goes, I've got a basic (but working) face generator: it downloads a pre-built model, and on an OSC trigger it generates a random face and displays it with matplotlib.
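Boiled down, the prototype looks something like the sketch below (the OSC address and port are placeholders, and I'm assuming the python-osc package for receiving the trigger):

```python
import matplotlib.pyplot as plt
import torch
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

# Download a pre-built progressive GAN from torch.hub.
model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'PGAN',
                       model_name='celebAHQ-256', pretrained=True, useGPU=False)

plt.ion()
fig, ax = plt.subplots()

def on_trigger(address, *args):
    """Generate and display one random face per incoming OSC message."""
    noise, _ = model.buildNoiseData(1)
    with torch.no_grad():
        img = model.test(noise)[0]            # (3, H, W), roughly in [-1, 1]
    img = (img.clamp(-1, 1) + 1) / 2          # rescale to [0, 1] for display
    ax.clear()
    ax.imshow(img.permute(1, 2, 0).cpu().numpy())
    ax.axis('off')
    fig.canvas.draw()
    fig.canvas.flush_events()

dispatcher = Dispatcher()
dispatcher.map('/generate', on_trigger)       # placeholder OSC address
BlockingOSCUDPServer(('127.0.0.1', 9000), dispatcher).serve_forever()
```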