d_Jai is a machine learning-powered audiovisual DJing tool that reinterprets DJing's core technique of mixing as the combination and manipulation of musical features instead of waveforms. Trained on 9 hours of downtempo house music, the model driving d_Jai breaks down each deck into its salient components and embeds them in a high-dimensional intermediary representation called latent space. This tool provides a richly visual and multimodal means of interacting with this complex, high-dimensional latent space, enabling a new means of DJing that lets you combine features of three separate tracks, smoothly manipulate style, and more!
Using a custom ChuGin to render models trained with the Realtime Audio Variational autoEncoder (RAVE) framework, this tool generates audio in real time and offers several different interfaces for manipulating latent space in unique and rich ways.
d_Jai has five components: the latent space visualizer, the interpolation grid, the style sphere, noise, and the exaggeration slider.
The Latent Space Visualizer displays the current latent vector, which has 8 dimensions. This vector is encoded from the incoming audio data, manipulated using d_Jai, and then decoded back into audio. As you interact with d_Jai, you'll see the shape of the latent space morph in response to what you are doing.
The Interpolation Grid is your main mode of interaction. It allows you to interpolate between three different tracks. As you move the dot through the square, different amounts of each of the three tracks are incorporated into the output. Changing this balance dynamically is how you mix tracks.
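As a rough illustration of the idea, the sketch below blends three latent vectors using weights derived from the dot position. The mapping from the square to the three weights is an assumption made for this example (tracks A and B on the bottom corners, track C along the top edge); the source does not describe d_Jai's actual layout, and the function names are hypothetical.

```python
def grid_weights(x, y):
    """Map a dot position in the unit square to three blend weights.

    Assumed layout (not taken from d_Jai): track A at the bottom-left corner,
    track B at the bottom-right corner, track C along the top edge.
    The three weights always sum to 1.
    """
    x = min(max(x, 0.0), 1.0)  # clamp to the square
    y = min(max(y, 0.0), 1.0)
    w_a = (1.0 - x) * (1.0 - y)
    w_b = x * (1.0 - y)
    w_c = y
    return w_a, w_b, w_c


def mix_latents(z_a, z_b, z_c, x, y):
    """Blend three latent vectors of equal length using the grid weights."""
    w_a, w_b, w_c = grid_weights(x, y)
    return [w_a * a + w_b * b + w_c * c for a, b, c in zip(z_a, z_b, z_c)]
```

With this layout, placing the dot in a bottom corner outputs one track's latent vector unchanged, and any other position gives a weighted average of all three.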
The Style Sphere lets you navigate style! Hold left shift and move the mouse to navigate the 3D sphere, and use the scroll wheel to change the magnitude of the change. Together these shift the style of the output by adding values to the latent space.
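One way to picture this is as a direction on the sphere, scaled by the scroll-wheel magnitude, and added to the latent vector. The sketch below assumes the direction is given as two angles and that the resulting 3D offset is added to the first three latent dimensions. Both choices are assumptions for illustration, since the source does not say how the sphere maps onto the 8 latent dimensions.

```python
import math


def sphere_offset(azimuth, elevation, magnitude):
    """Convert a point on the sphere (angles in radians) to a 3D offset.

    The offset has length `magnitude`, standing in for the scroll-wheel value.
    """
    x = magnitude * math.cos(elevation) * math.cos(azimuth)
    y = magnitude * math.sin(elevation)
    z = magnitude * math.cos(elevation) * math.sin(azimuth)
    return [x, y, z]


def apply_style(latent, offset, dims=(0, 1, 2)):
    """Return a copy of `latent` with `offset` added to the chosen dimensions.

    `dims` is a hypothetical choice of which latent dimensions the sphere drives.
    """
    out = list(latent)
    for d, o in zip(dims, offset):
        out[d] += o
    return out
```

For example, `apply_style([0.0] * 8, sphere_offset(0.0, 0.0, 2.0))` shifts only the first latent dimension, by 2.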
The Exaggeration Slider lets you control intensity. This slider multiplies the latent values by a scalar. At the center the scalar is 1 and there is no change. Moving left takes it toward zero, where the audio qualities are subdued. Moving right exaggerates the latent points, flying off into far-off distances in latent space.
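The scaling itself is simple to sketch. The example below assumes the slider position runs from -1 (far left) to 1 (far right) and that the far right maps to a scale of 4. The slider range, the maximum scale, and the function names are hypothetical, since the source only fixes the center at 1 and the left end at 0.

```python
def slider_to_scale(position, max_scale=4.0):
    """Map a slider position in [-1, 1] to a latent scale factor.

    -1 gives 0 (subdued), 0 gives 1 (no change), and 1 gives `max_scale`,
    an assumed upper bound for the exaggeration.
    """
    position = min(max(position, -1.0), 1.0)  # clamp to the slider range
    if position <= 0.0:
        return 1.0 + position
    return 1.0 + position * (max_scale - 1.0)


def exaggerate(latent, scale):
    """Multiply every latent value by the same scalar."""
    return [scale * v for v in latent]
```

Because every dimension is multiplied by the same factor, the direction of the latent vector is preserved and only its distance from the origin changes.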
Noise! Hold n for noise!
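The source does not say how the noise is applied. One plausible reading, sketched below purely as an assumption, is that random values are added to the latent vector while the key is held. The Gaussian distribution, the `sigma` parameter, and the function name are all hypothetical.

```python
import random


def add_noise(latent, sigma=0.5, rng=None):
    """Return a copy of `latent` with Gaussian noise added to each value.

    `sigma` (assumed) controls how strongly the latent vector is perturbed.
    Pass a seeded `random.Random` as `rng` for repeatable results.
    """
    if rng is None:
        rng = random.Random()
    return [v + rng.gauss(0.0, sigma) for v in latent]
```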
This tool is for Windows.
After downloading d_Jai, you will need to launch two programs: DJai.exe in the root directory, and a ChucK program (to generate the audio) in the DJai_release/ directory.
To start ChucK, open PowerShell and cd to ./ChucK/chugins/. After this, run the following command:
../chuck.exe --srate44100 --chugin:./rave.chug ../../DJai_Data/StreamingAssets/dj.ck
You should now be hearing sound and can start manipulating latent space!
The tracks used:
Rey & Kjavik - Baba City (Rkadash Version)
The main workflow here is to take two tracks, break them down into their latent values, and then manipulate these latent values to do mixing and DJing! This milestone has a very basic working example with two types of manipulation: interpolation and exaggeration!