Lab 5

Matlab/Python: Rudimentary Pitch Tracker

For your final Matlab/Python assignment, we’re going to build a very basic pitch tracker / resynthesizer. Our goal is to take a monophonic (one note at a time) input file and generate a digital version that mimics the pitch of the original.

We will do this all in a single function, which will process the input signal in chunks, or blocks, and write to the output a sawtoothTone() that follows the loudest frequency in each block:

[output] = trackPitch(input, fs, windowSize)
input is the original source as a vector
fs is the original sample rate
windowSize is the size of the blocks we’ll break the input into to analyze and resynthesize, in samples. This is sometimes also called ‘blocksize’ or ‘hopsize.’ The value you choose affects the resolution of the spectrum you can generate - the larger the window, the finer the frequency resolution, and the more accurately you can track the correct frequency. However, it also determines the temporal resolution of the resynthesized output - the pitch of the output can only change once every windowSize samples.
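As a quick back-of-the-envelope example of this tradeoff (assuming a 44.1 kHz sample rate, which is just an illustrative value):

fs = 44100                           # example sample rate in Hz
windowSize = 1024                    # example block size in samples
freqResolution = fs / windowSize     # ~43 Hz between spectrum bins
timeResolution = windowSize / fs     # ~23 ms between pitch updates

Doubling windowSize halves the spacing between spectrum bins but also halves how often the output pitch can change.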

These are the steps you will need to follow to build your trackPitch() function (a rough Python sketch follows the list):

  • Create a for loop that repeats floor(length(input)/windowSize) times in Matlab, or len(input)//windowSize times in Python (we won’t worry about any extra samples we might be missing at the end).
  • For each iteration of the for loop, we need to get a block of the input signal. For example, for the first repetition we want block = input(1:windowSize) in Matlab or block = input[0:windowSize] in Python, for the second iteration we want block = input(windowSize+1:2 * windowSize) in Matlab or block = input[windowSize:2 * windowSize] in Python, and so forth. How can we use our for loop to get the right block based on what number iteration we are on?
  • Next, we use our getSpectrum() function from Lab 3 on that block. We want to find the loudest frequency in that block, so use [M,I] = max(Y) in Matlab (or I = np.argmax(Y) in Python), where Y is the Y returned from getSpectrum(). The I value will be the index of the biggest value in Y, so the frequency we want is the corresponding value in the F returned from getSpectrum(), which will be F(I).
  • Generate one block of your output sawtooth signal for the current iteration of your for loop. There are two ways you can do this:
    • The easier (albeit slower and less elegant) way to do this is to append a sawtooth at the frequency we found to output. In Matlab, use the [A,B] notation, which concatenates vectors A and B; in Python, use np.concatenate((A, B)) (note that list-style A.append(B) would nest the new block rather than concatenate it). You’ll need to create an empty output vector before the for loop with output = [] in Matlab or output = np.array([]) in Python. Then we can append a sawtooth to our output on each iteration of the for loop by typing output = [output, sawtoothTone(fs, F(I), 0, windowSize/fs, 8)]; in Matlab or output = np.concatenate((output, sawtoothTone(fs, F[I], 0, windowSize/fs, 8))) in Python. (Here I’ve used 8 harmonics in my sawtooth tone, but feel free to use however many you want.)
    • The cleaner and faster way to do this is to create the correct length output vector before your for loop (hint: use zeros() or np.zeros()), and then use the same indexing within your for loop that you use to access samples in the input to assign the sawtoothTone() you generate for that block to your output.
  • Either way, you should end up with a continuous sawtooth wave output that follows the pitch of your original input.
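Putting the steps together, a rough Python sketch of trackPitch() might look like the following. This is just one way to structure it, and it assumes the getSpectrum() and sawtoothTone() functions from the earlier labs return NumPy arrays and take the arguments shown here; adjust to match your own versions.

import numpy as np

def trackPitch(input, fs, windowSize):
    numBlocks = len(input) // windowSize          # ignore any leftover samples at the end
    output = np.zeros(numBlocks * windowSize)     # pre-allocate the output (the faster approach)

    for i in range(numBlocks):
        start = i * windowSize
        stop = start + windowSize
        block = input[start:stop]                 # one block of the input

        Y, F = getSpectrum(block, fs)             # spectrum of this block (from Lab 3)
        I = np.argmax(Y)                          # index of the loudest bin
        loudestFreq = F[I]                        # frequency corresponding to that bin

        # One block of sawtooth at that frequency (8 harmonics here; use as many as you like).
        # This assumes sawtoothTone() returns exactly windowSize samples for a windowSize/fs duration.
        output[start:stop] = sawtoothTone(fs, loudestFreq, 0, windowSize / fs, 8)

    return output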

Once you’re done with your trackPitch() function, make a short recording (10-15 seconds) of yourself singing or playing a monophonic instrument (using GarageBand, Audacity, or whatever you wish; reach out to Camille if you need help doing this) and use trackPitch() to generate a digital sawtooth version. Experiment with different power-of-2 values for windowSize to see what gives you the best balance of frequency and time resolution.
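Once trackPitch() is working, running it on your recording might look something like this in Python (a sketch, assuming a mono recording and placeholder file names; SciPy’s wavfile module is one option for reading and writing .wav files):

import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read('myRecording.wav')       # placeholder file name; assumes a mono file
x = x.astype(float) / np.max(np.abs(x))       # convert to float and normalize

y = trackPitch(x, fs, 1024)                   # try different power-of-2 window sizes here

y = y / np.max(np.abs(y))                     # normalize before writing
wavfile.write('mySawtoothVersion.wav', fs, (y * 32767).astype(np.int16))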

Matlab/Python Deliverables:

Submit your trackPitch() function along with your original recorded .wav file and generated digital sawtooth version. Report the value you used for windowSize for your final output in a comment in your function.

See Canvas for due dates.

Game Development: Biofeedback and Alternate Controls in Mixed Reality

Develop a game in AR or VR using some alternate form of control, especially types of biofeedback. Since we do not have access to the sensors at CCRMA this quarter, either make use of sensors you have such as fitness sensors or draw on the alternate controllers like those you used in Lab 3. You may either build on the VR/AR games you built for Lab 4 or develop an entirely new game for this assignment; just be sure to close the loop in the AR/VR environment.

If you are using hardware, you will need to make sure your Unity game can receive data from it. Each manufacturer has a different architecture for sending and receiving data to and from external hardware, and these architectures vary in how well they integrate with Unity's development pipeline.

For software, asset packages such as the RAGE Project's EmotionDetectionAsset and ARKit's eye tracking feature (implemented in AR Foundation) are excellent options that make good use of face tracking. You can also make creative use of other data sources within your smartphones. However, for these approaches, keep in mind that your control mechanism should include some physical manifestation of a cognitive process. Think about what data you can collect to determine someone's focus, emotional state, or stress level, or how to design a control system to manipulate those affective states in your players.

Now that we’ve learned about spectra in Matlab, you can also leverage this knowledge in Unity. You can call GetSpectrumData() on any AudioSource object to get frequency/spectrum information from that source’s audio stream for the current game frame. Rather than returning a vector of values, however, GetSpectrumData() populates an array that you’ll need to pre-allocate before calling it, and that array’s length must be a power of 2 between 64 and 8192 inclusive. Syntax for allocating a float array of length 256 in C# looks like this:
float[] myArray = new float[256];
It’s best to allocate the array once - declare it with your script variables and allocate it in Awake() or Start(). You can then populate your array using GetSpectrumData() like this:
mySource.GetSpectrumData(myArray, 0, FFTWindow.Hanning);
0 is the channel - could also be 1 for stereo sources but 0 is safest. The third argument is the window applied to our time samples, which we haven’t talked about. You can use any of the other window types you like, but will probably get the best results with the Hanning or Blackman windows. Check out the latest Canvas module for a C# example script demonstrating how to use GetSpectrumData() and build a corresponding frequency vector to do real-time analysis of your game's sounds.

Game Design Deliverables:

Submit links to your pitch and playtest videos (details on the Lab Overview page if you’ve forgotten). You do not need to submit the actual Unity project. Be aware that screen capture may not be the best option for this Lab.

See Canvas for due dates. We will play your games in class that day. Please make sure your hardware/software is ready to go for live gameplay.

Lecture

Fridays, 9:45 - 11:45 AM
CCRMA Classroom (Knoll 217)

Lab

Tuesdays, 6:00 - 7:50 PM
CCRMA Classroom (Knoll 217)

Office Hours

See Canvas
CCRMA Classroom/Discord

Questions

Post on Discord or Email

Instructors

Poppy Crum
Instructor
poppy(at)stanford(dot)edu

Lloyd May
Teaching Assistant
lloydmay(at)stanford(dot)edu