***************************************************************** * DIANA * * Dynamic Interactive Audio and Noise Analyzer * * by Uri Nieto * * * * Homework #3: Sound Peeking * * * * Music 256a, Fall 2009 * * Stanford University * * * * README file * ***************************************************************** README Contents: - SYSTEM REQUIREMENTS - BUILDING THE PROJECT - USAGE - DESCRIPTION - IMPLEMENTATION ********************************************************* REQUIREMENTS -Mac OS X 10.4 or higher ********************************************************* BUILDING THE PROJECT From the directory where the files of the tgz are, type the following: make This should compile and generate the Diana binary. ********************************************************* USAGE To run Diana type (from the Diana directory): ./Diana These are the different possible options while running Diana: ---------------------------------------------------- 'h' - print help message 'f' - toggle fullscreen 'k' - toggle signal / keyboard mode 'CURSOR ARROWS' - rotate signal view 'q' - quit ---------------------------------------------------- ********************************************************* DESCRIPTION Diana is a small piece of software that analyzes noise and audio and displays it in a nice 3d window. It also estimates the pitch from a more or less stable sinusoidal signal and maps it to a midi keyboard key in the screen. Diana stands for Dynamic Interactive Audio and Noise Analyzer. It uses RtAudio for the input signal, chuck libraries for the FFT and OpenGL for the graphics. The code is based on Sound Peek by Ge Wang, Perry R. Cook, Ananya Misra http://soundlab.cs.princeton.edu/ At the moment, Diana only runs on Mac OS X 10.4 or higher. Diana has 2 different running modes: Signal and Keyboard - Signal Mode This is the default mode, and it prints 4 different signals on the screen, all of them being read from your default input. Top Green signal: Windowed Time-Domain Signal. The window used is a Hanning window. Middle Blue signal: Scrolling Time-Domain Signal. Bottom Orange signal: Waterfall Spectrum of the Windowed Time- Domain Signal. Background Blue signal: Time-Domain signal being rotated, and no window applied. Also called Speed of Light signal. You can rotate this view by pressing any cursor arrows. All signals will rotate except the one in the back, which will stay still creating this "speed of light" effect. - Keyboard Mode This mode shows a midi keyboard in the screen and it will estimate the pitch from the signal being read and map it to the correct keyboard key. There is a deviation error of +1/-1 semitone. To toggle between Signal and Keyboard mode, press 'k'. To toggle between Fullscreen mode press 'f'. By default, the Signal Mode is on and the fullscreen is off. ******************************************************** IMPLEMENTATION The tough part of the implementation is based on the graphics to display the signal read by the default input. However, it is important to remind how we read from the default input. We use RtAudio in order to do that, and in this case, the audio callback is a simple function that stores the input buffer into our buffer that will read our display function. The only "tricky" part is that we use the mutex options to lock the thread, so that it will be safe to read from that buffer in the display function. Once we have our buffer ready to be read by the display function, all the action is focused on this function. The display function is the callback function of the OpenGL library, and it will be called every time it needs to refresh the screen. We can divide the implementation of the display function into 5 different parts: Windowed Time-Domain, Rotating Time-Domain, Scrolling Time-Domain, Spectrum Waterfall, Pitch Detection. - Windowed Time-Domain In order to display the windowed time domain (the green signal in the top of the screen), we must apply the window first to our buffer where the signal is stored. We must first define our window type. This will only be done once, and that is why it is found in the main function. Here is the code to do that: // make the transform window hanning( g_window, g_buffer_size ); The hanning function is found in the chuck_fft code, extracted from sndpeek. The window type then, is the hanning window. Now that we have defined our window, we can apply it to our signal, this will be done in the display function, so every time the screen is refreshed: // apply the transform window apply_window( (float*)buffer, g_window, g_buffer_size ); This apply_window is also found in the chuck_fft code. Now we can display this buffer in the top of the screen using basic OpenGL syntax, applying the desired color. We will push the matrix so that we can safe the state and don't mess up with the rest of the signals. The function to draw the windowed time-domain signal is: void drawWindowedTimeDomain(SAMPLE *buffer); - Rotating Time-Domain This signal is the one in the background, the one that creates this sensation of "Speed of Light". In this case, we don't want to apply the window, so we will draw this signal before applying the window to our buffer. Moreover, since we want to keep this Rotating Time-Domain signal in the background without rotations, we will do it before saving the push matrix for rotations, so that it won't move when rotating the rest of the signals. In order to create this effect of "Speed of Light", we will rotate the z axis at a fast and slightly random speed. The line width is slightly higher and the signal is multiplied by a factor of 2 so that the effect is stronger. The function to draw the rotating time-domain signal is: void drawRotatingTimeDomain(SAMPLE *buffer); - Scrolling Time-Domain We will need a new buffer in order to store the information to be displayed in the screen. This buffer will be much bigger, and the size of it will determine how much of the previous samples we want to display. In our case, our buffer is: SAMPLE g_scroll_buffer[DNA_SCROLL_BUFFER_SIZE]; where: DNA_SCROLL_BUFFER_SIZE = DNA_BUFFER_SIZE*60 So it's 60 times bigger than the buffer size (which is 1024). This buffer will be a "circular" buffer, and to do so we will use to different indices: int g_scroll_reader; int g_scroll_writer; At every refresh of the screen, we will copy the input buffer into our scrolling buffer in the position where the g_scroll_writer says. This way we won't have to go through all the buffer in order to update the information displayed. Thus, is much more efficient. We will draw the scrolling time-domain using standard OpenGL. The code is found in this function: void drawScrollingTimeDomain(SAMPLE *buffer); - Spectrum Waterfall We will use a similar system to the one used in sound peek in order to implement the spectrum waterfall. We will first of all apply the FFT to our buffer. We will use the libraries from chuck_fft to do so: // take forward FFT; result in buffer as FFT_SIZE/2 complex values rfft( (float *)buffer, g_fft_size/2, FFT_FORWARD ); // cast to complex complex * cbuf = (complex *)buffer; We cast to complex to compute the spectrum in an easy way. We will make use of a global variable called g_spectrums, which is an array of spectrums. We will make use of Pt2D, which is a point in 2D with the following structure: struct Pt2D { float x; float y; }; So our g_spectrums is: Pt2D ** g_spectrums; We will first set our g_spectrums to zero, with a depth of 48. Then, in every refresh, we will read from the complex buffer and store the result into the correct position of our g_spectrums. We will gradually change the color in order to produce a nice visual effect of the waterfall. The code in order to draw the spectrum waterfall is found in: void drawSpectrumWt(complex *cbuf); - Pitch Detection The pitch detection will be done before any rotation, like the "Speed of Light", so that it won't rotate even though the rest of the signals are rotated. I used autocorrelation in order to obtain a better pitch estimation. Then, I read the autocorrelated buffer and interpolate the max peak and assign it to a real note. The real note has an error of +1/-1 semitone. This is done in the function: void getNote(double pitch); In order to show the Midi Keyboard, I used the Textures options from OpenGL. To load the textures, I implemented the following function: GLuint LoadTextureRAW( const char * filename, int wrap ); The whole implementation in order to draw the midi keyboard and get the pitch is found in: // Draw the Keyboard drawKeyboard(); // Get the Pitch and assign to keyboard key getPitchAndAssignToKeyboard(buffer); There is a small pitch stability algorithm that will only take in consideration the pitches that are constant in 4 continuous time-windowed buffers. This makes the pitch detection much more stable. ---- The hardest part of this assignment was the OpenGL part, since I hadn't played with it for a long time. Also took me some time to deal with the scrolling buffer, and the pitch detection. The other difficult part was to stay away from it, since in the end I considered this project my "little son" and I couldn't stop adding new features. ******************************************************** Have fun! uri urinieto@ccrma.stanford.edu CCRMA Stanford University 2009