Pianote - An Audio TV Remote

An unnecessary piano remote that uses sound to change the channel

Try Pianote: HERE!

"Escape From the Turing Trap" assignment: here


Description:

Pianote is what it sounds like - a piano remote... It uses sound to control the TV. What TV, you ask? The Pianote TV, of course: a smart TV that processes audio into input commands! The TV features a microphone receiver built with Google's Teachable Machine, an interactive machine learning platform for training neural networks. Both the TV and the remote run fully in the browser, powered by TensorFlow.js and WebChucK!

Pipeline:
  1. Pianote runs on a mobile device browser as a remote. Sounds were made and recorded in Ableton and re-synthesized with WebChucK.
  2. Pianote sounds are used to train a Teachable Machine model for audio classification
  3. Pianote TV runs in the browser, recording microphone input. Teachable Machine and TensorFlow.js are used to process real-time audio data into spectrogram windows.
  4. The spectrogram windows are passed into the Teachable Machine neural network, which classifies each window as one of the TV input commands:
    • 1-9 TV channels
    • TV On/Off
    • Mute/Unmute
  5. JavaScript is used to dynamically switch and play back video elements
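
Steps 4 and 5 above could be sketched roughly as follows. This is not Pianote's actual code: the label names (`noise`, `ch1` to `ch9`, `power`, `mute`), the 0.75 confidence threshold, and the helper names `pickCommand` and `applyCommand` are all assumptions made for illustration. The sketch takes one array of per-class scores, the kind of output a classifier produces for each spectrogram window, and turns it into a TV state update.

```javascript
// Hypothetical class labels: the writeup does not list the model's real ones.
const LABELS = [
  "noise",
  ...Array.from({ length: 9 }, (_, i) => `ch${i + 1}`), // "ch1" .. "ch9"
  "power",
  "mute",
];

// Pick the highest-scoring label, or null if the model is unsure or hears noise.
// The 0.75 threshold is an assumed value, not one taken from the project.
function pickCommand(scores, labels = LABELS, threshold = 0.75) {
  if (scores.length === 0) return null;
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  if (scores[best] < threshold || labels[best] === "noise") return null;
  return labels[best];
}

// Return the next TV state for a command without mutating the old state.
// Ignoring everything except "power" while the TV is off is an assumption.
function applyCommand(state, command) {
  if (command === "power") return { ...state, on: !state.on };
  if (command === null || !state.on) return state;
  if (command === "mute") return { ...state, muted: !state.muted };
  const match = /^ch([1-9])$/.exec(command);
  return match ? { ...state, channel: Number(match[1]) } : state;
}

// Example: a confident "ch3" window arrives while the TV is on.
const scores = LABELS.map((label) => (label === "ch3" ? 0.9 : 0.01));
const tv = applyCommand({ on: true, muted: false, channel: 1 }, pickCommand(scores));
console.log(tv);
```

In the browser, the resulting state would then presumably drive the page, for example by showing the `<video>` element for `tv.channel` and setting its `muted` property. Returning `null` for low-confidence or noise windows is one simple way to keep stray sounds from changing the channel, though it does not make the classifier itself any more reliable.
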

Reflection:

I almost lost my mind making this final project. I started it as a simple WebChucK project that would bring more control to my past project, SoundscapeAI. I wanted to build a similar system that a user could play along with to control how it sounded. I also wanted not just a single user but several users to be able to influence the system simultaneously, using Pianote sound remotes. This meant that the "TV" needed to handle multiple inputs and be robust to noise. Teachable Machine let me use interactive ML to train a model to recognize these sounds, but I quickly realized that the AI was inaccurate, not confident, unreliable, and sequential, unable to handle multiple inputs at once. The accuracy and precision of audio detection and classification just weren't high enough for the system I was trying to build. I had made the remote with ease, but the TV was impossible to get working.

I had to scrap the compositions that I had planned and the additional WebChucK interactions that I had built. I ended up just using Teachable Machine's TensorFlow.js model to control HTML video, which was fairly easy to do. In the end, I decided to just roll with a simple idea: an audio remote and TV. Even then, using this system was really difficult because the TV was listening for audio while also playing audio back. For classification to happen accurately, I almost had to retrain my model every time I wanted to use it. It's not the most robust thing, but it was kind of fun to make. It's super responsive when it works; otherwise, it just picks up a lot of noise. And that's a tiny bit of the fun in using it too.

I think that using AI in this project definitely overcomplicated the system. But at the same time, that's what makes it so good and unreliable. I think there's promise that one day a system like this could be controlled by one collective wave of sound, and I really think that opens up a possibility for co-creative music making, rather than just precise processing of web inputs. I really enjoyed the weird mappings that AI systems and non-traditional inputs and sensors can afford. Overall, thinking back on Music and AI, I'd reiterate that AI is not about what can be done, but what's worth doing. This critical lens really helps me think about the systems that I make and how people use them.

Acknowledgements:

Special thanks to Ge Wang and Yikai Li for making this course possible and these projects so thought-provoking and openly expressive.

Tooling: Teachable Machine

I would also like to acknowledge the YouTube videos that made this project possible:

To download the files for this project

Download Here






Milestone (3/14/2023)

Description:

Using sound to control your TV... or WebChucK at the moment. Parsing and recognizing sound from a voice remote to control another system. This current iteration isn't too good. It's code adapted from mosaic synth, and it continuously synthesizes windows, which is too much at the moment. But communication via near-field sound is possible and looks promising :).