Learning Audio Embeddings: From Signal Representation, Audio Transformation to Understanding

Date: Fri, 05/31/2019, 10:30am - 12:00pm
Location: CCRMA Seminar Room
Event Type: Hearing Seminar
Prateek Verma will lead a discussion about using embedding spaces, trained using deep neural networks (DNNs), to model music and speech signals and perform amazing feats with them. This continues the theme from last week’s Hearing Seminar, where Rohit talked about using DNNs for speech recognition.

One common technique in the DNN world is to use a deep network to learn some task, and then take the output from an intermediate layer to help guide a new task. This relatively low-dimensional intermediate representation is called an embedding, and it contains all the information needed to perform the task. Prateek will talk about using this new type of representation for supervised/unsupervised audio transforms, speech recognition, emotion recognition, translation, and end-to-end spoken language translation.
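
(A minimal sketch of that idea, assuming PyTorch; the model, layer sizes, and return_embedding flag below are illustrative assumptions, not the architecture discussed in the talk.)

import torch
import torch.nn as nn

class AudioClassifier(nn.Module):
    # Toy classifier: the encoder's output plays the role of the "embedding".
    def __init__(self, n_input=128, n_embed=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_input, 64), nn.ReLU(),
            nn.Linear(64, n_embed), nn.ReLU(),   # intermediate layer
        )
        self.classifier = nn.Linear(n_embed, n_classes)

    def forward(self, x, return_embedding=False):
        z = self.encoder(x)            # low-dimensional embedding
        if return_embedding:
            return z                   # reuse this vector to guide a new task
        return self.classifier(z)      # logits for the original task

model = AudioClassifier()
frame_features = torch.randn(1, 128)                       # stand-in for one frame of audio features
embedding = model(frame_features, return_embedding=True)   # 32-d embedding vector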

Who: Prateek Verma (Stanford)
What: Embedding spaces for audio analysis (emotion recognition, genre classification and speech translation)
When: 10:30AM on Friday May 31, 2019
Where: CCRMA Seminar Room, Top floor of the Knoll at Stanford
Why: DNNs are really good at summarizing the world; what can they do for audio?

Bring your favorite DNN to the Hearing Seminar and we’ll talk about how it represents knowledge.

- Malcolm

Title:
Learning Audio Embeddings: From Signal Representation, Audio Transformation to Understanding

Abstract:
The advent of machine learning has brought a radical shift in approaches to classical problems in signal and audio processing. One of these shifts is the rise of “new representations,” or embeddings, which have been successful in abstracting the information of interest. Embeddings are low-dimensional vector representations mapped from the signal of interest (images, text, audio, etc.) via techniques from machine learning, linear algebra, and optimization. In this talk, we will highlight ways in which these representations, or embeddings, can be computed, interpreted, and used for tasks in music and audio signals.

We will discuss how we can create alternative representations, similar to the family of Fourier/correlation-based representations (spectrograms, constant-Q, correlogram), by learning and stacking these embedding vectors. For applications in supervised/unsupervised audio transforms, speech recognition, etc., we show how these embeddings are computed and analysed, and how they help in solving the problem of interest. We will show how these embedding vectors can summarize different attributes of the input signal at both the micro and macro level, such as pitch, timbre, rhythm, emotion, and spectral comb structure. We will discuss how these fundamental characteristics of audio signals were never explicitly trained for, yet are somehow encoded and implicitly learned in these embeddings, depending on the application of interest.
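
(A minimal sketch of the stacking idea above: map each audio frame to an embedding and stack the vectors over time into a spectrogram-like 2-D array. embed_frame here is a hypothetical stand-in, a fixed random projection, for a trained encoder; it is not the method from the talk.)

import numpy as np

def embed_frame(frame):
    # Hypothetical stand-in for a trained encoder: a fixed random projection.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((32, frame.shape[0]))
    return W @ frame

def learned_representation(signal, frame_len=1024, hop=512):
    # Slice the signal into overlapping frames and stack their embeddings,
    # giving an (embedding_dim, n_frames) array laid out like a spectrogram.
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.stack([embed_frame(f) for f in frames], axis=1)

audio = np.random.randn(16000)        # stand-in for one second of audio at 16 kHz
rep = learned_representation(audio)
print(rep.shape)                      # (32, 30)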

This work was done jointly with Jonathan Berger, Albert Haque, Michelle Guo, Chris Chafe, Julius Smith and Alexandre Alahi at Stanford University.

Bio:
Prateek Verma is a Stanford CCRMA graduate interested in the intersection of machine learning, audio processing, and optimization for music and audio signals. Before coming to Stanford, he graduated from IIT Bombay in Electrical Engineering with a specialization in Signal Processing. He has held research positions in the Stanford Artificial Intelligence Lab in the Computer Science Department, in both the Natural Language Processing Group and the Machine Learning Group. At Stanford, he taught in the inaugural course on “Deep Learning for Music and Audio” with Julius Smith, giving several lectures, and has given a guest lecture in the signal processing course in the Electrical Engineering Department. He is continuing his research at Stanford in the areas of hearing perception, unsupervised learning, and sound analysis and synthesis.
FREE
Open to the Public