Learning Audio Embeddings: From Signal Representation, Audio Transformation to Understanding

Date: 
Fri, 05/31/2019 - 10:30am - 12:00pm
Location: 
CCRMA Seminar Room
Event Type: 
Hearing Seminar
Prateek Verma will lead a discussion about using embedding spaces, trained using deep neural networks (DNNs), to model and perform amazing feats with music and speech signals. This continues the theme from last week’s Hearing Seminar, where Rohit talked about using DNNs for speech recognition.

One common technique in the DNN world is to use a deep network to learn some task, and then take the output of an intermediate layer to help guide a new task. This relatively low-dimensional intermediate representation is called an embedding, and it contains all the information needed to perform the task. Prateek will talk about using this new type of representation for supervised/unsupervised audio transforms, speech recognition, emotion recognition, translation, and end-to-end spoken language translation.
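
As a rough illustration of that pattern (a minimal sketch, not the models Prateek will present), the snippet below shows where such an embedding comes from: the output of an intermediate layer of a small PyTorch classifier. The AudioClassifier name, layer sizes, and 32-dimensional embedding are placeholder assumptions.

    import torch
    import torch.nn as nn

    class AudioClassifier(nn.Module):
        """Toy network: the encoder's output is reused as an embedding."""
        def __init__(self, n_features=128, n_classes=10, embed_dim=32):
            super().__init__()
            # Layers whose final output we treat as the embedding.
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(),
                nn.Linear(64, embed_dim), nn.ReLU(),
            )
            # Task-specific head (e.g., genre or emotion labels).
            self.head = nn.Linear(embed_dim, n_classes)

        def forward(self, x):
            return self.head(self.encoder(x))

        def embed(self, x):
            # The relatively low-dimensional intermediate representation.
            return self.encoder(x)

    model = AudioClassifier()
    frame = torch.randn(1, 128)      # stand-in for one spectral frame
    logits = model(frame)            # what the original training task uses
    embedding = model.embed(frame)   # what a new task can reuse
    print(embedding.shape)           # torch.Size([1, 32])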

Who: Prateek Verma (Stanford)
What: Embedding spaces for audio analysis (emotion recognition, genre classification and speech translation)
When: 10:30AM on Friday May 31, 2019
Where: CCRMA Seminar Room, Top floor of the Knoll at Stanford
Why: DNNs are really good at summarizing the world; what can they do for audio?

Bring your favorite DNN to the Hearing Seminar and we’ll talk about how they represent knowledge.

- Malcolm

Title:
Learning Audio Embeddings: From Signal Representation, Audio Transformation to Understanding

Abstract:
The advent of machine learning has brought a radical shift in the approaches to classical signal processing and audio processing problems. One of these shifts is the rise of “new representations,” or embeddings, which have been successful in abstracting the information of interest. Embeddings are low-dimensional vector representations mapped from the signal of interest (images, text, audio, etc.) via techniques from machine learning, linear algebra and optimization. In this talk, we will highlight ways in which these representations, or embeddings, can be computed, interpreted and used for tasks in music and audio signals.
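
To make the “low-dimensional vector mapped from the signal” idea concrete, here is the simplest linear-algebra instance of it (a toy sketch with synthetic data, not the talk's method): projecting spectrogram-like frames onto their top principal directions with NumPy. Learned nonlinear maps follow the same input-to-vector pattern.

    import numpy as np

    rng = np.random.default_rng(0)
    frames = rng.standard_normal((1000, 513))   # stand-in for 1000 magnitude-spectrum frames

    # PCA via the SVD: keep the 16 directions that explain the most variance.
    mean = frames.mean(axis=0)
    centered = frames - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = vt[:16]                         # basis of a 16-d embedding space

    embeddings = centered @ projection.T         # one 16-d vector per frame
    print(embeddings.shape)                      # (1000, 16)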

We will discuss how we can create alternative representations, similar to the family of Fourier/correlation-based representations (spectrograms, constant-Q, correlograms), by learning and stacking these embedding vectors. For applications in supervised/unsupervised audio transforms, speech recognition, etc., we show how these embeddings are computed and analysed, and how they help in solving the problem of interest. We will show how these embedding vectors can summarize different attributes of the input signal at both the micro and macro level, such as pitch, timbre, rhythm, emotion, and spectral comb structure. We will discuss how these fundamental characteristics of audio signals were never explicitly trained for, yet are somehow encoded and implicitly learned in the embeddings, depending on the application of interest.
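
A hedged sketch of the “stacking” idea, assuming we already have some frame-wise embedding function: slide a window over the waveform, embed each frame, and stack the vectors into a 2-D array with time along one axis and learned features along the other, in analogy to a spectrogram. The embed_frame function here is a placeholder (a fixed random projection), not a trained encoder.

    import numpy as np

    def embed_frame(frame, dim=32):
        # Placeholder: a fixed random projection stands in for a trained encoder.
        rng = np.random.default_rng(0)
        w = rng.standard_normal((dim, frame.size))
        return w @ frame

    def stacked_representation(signal, frame_len=1024, hop=256, dim=32):
        starts = range(0, len(signal) - frame_len + 1, hop)
        cols = [embed_frame(signal[s:s + frame_len], dim) for s in starts]
        # Shape (dim, n_frames): feature axis by time axis, like a spectrogram.
        return np.stack(cols, axis=1)

    signal = np.random.default_rng(1).standard_normal(16000)  # ~1 s of audio at 16 kHz
    rep = stacked_representation(signal)
    print(rep.shape)   # (32, 59)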

This work was done jointly with Jonathan Berger, Albert Haque, Michelle Guo, Chris Chafe, Julius Smith and Alexandre Alahi at Stanford University.

Bio:
Prateek Verma is a Stanford CCRMA graduate interested in the intersection of machine learning, audio processing and optimization for music and audio signals. Before coming to Stanford, he graduated from IIT Bombay in Electrical Engineering with a specialization in Signal Processing. He has held research positions in the Stanford Artificial Intelligence Lab in the Computer Science Department, in both the Natural Language Processing and Machine Learning groups. At Stanford, he has taught in the inaugural course on “Deep Learning for Music and Audio” with Julius Smith, giving several lectures, and has given a guest lecture in the signal processing course in the Electrical Engineering Department. He is continuing his research at Stanford in the areas of hearing perception, unsupervised learning, and sound analysis and synthesis.
FREE
Open to the Public