A T S

Analysis - Transformation - Synthesis


Juan C. Pampin

Stanford University
Center for Computer Research
in Music and Acoustics (CCRMA)
juan@ccrma.stanford.edu
University of Washington
Center for Digital Arts
and Experimental Media
pampin@u.washington.edu

ATS is a software library of functions for spectral Analysis, Transformation, and Synthesis of sound based on a sinusoidal plus critical-band noise model. A sound in ATS is a symbolic object representing a spectral model that can be sculpted using a variety of transformation functions. Spectral data can be accessed through an API, and saved to or loaded from disk. ATS is written in LISP; its analysis and synthesis algorithms are implemented using the CLM (Common Lisp Music) synthesis and sound processing language.


Table of Contents

Analysis
General Block Diagram
Peak Detection
Peak Tracking
Psychoacoustic Processing
Post-processing
Residual
Residual Analysis
Tracker Documentation
Examples
Transformation
Transformation Functions
API
Structure Slots
System Parameters
Saving and Loading Sounds
Synthesis
sin-synth
sin-noi-synth
Download ATS

The ATS project


* Analysis

Sound analysis in ATS is performed using the tracker function. Partials are tracked using high-resolution sinusoidal analysis (see below). Tracked partials are then synthesized using phase information and subtracted from the original sound to obtain a residual signal containing what could not be modeled by the sinusoidal analysis. The resulting residual is modeled as time-varying critical-band noise. This is done by dividing the spectrum of the residual into 25 critical bands and computing the energy in each band at frame rate. Critical-band energy is then re-injected into the partials present in those spectral regions as modulated narrow-band noise. A complementary model is used to keep noise energy from band regions where no partials were tracked.

General Block Diagram

The tracker algorithm implements high-resolution sinusoidal analysis suitable for both harmonic and non-harmonic sounds. The technique used for tracking partials is similar to the deterministic analysis of Xavier Serra's SMS (Spectral Modeling Synthesis). One difference from SMS's deterministic analysis is that tracker also uses psychoacoustic information to determine the salience of detected peaks. This information (measured as signal-to-mask ratio, or SMR) is derived from masking effects produced within critical bands, and accounts for the audibility of sinusoidal trajectories. To achieve coherent sinusoidal trajectories, both SMR and frequency deviation information are used to track partials across frames.

Tracker consists of five main modules. The windowing module breaks the analyzed sound into short-time overlapping segments and applies an analysis window to the signal. Windows from the Blackman-Harris family are normally used, but any other window type implemented by CLM can be used (see the documentation section below). The hop size (number of samples to skip by the analysis window) is expressed as a proportion of the window size (0.25 means 1/4 of the window). The size of the analysis window is calculated as a function of the number of cycles of the lowest frequency to be tracked by the system (usually the fundamental in the case of harmonic sounds). The size of the Fast Fourier Transform (FFT), used to compute the Short Time Fourier Transform (STFT) in the analysis, is internally calculated as the closest power of two greater than two windows, assuring enough zero padding. Both the window size (M) and the FFT size (N) can be forced to any number of samples, provided that M <= N and that N is a power of 2.
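
The size relations described above can be sketched in plain Common Lisp. This is only an illustration of the arithmetic, not the code used by tracker: the function names are hypothetical, and the rounding choices (ceiling for the window, floor for the hop) are assumptions.

```lisp
;;; Sketch only: hypothetical helpers, not part of ATS.

(defun window-size (sampling-rate lowest-frequency window-cycles)
  "Samples needed to hold WINDOW-CYCLES periods of LOWEST-FREQUENCY."
  (ceiling (* window-cycles sampling-rate) lowest-frequency))

(defun fft-size (window-size)
  "Smallest power of two strictly greater than two windows."
  (let ((n 1))
    (loop while (<= n (* 2 window-size))
          do (setf n (* 2 n)))
    n))

(defun hop-samples (window-size hop-size)
  "Hop in samples; HOP-SIZE is a proportion of the window (e.g. 1/4)."
  (floor (* hop-size window-size)))
```

For a 44100 Hz sound analyzed with a lowest frequency of 100 Hz, 4 window cycles, and a hop size of 1/4, this sketch gives M = 1764 samples, N = 4096, and a hop of 441 samples.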

Peak Detection

The spectrum issued from the Short Time Fourier Transform (STFT) of the windowed signal is converted to polar form to obtain the magnitude and phase of each bin. After this, peaks are detected along the dB magnitude spectrum. A peak is a local maximum in the spectrum defined as: |X_{k-1}| < |X_k| > |X_{k+1}|, where X_k is the spectrum value at the peak bin k and X_{k-1}, X_{k+1} are its surrounding bins. Once a peak is detected, its real amplitude, frequency, and phase values are obtained by means of parabolic interpolation. Only peaks with magnitude above an indicated threshold are kept in the analysis. For more details on these peak detection and interpolation techniques you can visit Julius Smith's web page on PARSHL.
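
As a rough illustration of the local-maximum test and parabolic interpolation, the following sketch works on a vector of dB magnitudes. The function name and argument order are hypothetical, the interpolation is the standard three-point parabola fit, and phase interpolation is left out for brevity.

```lisp
;;; Sketch only: FIND-PEAKS is a hypothetical name, not an ATS function.

(defun find-peaks (db-mags sampling-rate fft-size threshold-db)
  "Return a list of (frequency-hz magnitude-db) for local maxima above
THRESHOLD-DB. DB-MAGS holds the dB magnitude of bins 0..N/2."
  (loop for k from 1 below (1- (length db-mags))
        for a = (aref db-mags (1- k))   ; left neighbor
        for b = (aref db-mags k)        ; candidate bin
        for c = (aref db-mags (1+ k))   ; right neighbor
        when (and (< a b) (> b c) (> b threshold-db))
          collect (let ((p (/ (* 0.5 (- a c))
                              (+ a (* -2 b) c)))) ; vertex offset in bins
                    (list (/ (* (+ k p) sampling-rate) fft-size)
                          (- b (* 0.25 (- a c) p))))))
```

The offset p always lies between -1/2 and 1/2 of a bin and moves the peak toward the larger of the two neighbors.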

Peak Tracking

Peaks detected in one analysis frame have to be integrated into sinusoidal trajectories. This is done in three steps. First, candidates to continue a particular trajectory are found in the peak pool of the new frame. Second, the best candidate is found based on masking and frequency information. The trajectory's frequency and SMR are averaged across frames and these values, called tracks, are used to evaluate which peak of the pool best continues the trajectory. An adjustable number of frames is used to average track values; the latest peak incorporated into the trajectory can also be used (in a weighted fashion) to compute track parameters (this can be useful for tracking unstable sinusoids, see the documentation below). The best peak candidate is the one with minimal SMR difference and frequency deviation from the track (the contribution of masking information to this process can also be weighted, see the documentation below). Taking two parameters into account, SMR and frequency deviation, practically eliminates conflicts between tracks (i.e. more than one track claiming the same peak). Finally, tracked peaks are incorporated into their sinusoidal trajectories, trajectories that didn't incorporate peaks in this frame are "turned off" (tracks keep their last values and wait for candidate peaks in subsequent frames), and peaks left over in the pool "start up" new trajectories and tracks.
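
The candidate search and selection steps might be sketched as follows. The text does not give the exact cost used by tracker, so the weighted mean of frequency and SMR differences below is an assumption, as are all function names and the (frequency smr) list representation of tracks and peaks.

```lisp
;;; Sketch only: hypothetical names and a hypothetical cost function.

(defun candidate-p (track-frq peak-frq max-deviation)
  "True when PEAK-FRQ lies within MAX-DEVIATION (a proportion) of TRACK-FRQ."
  (<= (abs (- peak-frq track-frq)) (* max-deviation track-frq)))

(defun peak-cost (track peak smr-weight)
  "TRACK and PEAK are (frequency smr) lists. Lower cost is a better match."
  (/ (+ (abs (- (first peak) (first track)))
        (* smr-weight (abs (- (second peak) (second track)))))
     (+ 1 smr-weight)))

(defun best-candidate (track peaks max-deviation smr-weight)
  "Return the peak in PEAKS that best continues TRACK, or NIL if none qualifies."
  (let ((best nil) (best-cost nil))
    (dolist (peak peaks best)
      (when (candidate-p (first track) (first peak) max-deviation)
        (let ((cost (peak-cost track peak smr-weight)))
          (when (or (null best-cost) (< cost best-cost))
            (setf best peak
                  best-cost cost)))))))
```

With a track at (440.0 30.0) and a maximum deviation of 0.1, a peak at 600 Hz is never a candidate, and a peak at (445.0 28.0) is preferred over one at (430.0 10.0). A NIL result corresponds to a trajectory being "turned off" for the frame.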

Psychoacoustic Processing

Signal to Mask Ratio (SMR) information used during tracking is computed by the Psychoacoustic Processing module. The magnitudes of the peaks present in a frame are used to evaluate a masking threshold across frequency. All peak frequencies are converted to Bark scale and linear masking curves are traced up and down in frequency from each peak location (note the asymmetry between the left and right slopes of the lines in the picture: the right slope is inversely proportional to the magnitude of the peak). After masking curves for all peaks are traced, they are combined to create a masking threshold across the 25 critical bands (from 1 to 25 Barks). The SMR for each peak is then computed as the ratio in dB between the peak's magnitude and the level of the masking threshold at the peak's location.
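
A simplified sketch of the SMR computation follows. It makes several assumptions that are not taken from ATS: the slopes are fixed placeholder values passed by the caller (the level dependence of the right slope is omitted), the curves are combined by taking the strongest one, and a peak does not mask itself. All names are hypothetical.

```lisp
;;; Sketch only: placeholder slopes and a simplified threshold.

(defun masker-level (masker-bark masker-db probe-bark lower-slope upper-slope)
  "Level in dB at PROBE-BARK of one masker's curve, linear on the Bark axis.
Slopes are in dB per Bark."
  (let ((distance (- probe-bark masker-bark)))
    (if (minusp distance)
        (- masker-db (* lower-slope (- distance)))  ; probe below the masker
        (- masker-db (* upper-slope distance)))))   ; probe above the masker

(defun peak-smr (index peaks lower-slope upper-slope)
  "SMR in dB of the peak at INDEX. PEAKS is a list of (bark magnitude-db)."
  (let* ((peak (nth index peaks))
         (threshold
           (loop for other in peaks
                 for i from 0
                 unless (= i index)
                   maximize (masker-level (first other) (second other)
                                          (first peak)
                                          lower-slope upper-slope))))
    ;; With no other peaks there is no masking: return the peak level.
    (if threshold
        (- (second peak) threshold)
        (second peak))))
```

Since both quantities are in dB, the ratio is a simple difference. A negative SMR means the peak lies below the threshold produced by its neighbors.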

Post-processing

After partials have been tracked, an ATS-SOUND structure is created to store the spectral data. In a post-processing stage, short trajectories with a low average SMR are removed and gaps in continuous trajectories are fixed. Also in this step, the frequency centroid and average SMR of each partial are computed and stored in the structure.

Residual

Once sinusoidal trajectories are fixed and stored, the residual is computed. This step is performed by re-synthesizing the tracked partials using phase information, and subtracting them from the original sound in the time domain. The resulting signal is called the residual and contains what was left out by the tracking process, usually noise. A two-channel file with the re-synthesis and the residual is generated by the system, and can be used as an intuitive measure of the sinusoidal tracking quality. Normally, a noisy and low-energy residual is a sign of successful tracking (see the documentation below).
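
The time-domain subtraction, and one possible way to put a number on "low-energy", can be sketched as below. The function names are hypothetical and the energy ratio is only a suggested measure, not something computed by tracker.

```lisp
;;; Sketch only: hypothetical helpers working on equal-length sample vectors.

(defun residual (original resynthesis)
  "Sample-by-sample difference between ORIGINAL and RESYNTHESIS."
  (map 'vector #'- original resynthesis))

(defun energy (samples)
  "Sum of squared samples."
  (reduce #'+ samples :key (lambda (x) (* x x))))

(defun residual-level-db (original resynthesis)
  "Residual energy relative to the original, in dB.
Both energies must be non-zero."
  (* 10.0 (log (/ (energy (residual original resynthesis))
                  (energy original))
               10)))
```

For example, a resynthesis that reaches 90% of the amplitude of a perfectly modeled signal leaves a residual about 20 dB below the original.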

Residual Analysis

After being computed, the residual is analyzed at frame rate using a sliding rectangular window and the STFT. The frequency spectrum of the residual is transformed to Bark scale and the energy is computed in each of the 25 critical bands. The residual's energy is then re-injected as modulated narrow-bandwidth noise into the partials present in each sub-band of the spectrum. Band regions with significant energy where no partials were tracked are kept in a complementary model. Modulated critical-bandwidth noise is used to model residual energy in those remaining sub-band regions.
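
The per-band energy computation for one frame might look like the sketch below. It takes an already computed magnitude spectrum as input, uses the Zwicker and Fastl band edges plus the extra band up to 20000 Hz described under System Parameters, and defines energy as the sum of squared magnitudes, which is an assumption. The names are stand-ins, not ATS symbols.

```lisp
;;; Sketch only: stand-in for the 26 edges of the 25 critical bands.
(defparameter *sketch-band-edges*
  '(0.0 100.0 200.0 300.0 400.0 510.0 630.0 770.0 920.0 1080.0 1270.0
    1480.0 1720.0 2000.0 2320.0 2700.0 3150.0 3700.0 4400.0 5300.0
    6400.0 7700.0 9500.0 12000.0 15500.0 20000.0))

(defun band-energies (magnitudes sampling-rate fft-size)
  "Sum the squared MAGNITUDES of bins 0..N/2 into the 25 critical bands."
  (let ((energies (make-array 25 :initial-element 0.0)))
    (dotimes (k (length magnitudes) energies)
      (let* ((frq (/ (* k sampling-rate) fft-size))
             ;; index of the first upper edge above FRQ is the band number
             (band (position-if (lambda (edge) (< frq edge))
                                (rest *sketch-band-edges*))))
        (when band                      ; bins at or above 20000 Hz are ignored
          (incf (aref energies band)
                (expt (aref magnitudes k) 2)))))))
```

Calling this function once per frame yields the kind of frames-long energy vector per band that the text describes.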

Tracker Documentation

tracker file snd &key (start 0.0)(duration nil)(lowest-frequency 20)(highest-frequency 20000.0)(frequency-deviation 0.1)(window-cycles 4)(force-M NIL)(window-type 'blackman-harris-4-1)(force-window NIL)(hop-size 1/4)(fft-size nil)(lowest-magnitude (db-amp -60))(track-length 3)(last-peak-contribution 0.0)(SMR-continuity 1.0)(amp-threshold nil)(min-segment-length 3)(residual "ats-residual.snd")(verbose nil)

Examples

;;; clarinet analysis
(tracker (concatenate 'string *ats-snd-dir* "clarinet.aif")
	 'cl
	 :start 0.0
	 :hop-size 1/4
	 :lowest-frequency 100.0
	 :highest-frequency 20000.0
	 :frequency-deviation 0.05
	 :lowest-magnitude (db-amp -70)
	 :SMR-continuity 0.7
	 :track-length 6
	 :min-segment-length 3
	 :residual "/tmp/cl-res.snd"
	 :verbose nil)

;;; crotale analysis
(tracker (concatenate 'string *ats-snd-dir* "crt-cs6.snd") 
	 'crt-cs6
	 :start 0.1
	 :lowest-frequency 500.0
	 :highest-frequency 20000.0
	 :frequency-deviation 0.15
	 :window-cycles 4
	 :window-type 'blackman-harris-4-1
	 :hop-size 1/8
	 :lowest-magnitude (db-amp -90)
	 :amp-threshold -80
	 :track-length 6
	 :min-segment-length 3
	 :last-peak-contribution 0.5
	 :SMR-continuity 0.3
	 :residual "/tmp/crt-cs6-res.snd"
	 :verbose nil)

* Transformation

A sound in ATS is an intermediate representation of the spectral evolution of an analyzed signal. Users can manipulate the parameters of a sound to perform spectral transformations. Transformations can be destructive, i.e. the original sound structure is changed, or generative, i.e. the transformation is applied to a new instance of the sound, keeping the original untouched. An ATS-SOUND can accumulate several transformations before being re-synthesized.

Parameters passed to the transformation functions can, in most cases, take any of these forms:

  1. A number: transformation is parallel and synchronous; this numeric value is used to transform all partials over all the time frames of the sound.
  2. A list of numbers: each partial is transformed with a different value. When the transformation is applied to the time data the sound structure is transformed diachronically.
  3. A list containing a CLM style envelope: transformation is dynamic.
  4. A list of envelopes: each partial is transformed with a different dynamic value. When the transformation is applied to the time data the sound structure is transformed diachronically.
  5. A list of numbers and envelopes.
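
One way to picture how these five forms could be resolved to a single value for a given partial is sketched below. This is not ATS code: the function names are hypothetical, envelopes are assumed to be flat breakpoint lists (x0 y0 x1 y1 ...) in the CLM style, and a list shorter than the number of partials is assumed to reuse its last element, so that a list holding a single envelope applies to every partial.

```lisp
;;; Sketch only: hypothetical dispatch over the parameter forms.

(defun envelope-value (envelope x)
  "Linear interpolation in a flat breakpoint list (x0 y0 x1 y1 ...)."
  (loop for (x0 y0 x1 y1) on envelope by #'cddr
        do (cond ((<= x x0) (return y0))      ; before this breakpoint
                 ((null x1) (return y0))      ; past the last breakpoint
                 ((<= x x1)
                  (return (+ y0 (* (- y1 y0)
                                   (/ (- x x0) (- x1 x0)))))))))

(defun parameter-value (parameter partial x)
  "Value of PARAMETER for PARTIAL (0-based) at envelope position X."
  (if (numberp parameter)
      parameter                               ; form 1
      (let ((item (nth (min partial (1- (length parameter)))
                       parameter)))
        (if (numberp item)
            item                              ; forms 2 and 5
            (envelope-value item x)))))       ; forms 3, 4 and 5
```

For instance, (parameter-value '((0 1 1 2)) 4 0.5) evaluates the single envelope halfway along its x axis and gives 1.5 for partial 4, as it would for any other partial.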

Transformation Functions

API

Transformation functions are built using ATS's API. The API functions and macros make spectral data access easy, so users can develop their own transformation algorithms.

The following set of functions should be used to access data from sounds with non-linear time structure alterations (such as the ones performed by stretch-sound using envelopes as stretch factor). These functions are less efficient than the previous ones because data has to be interpolated using time information instead of frame locations.

Structure Slots

The slots of an ATS-SOUND are accessible both in Lisp and in CLM's run-loop (see the Synthesis section below). Analysis information (such as number of frames and partials) is stored in the structure together with spectral data. To access slot values, Lisp accessor functions should be used. Accessor function names have the ats-sound- prefix followed by the name of the slot, for instance to access the frames slot of a sound called my-sound:

(ats-sound-frames my-sound)  
Spectral data can also be dereferenced using Lisp's aref function:
(aref (ats-sound-frq my-sound) 0)

(aref (aref (ats-sound-frq my-sound) 0) 3)
In the first case we access the frequency vector of partial 0 (equivalent to (get-par-frq my-sound 0)); in the second case we access the frequency value of partial 0 at frame 3 (equivalent to (get-frq my-sound 0 3)).
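
The array-of-vectors layout behind these accessors can be tried out without ATS by using a small stand-in structure. Everything below (toy-sound, make-silent-sound, partial-mean-frq) is hypothetical and only mimics the shape of the frq slot.

```lisp
;;; Sketch only: a stand-in with the same data layout as the frq slot.

(defstruct toy-sound
  partials frames frq)

(defun make-silent-sound (partials frames)
  "Build a TOY-SOUND whose frq slot is an array of PARTIALS vectors,
each FRAMES values long and filled with 0.0."
  (let ((frq (make-array partials)))
    (dotimes (p partials)
      (setf (aref frq p) (make-array frames :initial-element 0.0)))
    (make-toy-sound :partials partials :frames frames :frq frq)))

(defun partial-mean-frq (sound partial)
  "Plain average of the frequency vector of PARTIAL."
  (let ((track (aref (toy-sound-frq sound) partial)))
    (/ (reduce #'+ track) (length track))))
```

The outer aref selects a partial and the inner aref selects a frame, so (aref (aref (toy-sound-frq s) 0) 3) reads partial 0 at frame 3, in the same way as the ATS-SOUND example above.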

The following list describes the ATS-SOUND structure slots:

  • name: string with the sound's name.
  • sampling-rate: sound's sampling rate [samples/sec] (read from the analyzed soundfile header)
  • frame-size: analysis frame size [samples] (distance between analysis windows)
  • window-size: analysis window size [samples]
  • partials: number of partials
  • frames: number of frames
  • bands: vector with critical band numbers present in the residual's complementary model. Bands present in this vector should be re-synthesized if a full-bandwidth residual is wanted (see the band-energy slot below).
  • ampmax: maximum linear amplitude
  • frqmax: maximum frequency
  • frq-av: vector with partials' frequency centroids.
  • amp-av: vector with partials' mean amplitude.
  • dur: sound's duration in seconds.
  • time: array of vectors with time data indexed by partial (the array is of partials size and each internal vector is frames values long).
  • frq: array of vectors with frequency data indexed by partial (the array is of partials size and each internal vector is frames values long).
  • amp: array of vectors with amplitude data indexed by partial (the array is of partials size and each internal vector is frames values long).
  • pha: array of vectors with phase data indexed by partial (the array is of partials size and each internal vector is frames values long).
    NOTE: this slot might be NIL in case phase information was removed from the sound (see the Saving and Loading Sounds section below).
  • energy: array of vectors with noise energy data indexed by partial (the array is of partials size and each internal vector is frames values long).
    NOTE: this slot might be NIL in case noise information was not required during analysis or removed from the sound (see the Saving and Loading Sounds section below).
  • band-energy: array of vectors with critical-band noise energy data indexed by band (the array is of (length (ats-sound-bands sound)) size and each internal vector is frames values long). Energy data stored in these vectors correspond to band numbers indicated in the bands slot. Critical-band frequency edges are stored in the system global parameter *ats-critical-band-edges* (see the System Parameters section below).
    NOTE: this slot might be NIL in case noise information was not required during analysis or removed from the sound (see the Saving and Loading Sounds section below).

System Parameters

ATS has global parameters that the user can access and in some cases customize. The following is a list of the most important parameters of the system:

  • *ats-dir*: ATS's main directory. See the README file coming with the sources for more details.
  • *ats-src-dir*: ATS's sources directory. See the README file coming with the sources for more details.
  • *ats-bin-dir*: ATS's binaries directory. See the README file coming with the sources for more details.
  • *ats-synth-dir*: directory with CLM instruments for synthesis. See the README file coming with the sources for more details.
  • *ats-snd-dir*: ATS's soundfile directory. See the README file coming with the sources for more details.
  • *ats-sounds*: list of strings with the names of sounds loaded in the system. Every time a new sound is created or loaded from disk its name gets pushed into this list.
  • *ats-critical-band-edges*: list of critical band frequency edges as published in "Psychoacoustics, Facts and Models" by E. Zwicker and H. Fastl. An extra band was added to cover the 0 to 20000 Hz frequency range, so 25 bands are present in this model corresponding to an extended Bark scale. To convert from frequency in Hertz to Barks you can use the frq-to-bark function:
    (frq-to-bark 1000.0)
    -> 9.520021
    
    or to find which band a particular frequency falls in you can use the find-band function:
    (find-band 1000.0)
    -> 8
    
    Note that the Bark scale is defined from 1 to 25, but ATS band numbers go from 0 to 24, that is why 1000Hz falls into band 8 and not 9. To get the edges and center frequency of an ATS band you can use the band-edges and band-center macros:
    (band-edges 8)  
    -> (920.0 1080.0)
    (band-center 8)
    -> 1000.0
    
    The macro band-partials (band-partials band sound frame) returns a list with the numbers of the partials present in a particular band at a particular frame:
    (band-partials 8 my-sound 40)
    -> (10 11 12)
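
The band lookups above can be imitated with a short stand-alone sketch. The edge list is the Zwicker and Fastl one plus the extra band up to 20000 Hz. The logarithmic interpolation inside a band is an assumption: it happens to reproduce the 9.520021 value shown above for 1000 Hz, which suggests, but does not confirm, that frq-to-bark works this way. All names are stand-ins for the real ATS functions and macros.

```lisp
;;; Sketch only: stand-ins for find-band, frq-to-bark and band-center.

(defparameter *toy-band-edges*
  '(0.0 100.0 200.0 300.0 400.0 510.0 630.0 770.0 920.0 1080.0 1270.0
    1480.0 1720.0 2000.0 2320.0 2700.0 3150.0 3700.0 4400.0 5300.0
    6400.0 7700.0 9500.0 12000.0 15500.0 20000.0))

(defun toy-find-band (frq)
  "0-based band number for FRQ, or NIL at or above 20000 Hz."
  (position-if (lambda (edge) (< frq edge)) (rest *toy-band-edges*)))

(defun toy-frq-to-bark (frq)
  "Bark value of FRQ (0 <= FRQ < 20000), interpolated inside its band."
  (let* ((band (toy-find-band frq))
         (lo (nth band *toy-band-edges*))
         (hi (nth (1+ band) *toy-band-edges*)))
    (if (zerop lo)
        (+ 1 (/ frq hi))                ; band 0: linear, since log(0) is undefined
        (+ 1 band (/ (log (/ frq lo)) (log (/ hi lo)))))))

(defun toy-band-center (band)
  "Arithmetic center of BAND."
  (/ (+ (nth band *toy-band-edges*)
        (nth (1+ band) *toy-band-edges*))
     2))
```

With these definitions (toy-find-band 1000.0) gives 8 and (toy-band-center 8) gives 1000.0, matching the values listed above.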
    

Saving and Loading Sounds

ATS sounds can be saved to and loaded from disk. Data is written to disk as double floats. The binary file has the following format:
ATS-HEADER
ATS-FRAME-#0
...
ATS-FRAME-#N-1
for N frames of data. An ATS-HEADER contains the following data:

  • *ats-magic-number*
  • sampling-rate (samples/sec)
  • frame-size (samples)
  • window-size (samples)
  • partials (number of partials)
  • frames (number of frames)
  • ampmax (max. amplitude)
  • frqmax (max. frequency)
  • dur (duration in sec.)
  • type (frame type, see below)

The global parameter *ats-magic-number* has a default value of 123.0; it is used only as a data sanity check (byte endianness). An ATS-FRAME can be of the following four types:

  1. No phase or noise information present:
    time (frame starting time)
    amp (par#0 amplitude)
    frq (par#0 frequency)
    ...
    amp (par#N-1 amplitude)
    frq (par#N-1 frequency)
    
    for N partials.

  2. With phase information but no noise:
    time (frame starting time)
    amp (par#0 amplitude)
    frq (par#0 frequency)
    pha (par#0 phase)
    ...
    amp (par#N-1 amplitude)
    frq (par#N-1 frequency)
    pha (par#N-1 phase)
    
    for N partials.

  3. With noise information but no phase:
    time (frame starting time)
    amp (par#0 amplitude)
    frq (par#0 frequency)
    ...
    amp (par#N-1 amplitude)
    frq (par#N-1 frequency)
    
    energy (band#0 energy)
    ...
    energy (band#24 energy)
    
    for N partials and 25 critical bands.

  4. Both phase and noise information present:
    time (frame starting time)
    amp (par#0 amplitude)
    frq (par#0 frequency)
    pha (par#0 phase)
    ...
    amp (par#N-1 amplitude)
    frq (par#N-1 frequency)
    pha (par#N-1 phase)
    
    energy (band#0 energy)
    ...
    energy (band#24 energy)
    
    for N partials and 25 critical bands.
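
From the layouts above, the expected size of a file can be worked out. The sketch below assumes that every header field and every frame value is written as one 8-byte double float, as the text states, and that the header holds exactly the ten fields listed. The function names are hypothetical.

```lisp
;;; Sketch only: size arithmetic derived from the frame layouts above.

(defun frame-doubles (partials type)
  "Number of double floats in one frame of TYPE (1 to 4)."
  (+ 1                                           ; time
     (* partials (if (member type '(2 4)) 3 2))  ; amp, frq and optional pha
     (if (member type '(3 4)) 25 0)))            ; critical-band energies

(defun file-bytes (partials frames type)
  "Expected size in bytes: a 10-value header followed by FRAMES frames."
  (* 8 (+ 10 (* frames (frame-doubles partials type)))))
```

A sound with 10 partials and 100 frames saved with both phase and noise (type 4) has 56 values per frame, so the file would be 8 * (10 + 5600) = 44880 bytes under these assumptions.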

The ats-save function saves an ATS-SOUND to disk:

ats-save sound file &key (save-phase T)(save-noise T)

Example:

;;; saving sound with both phase and noise information
(ats-save my-sound "/tmp/my-sound.ats")

;;; saving sound with no phase information
(ats-save my-sound "/tmp/my-sound-no-pha.ats" :save-phase NIL)

The ats-load function loads a sound from disk into the system:

ats-load file sound &key (dist-energy T)

Example:

;;; loading a file from disk
(ats-load "/tmp/my-sound.ats" 'my-new-sound)

* Synthesis

Sound synthesis in ATS is implemented using CLM. Two re-synthesis instruments come with the system, but since API macros and ATS-SOUND structure slots are accessible inside CLM's run loop, other synthesis algorithms can be easily designed (see the code in sin-synth.ins and sin-noi-synth.ins for design examples).


* Download ATS

ATS-1.0 sources can be downloaded by anonymous ftp from CCRMA at:
ftp://ccrma-ftp.stanford.edu/pub/Lisp/ATS/ATS-1.0.tar.gz

(Temporary sources for CMUCL: ftp://ccrma-ftp.stanford.edu/pub/Lisp/ATS/ATS-1.0-CMUCL.tar.gz)

Read the README file coming with the distribution for installation details.


* The ATS Project

For the most current information about ATS please visit our project site at:
http://www.dxarts.washington.edu/ats/