A T S

Analysis - Transformation - Synthesis

Juan C. Pampin

Stanford University
Center for Computer Research
in Music and Acoustics (CCRMA)
`juan@ccrma.stanford.edu`

University of Washington
Center for Digital Arts
and Experimental Media
`pampin@u.washington.edu`

ATS is a software library of functions for spectral Analysis, Transformation, and Synthesis of sound based on a sinusoidal plus critical-band noise model. A sound in ATS is a symbolic object representing a spectral model that can be sculpted using a variety of transformation functions. Spectral data can be accessed through an API, and saved to or loaded from disk. ATS is written in LISP; its analysis and synthesis algorithms are implemented using the CLM (Common Lisp Music) synthesis and sound processing language.

Analysis

General Block Diagram

Peak Detection

Peak Tracking

Psychoacoustic Processing

Post-processing

Residual

Residual Analysis

Tracker Documentation

Examples

Transformation

Transformation Functions

trans-sound
stretch-sound
shift-sound

API

get-par-amp
get-par-frq
get-par-time
get-par-energy
get-par
get-amp
get-frq
get-time
get-energy
get-band-energy
get-amp-f
get-frq-f
get-time-f
get-energy-f
get-band-energy-f
get-value
get-amp-t
get-frq-t
get-energy-t
get-band-energy-t

Structure Slots

System Parameters

Saving and Loading Sounds

ats-save
ats-load

Analysis

Sound analysis in ATS is performed using the tracker function. Partials are tracked using high-resolution sinusoidal analysis (see below). Tracked partials are then synthesized using phase information, and subtracted from the original sound to obtain a residual signal containing what couldn't be modeled by the sinusoidal analysis. The resulting residual is modeled as time-varying critical-band noise. This is performed by dividing the spectrum of the residual into 25 critical bands and computing the energy in each band at frame rate. Critical-band energy is then re-injected to partials present in those spectral regions as modulated narrow-band noise. A complementary model is used to keep noise energy from band regions where no partials were tracked.

General Block Diagram

The tracker algorithm implements high-resolution sinusoidal analysis suitable for both harmonic and non-harmonic sounds. The technique used for tracking partials is similar to the deterministic analysis of Xavier Serra's SMS (Spectral Modeling Synthesis). One difference from SMS's deterministic analysis is that tracker also uses psychoacoustic information to determine the salience of detected peaks. This information (measured as signal-to-mask ratio, or SMR) is derived from masking effects produced within critical bands, and accounts for the audibility of sinusoidal trajectories. To achieve coherent sinusoidal trajectories, both SMR and frequency deviation information are used to track partials across frames.

Tracker consists of five main modules. The windowing module breaks the analyzed sound into short-time overlapping segments and applies an analysis window to the signal. Windows from the Blackman-Harris family are normally used, but any other window type implemented by CLM can be used (see the documentation section below). The hop size (number of samples skipped by the analysis window) is expressed as a proportion of the window size (0.25 means 1/4 of the window). The size of the analysis window is calculated as a function of the number of cycles of the lowest frequency to be tracked by the system (usually the fundamental in the case of harmonic sounds). The size of the Fast Fourier Transform (FFT), used to compute the Short Time Fourier Transform (STFT) in the analysis, is internally calculated as the closest power of two greater than two windows, ensuring enough zero padding. Both the window size (M) and the FFT size (N) can be forced to any number of samples (the condition M <= N must hold, and N must be a power of 2).
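As a rough sketch of the default sizing rules described above, the following Common Lisp functions compute M from the lowest frequency and N as the next power of two above 2M. The function names, the rounding of M up to a whole sample, and the strict "greater than" comparison are assumptions made for illustration; they are not taken from the ATS sources.

```lisp
;;; Hypothetical helpers, not part of ATS.

(defun example-window-size (sampling-rate window-cycles lowest-frequency)
  "Window size M in samples: WINDOW-CYCLES periods of LOWEST-FREQUENCY.
Rounding up to a whole sample is an assumption."
  (values (ceiling (* sampling-rate window-cycles) lowest-frequency)))

(defun example-fft-size (window-size)
  "Smallest power of two strictly greater than twice WINDOW-SIZE."
  (loop with n = 1
        while (<= n (* 2 window-size))
        do (setf n (* n 2))
        finally (return n)))
```

For a 44100 Hz file, 4 window cycles and a lowest frequency of 100 Hz, this gives M = 1764 samples and N = 4096.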

Peak Detection

The spectrum obtained from the Short Time Fourier Transform (STFT) of the windowed signal is converted to polar form to obtain the magnitude and phase of each bin. After this, peaks are detected along the dB magnitude spectrum. A peak is a local maximum in the spectrum defined as: |X(k-1)| < |X(k)| > |X(k+1)|, where X(k) is the bin at the peak location and X(k-1), X(k+1) are its neighboring bins. Once a peak is detected, its true amplitude, frequency, and phase values are obtained by means of parabolic interpolation. Only peaks with magnitude above an indicated threshold are kept in the analysis. For more details on these peak detection and interpolation techniques you can visit Julius Smith's web page on PARSHL.
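The peak test and the parabolic refinement can be sketched as follows. This is a minimal illustration of the standard three-point parabolic fit on dB magnitudes (the technique described for PARSHL), not the ATS implementation; all function names are hypothetical, and phase interpolation is left out.

```lisp
;;; Hypothetical helpers, not part of ATS.
;;; MAGS is a vector of dB magnitudes, one per FFT bin.

(defun example-find-peaks (mags threshold)
  "Indices k with MAGS[k-1] < MAGS[k] > MAGS[k+1] and MAGS[k] above THRESHOLD."
  (loop for k from 1 below (1- (length mags))
        when (and (> (aref mags k) threshold)
                  (< (aref mags (1- k)) (aref mags k))
                  (> (aref mags k) (aref mags (1+ k))))
          collect k))

(defun example-parabolic-peak (mags k)
  "Fit a parabola through bins k-1, k, k+1.
Returns two values: the fractional bin position and the interpolated dB magnitude."
  (let* ((a (aref mags (1- k)))
         (b (aref mags k))
         (c (aref mags (1+ k)))
         ;; offset of the parabola's vertex from bin k, in the range -1/2..1/2
         (p (* 0.5 (/ (- a c) (+ a (* -2 b) c)))))
    (values (+ k p)
            (- b (* 0.25 (- a c) p)))))

(defun example-bin-to-hz (bin sampling-rate fft-size)
  "Convert a (possibly fractional) bin position to a frequency in Hz."
  (/ (* bin sampling-rate) fft-size))
```

Because a detected peak is a strict local maximum, the denominator of the vertex offset is always negative, so the division is safe.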

Peak Tracking

Peaks detected in one analysis frame have to be integrated into sinusoidal trajectories. This is done in three steps. First, candidates to continue a particular trajectory are found in the peak pool of the new frame. Second, the best candidate is chosen based on masking and frequency information. A trajectory's frequency and SMR are averaged across frames, and these values, called tracks, are used to evaluate which peak of the pool best continues the trajectory. An adjustable number of frames is used to average track values; the latest peak incorporated into the trajectory can also be used (in a weighted fashion) to compute track parameters (this can be useful for tracking unstable sinusoids, see the documentation below). The best peak candidate is the one with minimal SMR difference and frequency deviation from the track (the contribution of masking information to this process can also be weighted, see the documentation below). Taking two parameters into account, SMR and frequency deviation, practically eliminates conflicts between tracks (i.e. more than one track claiming the same peak). Finally, tracked peaks are incorporated into their sinusoidal trajectories; trajectories that did not incorporate peaks in this frame are "turned off" (their tracks keep their last values and wait for candidate peaks in subsequent frames), and peaks left over in the pool "start up" new trajectories and tracks.
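The candidate selection step might look like the sketch below. The text does not give the exact cost ATS minimizes, so the cost used here (relative frequency deviation plus a weighted, normalized SMR difference) is purely an assumption chosen to show how the two criteria can be combined; the structure and function names are hypothetical as well.

```lisp
;;; Hypothetical sketch, not the ATS tracking code.

(defstruct example-peak frq smr)

(defun example-candidate-cost (peak track-frq track-smr smr-continuity)
  "Assumed cost: relative frequency deviation plus weighted relative SMR difference."
  (let ((frq-dev (/ (abs (- (example-peak-frq peak) track-frq)) track-frq))
        (smr-dev (/ (abs (- (example-peak-smr peak) track-smr))
                    (max 1.0 (abs track-smr)))))
    (+ frq-dev (* smr-continuity smr-dev))))

(defun example-best-candidate (pool track-frq track-smr
                               &key (frequency-deviation 0.1) (smr-continuity 1.0))
  "Return the peak in POOL that best continues the track, or NIL if no peak
lies within FREQUENCY-DEVIATION (a proportion of TRACK-FRQ) of the track."
  (let ((best nil)
        (best-cost nil))
    (dolist (peak pool best)
      (when (<= (abs (- (example-peak-frq peak) track-frq))
                (* frequency-deviation track-frq))
        (let ((cost (example-candidate-cost peak track-frq track-smr smr-continuity)))
          (when (or (null best-cost) (< cost best-cost))
            (setf best peak
                  best-cost cost)))))))
```

With smr-continuity set to 0.0 the choice depends on frequency alone; raising it lets a slightly more distant peak win when its SMR is closer to the track's.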

Psychoacoustic Processing

Signal to Mask Ratio (SMR) information used during tracking is computed by the Psychoacoustic Processing module. The magnitudes of the peaks present in a frame are used to evaluate a masking threshold across frequency. All peak frequencies are converted to Bark scale, and linear masking curves are traced up and down in frequency from each peak location (note the asymmetry between the left and right slopes of the lines in the picture; the right slope is inversely proportional to the magnitude of the peak). After masking curves for all peaks are traced, they are combined to create a masking threshold across the 25 critical bands (from 1 to 25 Barks). The SMR for each peak is then computed as the ratio in dB between the peak's magnitude and the level of the masking threshold at the peak's location.

Post-processing

After partials have been tracked, an ATS-SOUND structure is created to store the spectral data. In a post-processing stage, short trajectories with a low average SMR are removed and gaps in continuous trajectories are fixed. Also in this step, the frequency centroid and average SMR of each partial are computed and stored in the structure.

Residual

Once sinusoidal trajectories are fixed and stored, the residual is computed. This step is performed by re-synthesizing the tracked partials using phase information, and subtracting them from the original sound in the time domain. The resulting signal is called the residual and contains what was left out by the tracking process, usually noise. A two-channel file with the re-synthesis and the residual is generated by the system, and can be used as an intuitive measure of the sinusoidal tracking quality. Normally, a noisy and low-energy residual is a sign of successful tracking (see the documentation below).

Residual Analysis

After being computed, the residual is analyzed at frame rate using a sliding rectangular window and the STFT. The frequency spectrum of the residual is transformed to Bark scale and the energy is computed in each of the 25 critical bands. The residual's energy is then re-injected as modulated narrow-bandwidth noise to partials present in each sub-band of the spectrum. Band regions with significant energy where no partials were tracked are kept in a complementary model. Modulated critical-bandwidth noise is used to model residual energy in those remaining sub-band regions.

Tracker Documentation

tracker file snd &key (start 0.0)(duration nil)(lowest-frequency 20)(highest-frequency 20000.0)(frequency-deviation 0.1)(window-cycles 4)(force-M NIL)(window-type 'blackman-harris-4-1)(force-window NIL)(hop-size 1/4)(fft-size nil)(lowest-magnitude (db-amp -60))(track-length 3)(last-peak-contribution 0.0)(SMR-continuity 1.0)(amp-threshold nil)(min-segment-length 3)(residual "ats-residual.snd")(verbose nil)

file: string with the name and path of the soundfile to analyze. Should be a mono file (in case a multichannel file is passed, tracker will automatically analyze its first channel). All soundfile types supported by CLM can be read.
snd: name of the sound structure to be used in ATS to store the analysis data. This is a Lisp symbol that will point to the ATS-SOUND structure generated by tracker.
start: time offset in the file where to start the analysis, defaults to 0.0 seconds.
duration: duration of the analysis in seconds. Defaults to NIL, which means that analysis will be performed until the end of the soundfile is reached.
lowest-frequency: lowest frequency to track. Peak detection will be performed from this frequency up. This value is also used to compute the size of the analysis window (see window-cycles below). Defaults to 20 Hz.
highest-frequency: highest frequency to track. Peak detection will be performed up to this frequency value. Defaults to 20 kHz.
frequency-deviation: maximum deviation allowed around the frequency of a track. This value is expressed as a proportion of the track's frequency. For instance, a value of 0.1 means that only peaks with frequency values within 10% of a particular track's frequency will be considered candidates for continuation. The smaller this value, the more stable the tracked trajectories will be. Defaults to 0.1 (10% of the frequency of the track). Note that this parameter works in conjunction with SMR-continuity (see below).
window-cycles: number of cycles of the lowest-frequency to be present in one analysis window. Defaults to 4 cycles. The analysis window size (M) is computed as:
M = Fs*window-cycles/lowest-frequency [samples] (Fs: sampling rate read from the header of the analysis file).
force-M: this parameter overrides window-cycles. It should only be used when a particular number of samples is wanted for the window size. Defaults to NIL.
window-type: analysis window type. ATS has built-in Blackman-Harris windows of 6 types:
1. exact-blackman
2. blackman
3. blackman-harris-3-1
4. blackman-harris-3-2
5. blackman-harris-4-1
6. blackman-harris-4-2
Each type presents a different compromise between main-lobe width and side-lobe rejection. Type blackman-harris-4-1 (-92 dB side-lobe rejection) is used by default.
force-window: this parameter overrides window-type, any type of window implemented by CLM can be used here. Defaults to NIL (i.e. window-type is used).
hop-size: proportion of the analysis window length (M) to be advanced in time. Default value is 1/4. The analysis frame period in seconds can be computed as:
M*hop-size/Fs
fft-size: forces the FFT size. Defaults to NIL, which means that the FFT size (N) will be calculated as the first power of 2 greater than twice the window size (zero-padding is applied). This parameter should be used only when a particular combination of M and N is wanted (see also force-M).
lowest-magnitude: lowest magnitude of a peak to be detected. Only bins with magnitudes above this value will be considered by the peak detection algorithm. Defaults to -60 dB (0.001).
track-length: length of the analysis tracks in frames. This value affects the "memory" of the peak tracking algorithm (i.e. its ability to predict what a stable trajectory is). For stable sounds you can increase this value (it defaults to 3 frames), for unstable sounds you can make it smaller (with a value of 1 it will track peaks on a frame by frame basis). The parameter last-peak-contribution also affects the way peaks are tracked (see below).
last-peak-contribution: this parameter controls the weight the last peak incorporated to a trajectory has in the peak tracking process. With the default value 0.0 peak tracking is totally determined by averaged frequency and SMR values called tracks (see track-length above). With a value of 1.0 tracking decisions are made using the last-peak incorporated to a trajectory (tracks are not used). Values in between can be used to account for both stability and sudden trajectory changes.
SMR-continuity: this parameter controls the weight Signal to Mask Ratio (SMR) information has in the peak tracking process. With the default value 1.0 SMR has equal weight as frequency deviation information, with a value of 0.0 only frequency information is used for tracking peaks. Values in between adjust the amount of SMR continuation wanted in the analysis.
amp-threshold: this parameter is used during ATS-SOUND post-processing. Only trajectories with an average amplitude over the amp-threshold are kept as sound partials. The default value is NIL, which means that all trajectories will be kept.
min-segment-length: this parameter is used during ATS-SOUND post-processing. Segments shorter than min-segment-length frames are removed from the trajectories. This helps to avoid keeping intermittent short segments in a sound.
residual: name of the residual file ATS will generate after analysis. This is a two-channel file: the first channel contains the time-domain residual of the analysis, and the second one the re-synthesis of the tracked partials (with phase). The default soundfile name is "ats-residual.snd", written in the current directory; a string with a more convenient soundfile name and path can be used instead. If NIL is passed, no residual analysis is performed.
verbose: print tracking information during analysis. Defaults to NIL.
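Two of the quantities in the list above are easy to check by hand: the frame period M*hop-size/Fs and the linear value of a dB threshold. The sketch below uses hypothetical helper names; the dB conversion assumes the usual amplitude convention of 20 dB per decade, which agrees with the -60 dB (0.001) default quoted above.

```lisp
;;; Hypothetical helpers, not part of ATS or CLM.

(defun example-frame-period (window-size hop-size sampling-rate)
  "Analysis frame period in seconds: M * hop-size / Fs."
  (/ (* window-size hop-size) sampling-rate))

(defun example-db-to-linear (db)
  "Linear amplitude for a level in dB (20 dB per factor of 10)."
  (expt 10.0 (/ db 20.0)))
```

With the defaults and a 44100 Hz file, M = 44100*4/20 = 8820 samples, so (example-frame-period 8820 1/4 44100) is 1/20 of a second, i.e. 20 analysis frames per second.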

Examples

;;; clarinet analysis
(tracker (concatenate 'string *ats-snd-dir* "clarinet.aif")
	 'cl
	 :start 0.0
	 :hop-size 1/4
	 :lowest-frequency 100.0
	 :highest-frequency 20000.0
	 :frequency-deviation 0.05
	 :lowest-magnitude (db-amp -70)
	 :SMR-continuity 0.7
	 :track-length 6
	 :min-segment-length 3
	 :residual "/tmp/cl-res.snd"
	 :verbose nil)

;;; crotale analysis
(tracker (concatenate 'string *ats-snd-dir* "crt-cs6.snd") 
	 'crt-cs6
	 :start 0.1
	 :lowest-frequency 500.0
	 :highest-frequency 20000.0
	 :frequency-deviation 0.15
	 :window-cycles 4
	 :window-type 'blackman-harris-4-1
	 :hop-size 1/8
	 :lowest-magnitude (db-amp -90)
	 :amp-threshold -80
	 :track-length 6
	 :min-segment-length 3
	 :last-peak-contribution 0.5
	 :SMR-continuity 0.3
	 :residual "/tmp/crt-cs6-res.snd"
	 :verbose nil)

Transformation

A sound in ATS is an intermediate representation of the spectral evolution of an analyzed signal. Users can manipulate the parameters of a sound to perform spectral transformations. Transformations can be destructive, i.e. the original sound structure is changed, or generative, i.e. the transformation is applied to a new instance of the sound, keeping the original untouched. An ATS-SOUND can accumulate several transformations before being re-synthesized.

Parameters passed to the transformation functions can, in most cases, take any of these forms:

A number: transformation is parallel and synchronous; this numeric value is used to transform all partials over all the time frames of the sound.
A list of numbers: each partial is transformed with a different value. When the transformation is applied to the time data the sound structure is transformed diachronically.
A list containing a CLM style envelope: transformation is dynamic.
A list of envelopes: each partial is transformed with a different dynamic value. When the transformation is applied to the time data the sound structure is transformed diachronically.
A list of numbers and envelopes.
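To make the dynamic case concrete: a CLM-style envelope is a flat list of breakpoints (x0 y0 x1 y1 ...), as in '(0 1.0 1 2.0). The helper below is a hypothetical stand-in that reads such an envelope with linear interpolation. How ATS maps the envelope's x axis onto the time span of a sound is not stated here, so the normalized 0 to 1 range in the usage note is an assumption.

```lisp
;;; Hypothetical helper, not part of ATS or CLM.

(defun example-envelope-value (x env)
  "Linearly interpolate the flat breakpoint list ENV (x0 y0 x1 y1 ...) at X.
Values outside the envelope's x range are clamped to the first or last y."
  (if (<= x (first env))
      (second env)
      (loop for (x0 y0 x1 y1) on env by #'cddr
            while x1
            when (<= x x1)
              return (+ y0 (* (- y1 y0) (/ (- x x0) (- x1 x0))))
            finally (return (car (last env))))))
```

Reading '(0 1.0 1 2.0) halfway through gives a factor of 1.5, so a partial transformed with this envelope would be scaled by 1.0 at the start, 1.5 at the midpoint and 2.0 at the end.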

Transformation Functions

trans-sound sound transposition &key formants name simp

Transpose the frequencies of the partials of a sound.

sound: sound structure to transform.
transposition: transposition factor (in any of the formats described above).
formants: if T (true), formants of the sound are kept after transposition. The amplitudes of the partials are scaled according to the spectral envelope of the original sound.
name: optional name for the new generated sound (if NIL the function is destructive)
simp: if T (true) partials with a mean frequency over half sampling rate (taken from ats-sound-sampling-rate) are eliminated from the sound structure.

Example:

;;; Note: 
;;; (ats-sound-partials my-sound) returns the number 
;;; of partials of the sound structure my-sound. Here we
;;; are transposing the even and odd partials using different 
;;; envelopes. The loop macro is creating the list of 
;;; envelopes transp-env that we use in the call to the function.

(let ((transp-env (loop for i from 0 below (ats-sound-partials my-sound)
                        with even-env = '(0 1.0 1 2.0)
                        with odd-env = '(0 1.0 1 0.5)
                     collect (if (oddp i) odd-env even-env))))
  (trans-sound 'my-sound transp-env :formants T 
                                    :name 'my-new-sound 
                                    :simp T))

stretch-sound sound stretch &key name

Performs time stretching over the partials of a sound.

sound: sound structure to transform.
stretch: time stretching factor (being of any of the formats described above). In ATS each partial can be stretched by a different constant or dynamic factor. This would produce a spectral structure where vertical relationships between partials are completely altered (diachronic transformation). During synthesis, parameters are interpolated between windows according to this (altered) time information.
name: optional name for the new generated sound (if NIL the function is destructive)

Example:

;;; Note: 
;;; stretch-env is a list with stretch factors going from 1.0
;;; for the first partial up to 8.2 for the last partial. After 
;;; stretching, higher partials are longer than lower partials.
;;; As we apply stretch-sound to my-new-sound, the
;;; transformation will be cumulative, being the output sound 
;;; structure my-new-sound-1 the result of stretching the
;;; previously transposed version of the original my-sound.

(let* ((par (ats-sound-partials my-new-sound))
       (stretch-env (loop for i from 0 below par
                          for j from 1 by (/ 8.0 par)
                       collect j)))
  (stretch-sound 'my-new-sound stretch-env :name 'my-new-sound-1))

shift-sound sound shift &key formants name simp

Operates frequency shifting over the frequencies of the partials of a sound.

sound: sound structure to transform.
shift: shifting factor (being of any of the formats described above). Numeric values can be positive or negative.
formants: if T (true), formants of the sound are kept after the shift operation. The amplitudes of the partials are scaled according to the spectral envelope of the original sound.
name: optional name for the new generated sound (if NIL the function is destructive)
simp: if T (true) partials with a mean frequency over half sampling rate (by default taken from the CLM sampling-rate variable) are eliminated from the sound structure.

This picture shows the shifted voice spectra generated with ATS for Jonathan Harvey's piece Ashes Dance Back, for choir and electronic sounds.

Example:

;;; Note: 
;;; (ats-sound-frq-av my-sound) returns a vector 
;;; containing frequency centroids of the partials of my-sound. 
;;; Here we are shifting the even partials 
;;; up by 1/8 of their frequency  and the odd partials down by 
;;; 1/8 of their frequency centroid. The loop macro is creating 
;;; the list of shift values shift-env that we use in the 
;;; call to the function.

(let ((shift-env (loop for i from 0 below (ats-sound-partials my-sound)
                       ;; FOR (not WITH) so frq is re-read for every partial
                       for frq = (aref (ats-sound-frq-av my-sound) i) 
                    collect (if (oddp i)(* 1/8 frq) (* -1/8 frq)))))
  (shift-sound 'my-sound shift-env :formants T 
                                   :name 'my-new-sound-2
                                   :simp T))

API

Transformation functions are built using ATS's API. The API functions and macros give users easy access to spectral data, so they can develop their own transformation algorithms.

get-par-amp sound partial
Returns a vector with amplitude data for partial
get-par-frq sound partial
Returns a vector with frequency data for partial
get-par-time sound partial
Returns a vector with time data for partial
get-par-energy sound partial
Returns a vector with noise energy data for partial
get-par sound partial parameter
A more general interface wrapping the previous set of macros. Returns a vector with parameter (amp, frq, time, energy) data for partial
get-amp sound partial frame
Returns amplitude value for partial at frame
get-frq sound partial frame
Returns frequency value for partial at frame
get-time sound partial frame
Returns time value for partial at frame
get-energy sound partial frame
Returns noise energy value for partial at frame
get-band-energy sound band frame
Returns noise energy value for band at frame
get-amp-f sound partial frame
Returns amplitude value for partial at a fractional frame using linear interpolation.
get-frq-f sound partial frame
Returns frequency value for partial at a fractional frame using linear interpolation.
get-time-f sound partial frame
Returns time value for partial at a fractional frame using linear interpolation.
get-energy-f sound partial frame
Returns noise energy value for partial at a fractional frame using linear interpolation.
get-band-energy-f sound band frame
Returns noise energy value for band at a fractional frame using linear interpolation.
get-value sound [partial:band] frame parameter
A more general interface wrapping the previous set of macros. Returns parameter (amp, frq, time, energy, or band energy) for partial or band at fractional frame.
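The fractional-frame accessors can be pictured as plain linear interpolation between the two neighboring frames of a data vector. The function below is a hypothetical sketch operating on an ordinary vector; it is not how the ATS macros are necessarily written.

```lisp
;;; Hypothetical helper, not part of ATS.

(defun example-value-at-frame (data frame)
  "Linearly interpolate the vector DATA at the (possibly fractional) FRAME."
  (multiple-value-bind (i frac) (floor frame)
    ;; clamp the upper neighbor so the last frame can be read exactly
    (let ((j (min (1+ i) (1- (length data)))))
      (+ (aref data i)
         (* frac (- (aref data j) (aref data i)))))))
```

For a frequency vector #(100.0 110.0 130.0), frame 1.5 lies halfway between 110.0 and 130.0 and yields 120.0.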

The following set of functions should be used to access data from sounds with non-linear time structure alterations (such as the ones performed by stretch-sound using envelopes as stretch factors). These functions are less efficient than the previous ones because data has to be interpolated using time information instead of frame locations.

get-amp-t sound partial time
Returns amplitude value for partial at time using linear interpolation.
get-frq-t sound partial time
Returns frequency value for partial at time using linear interpolation.
get-energy-t sound partial time
Returns noise energy value for partial at time using linear interpolation.
get-band-energy-t sound band time
Returns noise energy value for band at time using linear interpolation.
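The extra cost of the time-based accessors comes from having to locate the frames that surround the requested time before interpolating. A minimal sketch, assuming a partial's time vector is strictly increasing (the function name and the linear search are illustrative choices, not the ATS implementation):

```lisp
;;; Hypothetical helper, not part of ATS.

(defun example-value-at-time (times data time)
  "Interpolate DATA at TIME, where TIMES holds the time stamp of each frame.
TIMES is assumed to be strictly increasing; out-of-range times are clamped."
  (let ((last (1- (length times))))
    (cond ((<= time (aref times 0)) (aref data 0))
          ((>= time (aref times last)) (aref data last))
          (t
           ;; search for the first frame that starts after TIME
           (let* ((j (position-if (lambda (tm) (> tm time)) times))
                  (i (1- j))
                  (frac (/ (- time (aref times i))
                           (- (aref times j) (aref times i)))))
             (+ (aref data i)
                (* frac (- (aref data j) (aref data i)))))))))
```

With frame times #(0.0 0.1 0.4) and values #(1.0 2.0 5.0), a request at 0.25 seconds falls halfway between the second and third frames and returns approximately 3.5.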

Structure Slots

The slots of an ATS-SOUND are accessible both in Lisp and in CLM's run-loop (see the Synthesis section below). Analysis information (such as number of frames and partials) is stored in the structure together with spectral data. To access slot values, Lisp accessor functions should be used. Accessor function names have the ats-sound- prefix followed by the name of the slot, for instance to access the frames slot of a sound called my-sound:

(ats-sound-frames my-sound)

Also spectral data can be dereferenced using Lisp's aref function:

(aref (ats-sound-frq my-sound) 0)

(aref (aref (ats-sound-frq my-sound) 0) 3)

In the first case we access the frequency vector of partial 0 (equivalent to (get-par-frq my-sound 0)); in the second case we access the frequency value of partial 0 at frame 3 (equivalent to (get-frq my-sound 0 3)).

The following list describes the ATS-SOUND structure slots:

name: string with the sound's name.
sampling-rate: sound's sampling rate [samples/sec] (read from the analyzed soundfile header)
frame-size: analysis frame size [samples] (distance between analysis windows)
window-size: analysis window size [samples]
partials: number of partials
frames: number of frames
bands: vector with critical band numbers present in the residual's complementary model. Bands present in this vector should be re-synthesized if a full-bandwidth residual is wanted (see the band-energy slot below).
ampmax: maximum linear amplitude
frqmax: maximum frequency
frq-av: vector with partials' frequency centroids.
amp-av: vector with partials' mean amplitude.
dur: sound's duration in seconds.
time: array of vectors with time data indexed by partial (the array is of partials size and each internal vector is frames values long).
frq: array of vectors with frequency data indexed by partial (the array is of partials size and each internal vector is frames values long).
amp: array of vectors with amplitude data indexed by partial (the array is of partials size and each internal vector is frames values long).
pha: array of vectors with phase data indexed by partial (the array is of partials size and each internal vector is frames values long).
NOTE: this slot might be NIL in case phase information was removed from the sound (see the Saving and Loading Sounds section below).
energy: array of vectors with noise energy data indexed by partial (the array is of partials size and each internal vector is frames values long).
NOTE: this slot might be NIL in case noise information was not required during analysis or removed from the sound (see the Saving and Loading Sounds section below).
band-energy: array of vectors with critical-band noise energy data indexed by band (the array is of (length (ats-sound-bands sound)) size and each internal vector is frames values long). Energy data stored in these vectors correspond to band numbers indicated in the bands slot. Critical-band frequency edges are stored in the system global parameter *ats-critical-band-edges* (see the System Parameters section below). NOTE: this slot might be NIL in case noise information was not required during analysis or removed from the sound (see the Saving and Loading Sounds section below).

System Parameters

ATS has global parameters that the user can access and in some cases customize. The following is a list of the most important parameters of the system:

*ats-dir*: ATS's main directory. See the README file coming with the sources for more details.
*ats-src-dir*: ATS's sources directory. See the README file coming with the sources for more details.
*ats-bin-dir*: ATS's binaries directory. See the README file coming with the sources for more details.
*ats-synth-dir*: directory with CLM instruments for synthesis. See the README file coming with the sources for more details.
*ats-snd-dir*: ATS's soundfile directory. See the README file coming with the sources for more details.
*ats-sounds*: list of strings with the names of sounds loaded in the system. Every time a new sound is created or loaded from disk its name gets pushed into this list.
*ats-critical-band-edges*: list of critical band frequency edges as published in "Psychoacoustics, Facts and Models" by E. Zwicker and H. Fastl. An extra band was added to cover the 0 to 20000 Hz frequency range, so 25 bands are present in this model corresponding to an extended Bark scale. To convert from frequency in Hertz to Barks you can use the frq-to-bark function:
```
(frq-to-bark 1000.0)
-> 9.520021
```
or to find which band a particular frequency falls in you can use the find-band function:
```
(find-band 1000.0)
-> 8
```
Note that the Bark scale is defined from 1 to 25, but ATS band numbers go from 0 to 24, that is why 1000Hz falls into band 8 and not 9. To get the edges and center frequency of an ATS band you can use the band-edges and band-center macros:
```
(band-edges 8)  
-> (920.0 1080.0)
(band-center 8)
-> 1000.0
```
The macro band-partials (band-partials band sound frame) returns a list with the numbers of the partials present in a particular band at a particular frame:
```
(band-partials 8 my-sound 40)
->(10 11 12)
```
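The band lookup can be sketched with an explicit edge list. The values below are the critical-band edges from Zwicker's tables as commonly quoted, with 20000 Hz appended for the extra band; they are given from memory as an assumption and may not match the contents of *ats-critical-band-edges* exactly. The function names are hypothetical, and the arithmetic-mean band center is likewise an assumption (it does agree with the band 8 values shown above).

```lisp
;;; Hypothetical stand-ins, not the ATS definitions.

(defparameter *example-band-edges*
  '(0.0 100.0 200.0 300.0 400.0 510.0 630.0 770.0 920.0 1080.0 1270.0
    1480.0 1720.0 2000.0 2320.0 2700.0 3150.0 3700.0 4400.0 5300.0
    6400.0 7700.0 9500.0 12000.0 15500.0 20000.0)
  "26 edges delimiting 25 bands, numbered 0 to 24.")

(defun example-find-band (frq)
  "0-based number of the band whose range [low, high) contains FRQ, or NIL."
  (loop for (lo hi) on *example-band-edges*
        for band from 0
        while hi
        when (and (<= lo frq) (< frq hi))
          return band))

(defun example-band-edges (band)
  "List of the lower and upper edge of BAND."
  (list (nth band *example-band-edges*)
        (nth (1+ band) *example-band-edges*)))

(defun example-band-center (band)
  "Arithmetic mean of the edges of BAND."
  (destructuring-bind (lo hi) (example-band-edges band)
    (/ (+ lo hi) 2)))
```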

Saving and Loading Sounds

ATS sounds can be saved to and loaded from disk. Data is written to disk as double floats; the binary file has the following format:

ATS-HEADER
ATS-FRAME-#0
...
ATS-FRAME-#N-1

for N frames of data. An ATS-HEADER contains the following data:

*ats-magic-number*
sampling-rate (samples/sec)
frame-size (samples)
window-size (samples)
partials (number of partials)
frames (number of frames)
ampmax (max. amplitude)
frqmax (max. frequency)
dur (duration in sec.)
type (frame type, see below)

The global parameter *ats-magic-number* has a default value of 123.0; it is used only as a data sanity test (byte endianness). An ATS-FRAME can be of the following four types:

No phase or noise information present:

time (frame starting time)
amp (par#0 amplitude)
frq (par#0 frequency)
...
amp (par#N-1 amplitude)
frq (par#N-1 frequency)

for N partials.

With phase information but no noise:

time (frame starting time)
amp (par#0 amplitude)
frq (par#0 frequency)
pha (par#0 phase)
...
amp (par#N-1 amplitude)
frq (par#N-1 frequency)
pha (par#N-1 phase)

for N partials.

With noise information but no phase:

time (frame starting time)
amp (par#0 amplitude)
frq (par#0 frequency)
...
amp (par#N-1 amplitude)
frq (par#N-1 frequency)

energy (band#0 energy)
...
energy (band#24 energy)

for N partials and 25 critical bands.

Both phase and noise information present:

time (frame starting time)
amp (par#0 amplitude)
frq (par#0 frequency)
pha (par#0 phase)
...
amp (par#N-1 amplitude)
frq (par#N-1 frequency)
pha (par#N-1 phase)

noise (band#0 energy)
...
noise (band#24 energy)

for N partials and 25 critical bands.
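From the layouts above, the number of values in a frame, and therefore the expected size of a file, follows directly. The sketch below assumes that every value (the ten header fields included) is stored as an 8-byte double with no padding; the function names and keyword arguments are hypothetical and do not correspond to the numeric type codes stored in the header.

```lisp
;;; Hypothetical helpers, not part of ATS.

(defun example-doubles-per-frame (partials &key phase noise)
  "Values in one frame: time, then 2 or 3 values per partial,
then 25 band energies when noise information is present."
  (+ 1
     (* partials (if phase 3 2))
     (if noise 25 0)))

(defun example-file-bytes (partials frames &key phase noise)
  "Expected file size in bytes: 10 header values plus FRAMES frames of doubles."
  (* 8 (+ 10 (* frames (example-doubles-per-frame partials
                                                  :phase phase
                                                  :noise noise)))))
```

A sound with 10 partials and 100 frames saved with both phase and noise would hold 56 values per frame, for 8 * (10 + 5600) = 44880 bytes under these assumptions.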

The ats-save function saves an ATS-SOUND to disk:

ats-save sound file &key (save-phase T)(save-noise T)

sound: ATS-SOUND to save, must be loaded in the system.

file: string with the name of the file to save to disk (by convention the .ats extension is used).

save-phase: flag to indicate if phase information should be saved (defaults to T, true).

save-noise: flag to indicate if noise information should be saved (defaults to T, true).

Example:

;;; saving sound with both phase and noise information
(ats-save my-sound "/tmp/my-sound.ats")

;;; saving sound with no phase information
(ats-save my-sound "/tmp/my-sound-no-pha.ats" :save-phase NIL)

The ats-load function loads a sound from disk into the system:

ats-load file sound &key (dist-energy T)

file: string with the name of the file to load from disk.

sound: quoted symbol to point to the new loaded sound.

dist-energy: noise energy (when present) is stored as 25 critical bands (see file format above). By default, when a file is loaded, noise energy is transferred to partials present in each band and a complementary model is created (see the Analysis sections for more details). If you want to keep energy information in critical-band format, set this parameter to NIL (this should be used only for experimental purposes).

Example:

;;; loading a file from disk
(ats-load "/tmp/my-sound.ats" 'my-new-sound)

Synthesis

Sound synthesis in ATS is implemented using CLM. Two re-synthesis instruments come with the system, but as API macros and ATS-SOUND structure slots are accessible inside CLM's run loop, any other synthesis algorithm can be easily designed (see the code in sin-synth.ins and sin-noi-synth.ins for design examples).

sin-synth start-time sound &key (amp-scale 1.0)(amp-env '(0 1 1 1))(frq-scale 1.0)(duration nil)(par nil)

Performs additive synthesis using oscillators (only sinusoidal components are synthesized).

start-time: starting time in the output sound file.
sound: ATS-SOUND to re-synthesize.
amp-scale: amplitude scaler (defaults to 1.0).
amp-env: amplitude envelope (defaults to '(0 1 1 1))
frq-scale: frequency scale, all frequency values get multiplied by this number during synthesis (defaults to 1.0).
duration: duration of the synthesized sound. If this value is NIL (the default) the value of the dur slot of the sound structure is used. If a value is given the time structure of the sound is re-scaled to fit this value.
par: list of partial numbers to synthesize. If this value is NIL (the default) all the partials of the sound are synthesized.

Examples:

;;; synthesize all partials of a clarinet
(with-sound (:play nil :output "/tmp/cl-1.snd" :srate 44100
		   :statistics t :verbose t)
  (sin-synth 0.0 cl))

;;; synthesize only odd partials
(with-sound (:play nil :output "/tmp/cl-2.snd" :srate 44100
		   :statistics t :verbose t)
  (sin-synth 0.0 cl 
             :par (loop for i from 1 by 2 below (ats-sound-partials cl) collect i)))

;;; transpose a semitone up during synthesis
(with-sound (:play nil :output "/tmp/cl-3.snd" :srate 44100
		   :statistics t :verbose t)
  (sin-synth 0.0 cl :frq-scale (expt 2 1/12)))

;;; expand 4 times
(with-sound (:play nil :output "/tmp/cl-4.snd" :srate 44100
		   :statistics t :verbose t)
  (sin-synth 0.0 cl :duration (* (ats-sound-dur cl) 4)))
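The :frq-scale and :par arguments used in the examples above take plain Lisp values, so they can be computed by small helpers. The two functions below are hypothetical conveniences, not part of ATS: one converts a transposition in equal-tempered semitones into a frequency ratio, the other builds a list of partial numbers with a given start and step.

```lisp
;;; Hypothetical helpers for building sin-synth arguments; not part of ATS.
(defun semitones->frq-scale (semitones)
  "Frequency ratio for a transposition of SEMITONES (12-tone equal temperament)."
  (expt 2.0 (/ semitones 12)))

(defun partial-numbers (n-partials &key (start 0) (step 1))
  "List of partial numbers START, START+STEP, ... strictly below N-PARTIALS."
  (loop for i from start below n-partials by step collect i))
```

With these, (semitones->frq-scale 1) gives the same ratio as (expt 2 1/12) in the transposition example, and (partial-numbers (ats-sound-partials cl) :start 1 :step 2) builds the same list as the loop in the second example.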

sin-noi-synth start-time sound &key (amp-scale 1.0)(amp-env '(0 1 1 1))(frq-scale 1.0)(duration nil)(time-ptr nil)(par NIL)(noise-env '(0 1 1 1))(noise-only NIL)(band-noise t)

General-purpose ATS synthesizer. This instrument synthesizes both sinusoids and noise. The noise part can contain the partials' energy only (band-noise NIL), or both the partials' energy and the complementary critical-band energy (if present). Time information can be handled in two ways: using the time information stored with the partials (time-ptr NIL), or using a time-pointer envelope. In time-pointer mode, X values of the time-ptr envelope are proportional times in the ATS sound (1.0 = ats-sound-dur) and Y values are proportional times in the output sound (1.0 = duration).

start-time: starting time in the output sound file.
sound: ATS-SOUND to re-synthesize.
amp-scale: amplitude scaler (defaults to 1.0).
amp-env: amplitude envelope (defaults to '(0 1 1 1))
frq-scale: frequency scale, all frequency values get multiplied by this number during synthesis (defaults to 1.0).
duration: duration of the synthesized sound. If this value is NIL (the default) the value of the dur slot of the sound structure is used. If a value is given the time structure of the sound is re-scaled to fit this value.
time-ptr: time-pointer envelope. If NIL (the default), the sound's own time information is used.
noise-env: envelope to control the level of the noise component (defaults to '(0 1 1 1)).
noise-only: switch for noise-only synthesis (defaults to NIL).
band-noise: switch for band noise synthesis (defaults to T).
par: list of partial numbers to synthesize. If this value is NIL (the default) all the partials of the sound are synthesized.

Examples:

;;; plain resynthesis (sines plus noise) using time pointer
(with-sound (:play nil :output "/tmp/cl-5.snd" :srate 44100
		   :statistics t :verbose t)
  (sin-noi-synth 0.0 cl :time-ptr '(0 0 1 1)))

;;; plain resynthesis (noise only)
(with-sound (:play nil :output "/tmp/cl-6.snd" :srate 44100
		   :statistics t :verbose t)
  (sin-noi-synth 0.0 cl :time-ptr '(0 0 1 1) :noise-only t))

;;; using time pointer to modify the attack
(with-sound (:play nil :output "/tmp/cl-7.snd" :srate 44100
		   :statistics t :verbose t)
  (sin-noi-synth 0.0 cl :time-ptr '(0.0 0.0 0.5 0.1 0.7 0.7 1.0 1.0)))

;;; play backwards, gradually adding noise
(with-sound (:play nil :output "/tmp/cl-8.snd" :srate 44100
		   :statistics t :verbose t)
  (sin-noi-synth 0.0 cl 
		 :time-ptr '(0.0 1.0 0.9 0.3 1.0 0.0)
		 :noise-env '(0.0 0.0 0.9 1.0 1.0 1.0)
		 :amp-env '(0 0 0.1 0 0.9 1 1 1)))
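The time-ptr, noise-env and amp-env arguments in the examples above are all breakpoint lists of the form (x0 y0 x1 y1 ...). As an aid to reading them, the sketch below evaluates such a list at a given X by linear interpolation. The function name env-value-at is an assumption made for this illustration; it is written in plain Common Lisp, is not part of ATS or CLM, and is not necessarily how the instruments read their envelopes internally.

```lisp
;;; Illustrative sketch only; ENV-VALUE-AT is not part of ATS or CLM.
(defun env-value-at (x env)
  "Linearly interpolate the breakpoint list ENV (x0 y0 x1 y1 ...) at X.
Values of X before the first or after the last breakpoint are clamped."
  (let ((x0 (first env))
        (y0 (second env)))
    (if (<= x x0)
        y0
        (loop for (x1 y1) on (cddr env) by #'cddr
              do (when (<= x x1)
                   ;; X lies in the segment (x0, x1]: interpolate
                   (return (+ y0 (* (- y1 y0)
                                    (/ (- x x0) (- x1 x0))))))
                 ;; otherwise move on to the next segment
                 (setf x0 x1
                       y0 y1)
              finally (return y0)))))
```

Evaluating the attack example's envelope '(0.0 0.0 0.5 0.1 0.7 0.7 1.0 1.0) at X = 0.5 returns 0.1, its second breakpoint. For the backwards example's envelope '(0.0 1.0 0.9 0.3 1.0 0.0), Y falls from 1.0 at X = 0.0 to 0.0 at X = 1.0, which is what reverses the direction of playback.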

Download ATS

ATS-1.0 sources can be downloaded by anonymous ftp from CCRMA at:
ftp://ccrma-ftp.stanford.edu/pub/Lisp/ATS/ATS-1.0.tar.gz

(Temporary sources for CMUCL: ftp://ccrma-ftp.stanford.edu/pub/Lisp/ATS/ATS-1.0-CMUCL.tar.gz)

See the README file included with the distribution for installation details.

The ATS Project

For the most current information about ATS please visit our project site at: http://www.dxarts.washington.edu/ats/
