Time-scale modification (TSM) means speeding up or slowing down a sound without affecting its frequency content, such as the perceived pitch of any tonal components. For example, TSM of speech should sound like the speaker is talking at a slower or faster pace, without distortion of the spoken vowels. Similarly, TSM of music should change timing but not tuning.
When a recorded speech signal is simply played faster, such as by lowering its sampling rate and playing it back at the original sampling rate, the pace of the speech increases as desired, but so does the fundamental frequency (pitch contour). Moreover, the apparent "head size" of the speaker shrinks (the so-called "munchkinization" effect). This happens because, as illustrated in §10.3, speech spectra have formants (resonant peaks), which should stay fixed when the speech rate is varied but are instead scaled up in frequency along with everything else. The average formant spacing in frequency is a measure of the length of the vocal tract (a shorter tract has more widely spaced formants); hence, when speech is simply played faster, the average formant spacing increases, corresponding to a shorter vocal tract and thus a smaller head size. This illusion of size modulation can be a useful effect in itself, such as for scaling the apparent size of virtual musical instruments using commuted synthesis [47,265]. However, we also need to be able to adjust time scales without this overall scaling effect.
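The coupling between playback speed and frequency can be checked numerically. The following is a minimal sketch, not a TSM method: it keeps every second sample of a test sinusoid and treats the result as if it were played at the original sampling rate. The function names, the 8 kHz rate, and the 200 Hz test tone are assumptions chosen for illustration, and no anti-aliasing filter is applied because the test tone lies far below the reduced Nyquist limit.

```python
import math

FS = 8000  # assumed sampling rate in Hz (illustrative choice)

def sine(freq_hz, dur_s, fs, phase=math.pi / 4):
    """Generate a unit-amplitude sinusoid as a list of floats."""
    n = int(round(dur_s * fs))
    return [math.sin(2 * math.pi * freq_hz * i / fs + phase) for i in range(n)]

def naive_speedup(x, factor):
    """Keep every `factor`-th sample (integer factor, no anti-aliasing)."""
    return x[::factor]

def estimate_freq(x, fs):
    """Estimate the frequency of a clean sinusoid by counting
    upward zero crossings over the whole signal."""
    ups = sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)
    return ups * fs / len(x)

x = sine(200.0, 1.0, FS)   # one second of a 200 Hz tone
y = naive_speedup(x, 2)    # "played twice as fast" at the same FS

print(len(x) / FS, estimate_freq(x, FS))  # duration (s), frequency (Hz) before
print(len(y) / FS, estimate_freq(y, FS))  # duration (s), frequency (Hz) after
```

The duration halves and the estimated frequency roughly doubles. Every spectral feature of a real recording, formants included, is scaled by the same factor, which is the effect a true TSM algorithm must avoid.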
The Fourier dual of time-scale modification is frequency scaling. In this case, we wish to scale the spectral content of a signal up or down without altering the timing of sonic events in the time domain. This effect is used, for example, to retune "bad notes" in a recording studio. Frequency scaling can be implemented as TSM preceded or followed by sampling-rate conversion, or it can be implemented directly on a sequence of STFT frames, as TSM can be.
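As a rough illustration of the first option (TSM followed by sampling-rate conversion), the sketch below stretches a signal in time by a factor alpha and then reads it out alpha times faster, so the duration is approximately restored while all frequencies are scaled by alpha. Everything here is an assumption made for illustration: the TSM stage is a basic phase vocoder (one common STFT-based choice, not necessarily the method developed in this text), the rate conversion is plain linear interpolation with no anti-aliasing filter, and all function names and parameter values are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    """Inverse FFT via conjugation of the forward FFT."""
    n = len(X)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in X])]

def tsm_phase_vocoder(x, stretch, n=256, hop_a=64):
    """Basic phase-vocoder TSM: duration scales by `stretch`,
    frequencies are (ideally) left unchanged."""
    hop_s = int(round(stretch * hop_a))          # synthesis hop size
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    n_frames = (len(x) - n) // hop_a + 1
    y = [0.0] * ((n_frames - 1) * hop_s + n)
    norm = [0.0] * len(y)                        # sum of squared windows
    prev_phase = [0.0] * n
    synth_phase = [0.0] * n
    for m in range(n_frames):
        seg = x[m * hop_a : m * hop_a + n]
        X = fft([s * w for s, w in zip(seg, win)])
        Y = [0j] * n
        for k in range(n):
            mag, phase = abs(X[k]), cmath.phase(X[k])
            if m == 0:
                synth_phase[k] = phase
            else:
                omega = 2 * math.pi * k / n      # bin center, rad/sample
                dev = phase - prev_phase[k] - omega * hop_a
                dev -= 2 * math.pi * round(dev / (2 * math.pi))  # wrap
                # Advance by the estimated frequency over the synthesis hop.
                synth_phase[k] += hop_s * (omega + dev / hop_a)
            prev_phase[k] = phase
            Y[k] = cmath.rect(mag, synth_phase[k])
        frame = ifft(Y)
        for i in range(n):
            y[m * hop_s + i] += frame[i].real * win[i]
            norm[m * hop_s + i] += win[i] ** 2
    return [v / w if w > 1e-3 else 0.0 for v, w in zip(y, norm)]

def resample_linear(x, ratio):
    """Read x at a step of `ratio` samples using linear interpolation
    (ratio > 1 shortens the signal and raises its frequencies)."""
    n_out = int((len(x) - 1) / ratio)
    out = []
    for j in range(n_out):
        t = j * ratio
        i = int(t)
        frac = t - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
    return out

def frequency_scale(x, alpha):
    """Scale frequencies by alpha, keeping duration roughly the same."""
    return resample_linear(tsm_phase_vocoder(x, alpha), alpha)

# Demo on an assumed 200 Hz test tone at an assumed 8 kHz sampling rate.
FS = 8000
ALPHA = 1.5
tone = [math.sin(2 * math.pi * 200.0 * i / FS + 0.5) for i in range(4000)]
shifted = frequency_scale(tone, ALPHA)

# Estimate the new frequency from upward zero crossings, away from the edges.
mid = shifted[500:3500]
ups = sum(1 for a, b in zip(mid, mid[1:]) if a < 0 <= b)
f_est = ups * FS / len(mid)
print(len(tone), len(shifted), f_est)
```

For this stationary test tone, the output should stay close to the input length while the estimated frequency moves to roughly 300 Hz. Swapping the order (resample first, then TSM) is the "preceded by" variant mentioned above. A practical implementation would use a proper band-limited resampler and a more careful TSM stage, since a plain phase vocoder can smear transients.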