

Excitation

Figure 21: Time-domain plot of a recorded guitar tone on the open high 'e' string.

Figure 22: Spectrogram of a recorded guitar tone on the open high 'e' string.

Figure 23: Spectrogram of a recorded guitar tone on the open high 'e' string from time $ t=0$ s to time $ t=0.5$ s.

Figure 24: Time-domain plot of the same recorded guitar tone with harmonic peaks removed, yielding an excitation signal for use in the physical model.

Figure 25: Spectrogram of the same recorded guitar tone with harmonic peaks removed, yielding an excitation signal for use in the physical model.

Figure 26: Spectrogram of the same recorded guitar tone with harmonic peaks removed (the excitation signal for use in the physical model) from time $ t=0$ s to time $ t=0.5$ s.

As discussed in Section 2.2, the excitation to our virtual instrument is a signal, taken from a wave-table, that drives our digital waveguide models. The literature describes several methods for computing excitation signals from recordings. Excitations extracted from actual recordings of guitar plucks yield the psychoacoustically best-sounding models [33].
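To make the role of the excitation concrete, here is a minimal sketch of a string loop driven by an excitation table. It assumes the simplest Karplus-Strong-style loop (an integer delay line closed by a two-point averaging filter scaled by a hypothetical `loss` factor); the waveguide models of Section 2.2 may be more detailed, and `waveguide_pluck` is an illustrative name, not the authors' code.

```python
import random


def waveguide_pluck(excitation, delay, loss=0.996, num_samples=None):
    """Feed an excitation table into a minimal plucked-string loop.

    y[n] = x[n] + loss * (y[n - delay] + y[n - delay - 1]) / 2

    The averaging filter adds half a sample of delay, so the loop's
    fundamental is roughly fs / (delay + 0.5).
    """
    if num_samples is None:
        num_samples = len(excitation)
    y = [0.0] * num_samples
    for n in range(num_samples):
        x = excitation[n] if n < len(excitation) else 0.0  # excitation, then silence
        a = y[n - delay] if n >= delay else 0.0
        b = y[n - delay - 1] if n >= delay + 1 else 0.0
        y[n] = x + loss * 0.5 * (a + b)
    return y


# Hypothetical usage: one second near the open high E (329.63 Hz) at 44.1 kHz,
# with a seeded noise burst standing in for an extracted excitation.
fs = 44100
delay = round(fs / 329.63 - 0.5)  # subtract the filter's half-sample delay
rng = random.Random(0)
excitation = [rng.uniform(-1.0, 1.0) for _ in range(200)]
tone = waveguide_pluck(excitation, delay, num_samples=fs)
```

In practice the noise burst would be replaced by an excitation extracted from a recording, which is what the rest of this section is about.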

In recent years, two fundamentally different approaches for obtaining excitation signals from recordings have emerged: inverse-filtering and constant overlap-add (COLA) methods. We briefly describe the problem and give an overview of the various methods.

Figures 21 and 22 show a recorded note on the guitar's open high 'e' string and its Short-Time Fourier Transform (STFT), respectively. As the figures show, all frequencies have energy at the onset of the note. After the initial attack, most of the energy remains at the harmonics of the fundamental frequency. The goal of excitation extraction is to remove the tonal components that ring after the onset and to reduce the energy of those components during the onset so that it matches the general energy level at other frequencies.

With the problem now illustrated graphically, we describe how the two families of methods remove the harmonic peaks visible in the STFT. Inverse-filtering methods remove the harmonic peaks by inverse-filtering the signal with a comb filter whose peaks lie at the harmonic frequencies. COLA methods remove the harmonic peaks by applying non-linear averaging to the magnitudes in each STFT frame of the signal.
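The inverse-filtering principle can be illustrated with an idealized string model: a feedback comb filter with an integer loop delay of D samples has resonant peaks at multiples of fs/D, and its FIR inverse places notches at the same frequencies. The sketch below assumes a single frequency-independent loop gain `g`; it is an illustration of the principle only, not one of the cited methods, and the function names are hypothetical.

```python
def comb_filter(x, delay, g):
    """Feedback comb y[n] = x[n] + g*y[n-delay]: resonant peaks at k*fs/delay."""
    y = []
    for n, xn in enumerate(x):
        feedback = g * y[n - delay] if n >= delay else 0.0
        y.append(xn + feedback)
    return y


def inverse_comb_filter(y, delay, g):
    """FIR inverse x[n] = y[n] - g*y[n-delay]: notches at k*fs/delay."""
    return [yn - (g * y[n - delay] if n >= delay else 0.0) for n, yn in enumerate(y)]


# A short "excitation" rings through the comb; inverse filtering recovers it.
excitation = [1.0, 0.5, -0.25] + [0.0] * 20
ringing = comb_filter(excitation, delay=5, g=0.95)
recovered = inverse_comb_filter(ringing, delay=5, g=0.95)
```

This idealized comb ignores the slight inharmonicity and frequency-dependent decay of real string partials.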

The first method is Matrix-Pencil Inverse-Filtering [34]. It computes the sinusoidal components of the signal using the Matrix-Pencil method and then performs inverse-filtering with these components to remove the tonal content, leaving the excitation [35].

The second method, Sines-Plus-Noise Inverse-Filtering, is similar to the Matrix-Pencil method in that sinusoidal components are computed; instead of the Matrix-Pencil method, however, a generative sinusoidal model is used to compute them [36,37]. A residual signal is computed by subtracting the sinusoidal signal from the original recording. The recording is then inverse-filtered using a digital waveguide tuned to the recording, and scaled versions of the residual and sinusoidal signals are added together to help fill in the notches created by inverse-filtering [38].

The third method, Magnitude Spectrum Smoothing (MSS), uses STFT processing. Within each FFT frame, a low-pass filter is applied to the magnitude spectrum. The inverse FFT (iFFT) of the modified spectrum is then taken, and the resulting time-domain signal is stored in a buffer for overlap-add reconstruction [39].
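A per-frame sketch of the MSS idea, under stated assumptions: a circular moving average over `2*half_width+1` bins stands in for the low-pass filter (the cited work may use a different filter), and a naive O(N^2) DFT keeps the example dependency-free where a real implementation would use an FFT library. Windowing and the overlap-add loop across frames are omitted, and all names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT; adequate for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def smooth_magnitude_frame(frame, half_width=2):
    """Low-pass the magnitude spectrum of one real frame, keep phases, return samples."""
    X = dft(frame)
    n = len(X)
    mags = [abs(v) for v in X]
    width = 2 * half_width + 1
    # Circular moving average across bins (the assumed low-pass filter).
    smoothed = [sum(mags[(k + j) % n] for j in range(-half_width, half_width + 1)) / width
                for k in range(n)]
    Y = [0j] * n
    for k in range(n // 2 + 1):                  # non-negative frequencies
        Y[k] = smoothed[k] * cmath.exp(1j * cmath.phase(X[k]))
    for k in range(1, (n + 1) // 2):             # mirror so the inverse is real
        Y[n - k] = Y[k].conjugate()
    return [v.real for v in idft(Y)]
```

A flat spectrum (an impulse) passes through unchanged, while an isolated sinusoidal peak is spread over neighbouring bins and its height drops by roughly the averaging width.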

The last method, Statistical Spectral Interpolation (SSI), is similar to MSS in that it performs STFT processing and COLA reconstruction, but it removes harmonic peaks by sampling new spectral magnitudes at the peaks from a normal distribution whose mean and standard deviation equal those of the magnitudes surrounding each peak in each FFT frame [33].

As the literature shows, SSI produces the psychoacoustically best-sounding excitations. We now present the method in detail.

Figure 27: Plot of the original recorded guitar tone's first FFT frame, focused on the peak near $ 660$ Hz. Circled points are the FFT values used for collecting statistics.

Figure 28: Plot of the original recorded guitar tone's first FFT frame, focused on the removed peak near $ 660$ Hz. Circled points are the FFT values used for collecting statistics.

At a high level, the SSI method modifies only the magnitudes of the STFT of the guitar tone, leaving the phase untouched. It collects statistics on the magnitudes of frequencies surrounding the harmonic peaks and uses these statistics to generate random gain changes for the magnitudes at the peaks. Inverse-filtering methods, by contrast, modify the phase, which inevitably introduces artifacts. The primary goal of SSI is thus to alter the original tone minimally.

The STFT is used both to analyze and to modify the original recorded tone. It can be viewed as a sliding window: at each window position, the FFT of the windowed signal is taken, the transform is modified, and its iFFT is saved in an output buffer for overlap-add. The window then slides forward according to the desired amount of overlap. The STFT parameters are the window type, the window length, and the number of samples the window slides by (the hop size). Sample parameters for the method are a Hamming window of length $ 2^{12}$ samples with $ 0.9$ overlap (a hop size of $ 410$ samples). Although $ 2^{12}$ samples at a sampling rate of $ 44,100$ Hz is a long window ( $ \approx 93$ ms), pre-echo distortion artifacts are remedied by starting the algorithm during the onset of the recorded tone.
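The parameter arithmetic can be checked with a few lines of Python: the hop size follows from the overlap fraction ($ 4096\cdot 0.1 = 409.6$, rounded to $ 410$), and the window duration is $ 4096/44100 \approx 92.9$ ms. The sketch below also lists the start sample of each window position; the symmetric Hamming formula and the function names are assumptions for illustration, not taken from the original implementation.

```python
import math


def stft_parameters(window_length=2**12, overlap=0.9, fs=44100):
    """Return (hop size in samples, window duration in ms)."""
    hop = round(window_length * (1 - overlap))
    duration_ms = 1000.0 * window_length / fs
    return hop, duration_ms


def hamming(length):
    """Symmetric Hamming window, 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_starts(signal_length, window_length, hop):
    """First sample of every full window position as the window slides by `hop`."""
    return list(range(0, signal_length - window_length + 1, hop))


hop, duration_ms = stft_parameters()
window = hamming(2**12)
starts = frame_starts(44100, 2**12, hop)  # window positions in one second of audio
```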

Processing occurs at each window position of the STFT; we call the FFT of each windowed segment a frame. Within each frame, the harmonic peaks of the recorded tone are attenuated. The harmonic peaks can be located using the Quadratically Interpolated FFT (QIFFT) method [40].
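QIFFT refines a spectral peak by fitting a parabola through the dB magnitudes of a local-maximum bin and its two neighbours; the parabola's vertex gives the interpolated frequency and level. The minimal sketch below assumes the caller has already picked the local-maximum bin `k` near each expected harmonic; the function names are hypothetical.

```python
import math


def to_db(mags, floor=1e-12):
    """Convert magnitudes to dB; the floor avoids log10(0)."""
    return [20 * math.log10(max(m, floor)) for m in mags]


def qifft_peak(mag_db, k, fs, n_fft):
    """Quadratic interpolation of the spectral peak at local-maximum bin k.

    Fits a parabola through the dB magnitudes at bins k-1, k, k+1 and
    returns (interpolated frequency in Hz, interpolated peak level in dB).
    """
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    denom = a - 2 * b + c
    # Vertex offset in bins; |p| <= 1/2 when b is a local maximum.
    p = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return (k + p) * fs / n_fft, b - 0.25 * (a - c) * p
```

For three samples taken exactly from a parabola, the function recovers its vertex; on real spectra it is an approximation whose accuracy depends on the window and FFT size.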

Assume that the fundamental frequency of the recorded tone is $ f_1$ Hz. A bandwidth $ W_p$ in Hz is specified, indicating the half-width of each harmonic peak, and another bandwidth $ W_n$ in Hz is specified, indicating the half-width of the interval used for collecting statistics; both are chosen relative to the fundamental frequency $ f_1$ . When applying SSI to the recording in Figure 21, we use $ W_p = 0.3\cdot f_1$ and $ W_n = 0.75\cdot f_1$ . These values keep the points used for collecting statistics from reaching into the neighboring harmonic peak while still providing enough points to obtain a reasonable mean and standard deviation.

For each harmonic $ i$ with frequency $ f_i$ , the following is defined and used for processing.

Define a set of indices, $ \Gamma$ , whose frequency values satisfy the following:

$\displaystyle \forall \gamma \in \Gamma, W_p \leq \vert\nu_{\gamma} - f_i\vert \leq W_n$ (4)

where $ \nu_k$ corresponds to the frequency in Hz of the $ k$ th FFT bin. The values in $ \Gamma$ correspond to indices within the current frame whose frequencies lie within the specified band $ W_n$ but outside the band $ W_p$ centered around $ f_i$ . See the circled points in Figures 27 and 28.
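Equation (4) translates directly into a filter over the FFT bins. The sketch below assumes an $ N$-point FFT, so that $ \nu_k = k f_s/N$, and keeps only the non-negative-frequency bins $ 0,\dots,N/2$. The example values ($ f_1 = 329.63$ Hz for the open high E, processing the second harmonic) are illustrative assumptions, and the function names are hypothetical.

```python
def bin_frequencies(n_fft, fs):
    """nu_k = k*fs/N for the non-negative-frequency bins k = 0..N/2."""
    return [k * fs / n_fft for k in range(n_fft // 2 + 1)]


def gamma_indices(freqs, f_i, w_p, w_n):
    """Bins whose frequency satisfies W_p <= |nu_k - f_i| <= W_n (eq. 4)."""
    return [k for k, nu in enumerate(freqs) if w_p <= abs(nu - f_i) <= w_n]


# Hypothetical setting: 44.1 kHz, 4096-point FFT, f1 = 329.63 Hz,
# second harmonic, with W_p = 0.3*f1 and W_n = 0.75*f1 as in the text.
fs, n_fft, f1 = 44100, 2**12, 329.63
freqs = bin_frequencies(n_fft, fs)
gamma = gamma_indices(freqs, f_i=2 * f1, w_p=0.3 * f1, w_n=0.75 * f1)
```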

The mean and standard deviation of the magnitudes of the FFT bins in $ \Gamma$ are computed as follows:

$\displaystyle \mu = \frac{1}{\vert\Gamma\vert}\sum_{k \in \Gamma} \vert X_k\vert$ (5)

$\displaystyle \sigma = \sqrt{\frac{1}{\vert\Gamma\vert}\sum_{k \in \Gamma} {(\vert X_k\vert-\mu)}^2}$ (6)
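Equations (5) and (6) are the mean and the population standard deviation (division by $ \vert\Gamma\vert$, not $ \vert\Gamma\vert-1$) of the magnitudes in the statistics band. A direct transcription follows; the function name and the toy frame are hypothetical.

```python
import math


def peak_statistics(X, gamma):
    """Mean (eq. 5) and standard deviation (eq. 6) of |X_k| over k in Gamma.

    Divides by |Gamma| (population standard deviation), as in eq. 6.
    """
    mags = [abs(X[k]) for k in gamma]
    mu = sum(mags) / len(mags)
    sigma = math.sqrt(sum((m - mu) ** 2 for m in mags) / len(mags))
    return mu, sigma


# Toy frame: bin 3 plays the role of the peak and is excluded from Gamma;
# the bins in Gamma have magnitudes 5, 1, 2 and 6.
X = [3 + 4j, 1.0, -2.0, 100.0, 6j]
mu, sigma = peak_statistics(X, gamma=[0, 1, 2, 4])
```

Python's `statistics.pstdev` computes the same population standard deviation.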

Define the set of indices, $ \Delta$ , whose frequency values satisfy the following:

$\displaystyle \forall \delta \in \Delta, \vert\nu_{\delta} - f_i\vert \leq W_p$ (7)

The values in $ \Delta$ correspond to indices within the current frame whose frequencies lie within the specified band $ W_p$ centered around $ f_i$ . The magnitudes at these frequencies are changed to remove the peak. See the starred points in Figures 27 and 28.

Thus, for all bins with indices in $ \Delta$ , magnitude values are modified to remove the observed peaks. This occurs as follows:

For each $ \delta \in \Delta $ , draw a value $ \rho\sim \mathcal{N}(\mu,\sigma^2)$ and rescale $ X_{\delta}$ so that its magnitude becomes $ \rho$ while its phase is unchanged:

$\displaystyle X_{\delta} := \frac{\rho}{\vert X_{\delta}\vert} X_{\delta}.$ (8)
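The peak-removal step, Equations (7) and (8), can be sketched as follows. Each bin within $ W_p$ of $ f_i$ gets its own draw from $ \mathcal{N}(\mu,\sigma^2)$, and only its magnitude changes. Two choices here are assumptions not stated in the text: negative draws are clipped to zero, and the update is written as $ \rho\, e^{j\angle X_{\delta}}$, which equals Equation (8) whenever $ X_{\delta}\neq 0$ and avoids a division by zero. The function name is hypothetical.

```python
import cmath
import random


def remove_peak(X, freqs, f_i, w_p, mu, sigma, rng=random):
    """Resample magnitudes of bins with |nu_k - f_i| <= W_p (eq. 7), keeping phases (eq. 8).

    X and freqs must have the same length (bin k has frequency freqs[k]).
    """
    for k, nu in enumerate(freqs):
        if abs(nu - f_i) <= w_p:
            rho = max(rng.gauss(mu, sigma), 0.0)  # assumption: clip negative draws
            X[k] = rho * cmath.exp(1j * cmath.phase(X[k]))  # same phase, new magnitude
    return X


# Toy frame with a "peak" in bins 1 and 2; a seeded generator makes runs repeatable.
frame = [1 + 0j, 10j, -8 + 0j, 1 + 0j]
remove_peak(frame, freqs=[0.0, 1.0, 2.0, 3.0], f_i=1.5, w_p=0.5,
            mu=2.0, sigma=0.5, rng=random.Random(1))
```

If the full $ N$-point spectrum of a real signal is modified, each changed bin $ k$ must be mirrored into bin $ N-k$ as its complex conjugate so that the inverse FFT stays real; operating on bins $ 0,\dots,N/2$ and using a real inverse FFT achieves the same.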

Figures 27 and 28 show the points the algorithm uses for collecting statistics and the points whose gains are altered. As shown, the peak near $ 660$ Hz is entirely removed.




``Virtual Stringed Instruments'', by Nelson Lee and Julius O. Smith III,
REALSIMPLE Project — work supported by the Wallenberg Global Learning Network .
Released 2008-02-20 under the Creative Commons License (Attribution 2.5), by Nelson Lee and Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University