Using the Matlab test program in [#!MDFT!#], FFT convolution was found to be faster than direct convolution starting at length (looking only at powers of 2 for the length ). FFT convolution was also never significantly slower at shorter lengths, for which ``calling overhead'' dominates.
When the same test program was run in 2011, FFT convolution using the fft function was found to be faster than conv for all (power-of-2) lengths. The speed of FFT convolution divided by that of direct convolution started out at 14 for , fell to a minimum of at , above which it started to climb as expected, reaching at . Note that this comparison is unfair because the Octave fft function is a dynamically linked, separately compiled module, while conv is written in the Matlab language and thus suffers more overhead from the Matlab interpreter.
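The two methods being timed above compute the same result. The following is a minimal Python sketch of that equivalence, not the test program from [#!MDFT!#]; the function names `fft`, `direct_conv`, and `fft_conv` are illustrative, and a simple recursive radix-2 FFT stands in for an optimized library routine, so it says nothing about relative speed.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1  # the inverse transform conjugates the twiddles
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def direct_conv(x, h):
    """Direct (time-domain) convolution: one multiply per sample pair."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fft_conv(x, h):
    """Convolution via zero-padded FFTs and a pointwise spectral product."""
    n_out = len(x) + len(h) - 1
    n_fft = 1
    while n_fft < n_out:  # next power of 2 that avoids time aliasing
        n_fft *= 2
    X = fft(list(x) + [0.0] * (n_fft - len(x)))
    H = fft(list(h) + [0.0] * (n_fft - len(h)))
    Y = [a * b for a, b in zip(X, H)]
    y = fft(Y, inverse=True)
    # Scale by 1/n_fft because fft() is unnormalized in both directions.
    return [v.real / n_fft for v in y[:n_out]]


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 1.0]
print(direct_conv(x, h))  # [1.0, 3.0, 5.0, 7.0, 4.0]
# The FFT result agrees with the direct result up to floating-point round-off.
print(max(abs(a - b) for a, b in zip(direct_conv(x, h), fft_conv(x, h))))
```

Zero-padding to at least the full output length (here, the length of `x` plus the length of `h`, minus one) is what makes the circular convolution implied by the FFT equal to the desired acyclic convolution.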
An analysis reported in Strum and Kirk [#!StrumAndKirk!#, p. 521], based on the number of real multiplies, predicts that FFT convolution is faster starting at length , and that direct convolution is significantly faster for very short convolutions (e.g., 16 operations for a direct length-4 convolution, versus 176 for the fft function).
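The figure of 16 operations for a direct length-4 convolution is consistent with one real multiply per pair of input samples. The sketch below simply counts those multiplies, assuming two real sequences of equal length; the helper name `count_direct_multiplies` is hypothetical, and the cost model behind the FFT figure is not reproduced here because the text does not state it.

```python
def count_direct_multiplies(n):
    """Count real multiplies in the direct convolution of two length-n
    real sequences (assumed model: one multiply per sample pair)."""
    count = 0
    for i in range(n):       # each sample of the first sequence
        for j in range(n):   # meets each sample of the second
            count += 1       # one multiply x[i] * h[j]
    return count


print(count_direct_multiplies(4))  # 16
```

Under this model the count grows as the square of the length, which is why direct convolution wins only for very short sequences.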
See [#!MDFT!#] for further discussion of FFT algorithms and their applications.
In digital audio, FIR filters are often hundreds of taps long. For such filters, the FFT method is much faster than direct convolution in the time domain on single CPUs. On GPUs, FFT convolution is faster than direct convolution only for much longer FIR-filter lengths (in the thousands of taps [#!LauriEtAlAES11!#]); this is because massively parallel hardware can perform an $O(N^2)$ algorithm (direct convolution) faster than a single CPU can perform an $O(N \log N)$ algorithm (FFT convolution).