In an -th order FDN, the core computations consist of updates of the delay lines and a matrix by vector multiplication. The delay line operations can proceed in parallel. The matrix by vector multiplication requires in general operations (multiplications and additions). If the matrices arise from the scattering coefficients of a waveguide junction, the computations reduce to . The same order of complexity is required by the normalized waveguide junction. For the special case of a junction of equal-impedance waveguides, the multiplications can be replaced by shifts when is a power of . In all these efficient cases the eigenvalues of the feedback matrix are constrained to be at +1 or -1. The circulant matrix offers a more general eigenvalue distribution. Moreover, the matrix by vector multiplication can be implemented very efficiently in hardware. This multiplication can be viewed as a circular convolution of the column vector with the first row of the matrix. Such a convolution can be performed, when is a power of , using two FFTs (one of which can be precomputed), an elementwise product between two -vectors, and an inverse FFT. The complexity of this algorithm is . It is easy to implement this matrix-vector product in VLSI by means of the butterfly or other hypercubic architectures . These architectures allow computations of the FFT in time steps, and the algorithm can be pipelined.
The parallel implementation of waveguide scattering matrices cannot be done in less than time steps, because of the scalar product that is involved in Householder reflections of any kind. Hence, in parallel implementations we lose the advantage of waveguide scattering matrices over circulant matrices.
Moreover, we can use number-theoretic Fourier transforms  in order to compute the circular convolution. Such transforms work over commutative rings, and can be arranged in such a way that all multiplications are replaced by shifts. Since in the convolution we have both the direct and inverse transforms, the overall result remains correct.
The circulant structure of is advantageous for purposes of real-time control as well. In the first place, the entire matrix is determined by one of its rows or columns, and no matter how a row or column is modified, as long as the rest of the matrix is modified accordingly to preserve the circulant structure, the matrix will have unit-modulus eigenvalues as needed for losslessness. Furthermore, the top row of a circulant matrix is obtained from its eigenvalues by means of an inverse DFT. Therefore, it is possible to efficiently generate a continuous family of circulant matrices by continuously varying the complex phases of the eigenvalues. Moreover, if the matrix-vector multiplication is implemented in the frequency domain, the inverse DFT is not needed. Thus, we may move the eigenvalues to arbitrary points on the unit circle and generate a wide family of efficiently computed lossless feedback matrices.