Abstract

Accurate modeling of personalized head-related transfer functions (HRTFs) is critical for applications requiring spatial audio, yet it remains challenging: experimental measurements require specialized equipment, numerical simulations require accurate head geometries and robust solvers, and data-driven methods are hungry for data. In this paper, we propose a new deep learning method that combines measurements and numerical simulations to take the best of three worlds. By learning the residual difference and establishing a high-quality spatial basis, our method achieves consistently 2 dB to 2.5 dB lower spectral distortion (SD) than state-of-the-art methods.
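Spectral distortion is the standard objective metric in HRTF evaluation. As a minimal sketch (conventions vary across papers, e.g., in frequency band limits and averaging order), it is the RMS log-magnitude error over frequency, averaged over directions:

```python
import numpy as np

def spectral_distortion(h_ref, h_est):
    """RMS log-magnitude error in dB between two sets of HRTF spectra.

    h_ref, h_est: complex arrays of shape (directions, frequency_bins).
    Returns the SD averaged over directions, in dB.
    """
    log_ratio = 20.0 * np.log10(np.abs(h_est) / np.abs(h_ref))
    # RMS over frequency for each direction, then mean over directions.
    return np.mean(np.sqrt(np.mean(log_ratio ** 2, axis=-1)))

# Sanity check: a uniform 2x magnitude error gives 20*log10(2) ~ 6.02 dB SD.
h = np.ones((440, 220), dtype=complex)
sd = spectral_distortion(h, 2.0 * h)
```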

Datasets

We released a dataset that contains head meshes with five different resolutions.

The 58 head meshes in the HUTUBS database are adaptively remeshed to five different resolutions; generating the five resolutions for a subject's two ears takes 10-15 minutes. We use an OpenFlipper plug-in contained in MESH2HRTF to gradually grade the edge length from fine at the ipsilateral ear (0.3 / 0.5 / 1 / 2 / 4 mm) to coarse at the contralateral ear (3 / 5 / 10 / 20 / 40 mm).

All 58 subjects' head meshes: [all_mesh.zip (808MB)]

This zip file contains 58 folders; each folder contains one subject's ten head meshes (five resolutions for each of the left and right ears).

For example, for subject 1, the folder "pp1" contains ten .ply files, including:

pp1_1to10mm_L.ply: The mesh resolution is 1 mm at the left ear and gradually changes to 10 mm at the right ear.
pp1_1to10mm_R.ply: The mesh resolution is 1 mm at the right ear and gradually changes to 10 mm at the left ear.
pp1_03to3mm_L.ply: The mesh resolution is 0.3 mm at the left ear and gradually changes to 3 mm at the right ear.
pp1_03to3mm_R.ply: The mesh resolution is 0.3 mm at the right ear and gradually changes to 3 mm at the left ear.



Each subject's ten mesh files can also be downloaded directly from the table below.

58 head meshes in the HUTUBS database with five different resolutions
[pp1.zip (14MB)]

[pp2.zip (14MB)]

[pp3.zip (14MB)]

[pp4.zip (14MB)]

[pp5.zip (17MB)]

[pp6.zip (14MB)]

[pp8.zip (17MB)]

[pp9.zip (15MB)]

[pp10.zip (14MB)]

[pp11.zip (16MB)]

[pp12.zip (15MB)]

[pp16.zip (14MB)]

[pp19.zip (16MB)]

[pp20.zip (16MB)]

[pp21.zip (14MB)]

[pp22.zip (14MB)]

[pp23.zip (14MB)]

[pp29.zip (13MB)]

[pp30.zip (15MB)]

[pp31.zip (13MB)]

[pp32.zip (15MB)]

[pp33.zip (14MB)]

[pp40.zip (16MB)]

[pp41.zip (16MB)]

[pp44.zip (13MB)]

[pp45.zip (14MB)]

[pp46.zip (13MB)]

[pp47.zip (18MB)]

[pp48.zip (14MB)]

[pp49.zip (15MB)]

[pp55.zip (15MB)]

[pp57.zip (12MB)]

[pp58.zip (15MB)]

[pp59.zip (14MB)]

[pp60.zip (9.8MB)]

[pp61.zip (15MB)]

[pp62.zip (14MB)]

[pp63.zip (12MB)]

[pp66.zip (16MB)]

[pp67.zip (16MB)]

[pp68.zip (14MB)]

[pp69.zip (15MB)]

[pp70.zip (15MB)]

[pp71.zip (16MB)]

[pp72.zip (15MB)]

[pp73.zip (14MB)]

[pp76.zip (14MB)]

[pp77.zip (13MB)]

[pp78.zip (16MB)]

[pp80.zip (15MB)]

[pp81.zip (12MB)]

[pp82.zip (14MB)]

[pp88.zip (14MB)]

[pp89.zip (14MB)]

[pp90.zip (15MB)]

[pp91.zip (12MB)]

[pp95.zip (14MB)]

[pp96.zip (14MB)]



We also release a small HRTF dataset that contains 58 subjects' simulated HRTFs in 440 directions.

All 58 subjects' simulated HRTFs: [440_Simulated_HRTFs.zip (154MB)]

The simulated HRTFs provided with the HUTUBS database were calculated on a 1730-point Lebedev grid with a radius of 1.47 m, while the measured HRTFs in the HUTUBS database lie on a full-spherical sampling grid with 440 points. Researchers therefore need HRTFs simulated in those 440 directions in order to compare against the measurements.

To facilitate future HRTF research, we release simulated HRTFs in the same 440 directions as the measured HRTFs in the HUTUBS database.

This dataset contains all 58 subjects' HRTFs, simulated for frequencies between 100 Hz and 22 kHz in steps of 100 Hz using the boundary element method (BEM) as implemented in MESH2HRTF. The complex pressure was calculated on a 440-point full-spherical sampling grid with a radius of 1.47 m by assuming reciprocity, i.e., interchanging the positions of loudspeakers and microphones. HRTFs were simulated separately for the left and right ears, and the edge length of the meshes was gradually increased from 1 mm at the simulated ear to 10 mm at the opposite ear using the OpenFlipper plug-in contained in MESH2HRTF.
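The frequency grid above (100 Hz to 22 kHz in steps of 100 Hz) contains 220 bins per direction; a quick sanity check:

```python
import numpy as np

# BEM simulation frequencies: 100 Hz up to and including 22 kHz, 100 Hz apart.
freqs = np.arange(100, 22000 + 100, 100)
num_bins = len(freqs)  # 220 frequency bins per direction
```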

For example, for subject 1, there are two folders containing the left-ear and right-ear HRTFs, respectively.

BEM_pp1_leftear: HRTFs simulated in 440 directions for subject 1's left ear.
BEM_pp1_rightear: HRTFs simulated in 440 directions for subject 1's right ear.

HRTFs can be loaded with the SOFA Matlab/Octave API by running:

HRTFs = SOFAload('EvaluationGrid.sofa')
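For users without the Matlab toolbox: SOFA (AES69) files are netCDF-4 containers, which are HDF5 underneath, so they can also be inspected from Python with h5py. A minimal sketch follows; the variable names Data.Real / Data.Imag and SourcePosition are assumed from the frequency-domain SOFA conventions (they may differ per file), and a tiny stand-in file is written first only to keep the example self-contained:

```python
import os
import tempfile

import h5py
import numpy as np

# Write a tiny stand-in "SOFA" file (plain HDF5) mimicking the assumed
# layout: 440 directions x 1 receiver x 220 frequency bins.
path = os.path.join(tempfile.mkdtemp(), "EvaluationGrid.sofa")
with h5py.File(path, "w") as f:
    f["Data.Real"] = np.zeros((440, 1, 220))
    f["Data.Imag"] = np.zeros((440, 1, 220))
    f["SourcePosition"] = np.zeros((440, 3))  # one (az, el, r) per direction

# Read it back and assemble the complex pressure.
with h5py.File(path, "r") as f:
    hrtf = f["Data.Real"][:] + 1j * f["Data.Imag"][:]
    directions = f["SourcePosition"][:]
```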





Convergence Analysis

We analyze the resilience and error tolerance of our model. Errors are characterized by running a series of BEM simulations at four frequencies and four mesh resolutions; the convergence results of this study, covering all 58 subjects in the dataset, are shown below. As expected, the BEM solution converges when compared against BEM run at the highest mesh resolution (BEM-BEM). However, a persistent discrepancy remains when compared against measurement (BEM-Exp), because BEM converges to the wrong distribution. This explains why our results do not improve significantly even when the highest-quality BEM simulation is used (Ours-Exp): the BEM-Exp error remains similar. Therefore, for applications more tolerant of HRTF accuracy, running our method on lower mesh resolutions saves substantial processing time, since BEM cost typically scales quadratically with the number of mesh elements. Exploring this performance-accuracy tradeoff further and leveraging lower-resolution BEM input with learning-based corrections is an interesting direction for future work.
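As a rough illustration of that tradeoff (a hypothetical cost model, not measured timings): for a surface mesh with edge length h, the element count grows roughly as 1/h^2, so a solver that is quadratic in the element count scales as 1/h^4, and doubling the edge length cuts cost by about 16x:

```python
def relative_bem_cost(edge_length_mm, ref_edge_length_mm=1.0):
    """Cost relative to the reference edge length, under a simple model:
    elements ~ 1/h^2, and BEM solve cost ~ (number of elements)^2."""
    elements_ratio = (ref_edge_length_mm / edge_length_mm) ** 2
    return elements_ratio ** 2

# Coarsening from 1 mm to 2 mm edges -> ~16x cheaper under this model.
speedup = 1.0 / relative_bem_cost(2.0)
```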


  • BEM-BEM: BEM errors against the finest BEM resolution
  • BEM-Exp: BEM errors against measurement
  • Ours-Exp: errors of our method against measurement
  • Line plot on the right shows the same errors averaged over frequencies
  • Video

Citation

Mengfan Zhang, Jui-Hsien Wang, and Doug L. James. 2021. Personalized HRTF modeling using DNN-augmented BEM. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021, pp. 451-455.

Acknowledgements

We thank the anonymous reviewers for their constructive feedback. We thank the authors of the HUTUBS HRTF database, who provided high-resolution head meshes, and the authors of Mesh2HRTF, who provided open-source software to numerically calculate HRTFs.