Mixed Source Sound Field Translation for Virtual Binaural Application with Perceptual Validation

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)


Non-interactive, linear experiences such as cinema offer high-quality surround sound to enhance immersion; however, the perspective is usually fixed to the position of the recording microphone. With the rise of virtual reality, there is demand for recording and recreating real-world experiences in a way that allows users to move throughout the reproduction. Sound field translation achieves this by building an equivalent environment of virtual sources that recreates the recording spatially. However, the technique still restricts the maximum distance a user can translate away from the recording microphone’s perspective, because the discrete sampling of commercial higher-order microphones can capture only an acoustic sweet spot. In this paper, we propose a method for binaurally reproducing a microphone recording in a virtual application that allows the user to translate freely beyond the recording position. The method incorporates a mixture of near-field and far-field sources in a sparsely expanded virtual environment to maintain a perceptually accurate reproduction. We perceptually validate the method through a Multiple Stimulus with Hidden Reference and Anchor (MUSHRA) experiment. Compared to the plane-wave benchmark, the proposed method offers improved source localizability and robustness to spectral distortions at translated listening positions. A cross-examination with numerical simulations demonstrates that the sparse expansion relaxes the inherent sweet-spot constraint, leading to improved localizability in sparse environments. Additionally, the proposed method better reproduces the intensity and binaural room impulse response spectra of near-field environments, further supporting the perceptual results.
