Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Telepresence for virtual meetings has gained interest due to recent travel limitations and the new reality of working from home. However, the literature on realistic audio telepresence using real-world microphone arrays remains very limited. This paper investigates a scenario in which a remote participant virtually joins a meeting between two dynamic participants. The audio signal processing chain (i) starts by recording with an array mounted on glasses; (ii) estimates the direction of arrival of the desired speaker using a direct-path dominance test that is robust to reverberation, combined with speaker separation to improve localization in dynamic conditions; (iii) applies speech enhancement against interfering speakers and noise; and (iv) ends with binaural signal matching for headphone listening. The paper compares model-based and learning-based processing in both noisy and dynamic scenarios, and presents a novel processing approach using data from a real wearable array, evaluated through simulation and a listening test.
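The direct-path dominance test in step (ii) can be illustrated with a minimal sketch. Assuming a plain M-channel STFT (rather than any particular array-processing domain the paper may use), the test averages the spatial correlation over a small time-frequency neighbourhood and keeps only bins whose correlation matrix is close to rank one, i.e. bins where a single direct-path plane wave dominates over reverberation. The function name `dpd_test`, the window half-widths `n_t` and `n_f`, and the eigenvalue-ratio `threshold` are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def dpd_test(stft, n_t=2, n_f=3, threshold=10.0):
    """Return a boolean (T, F) mask of bins that pass a rank-1 dominance test.

    Simplified, hypothetical sketch of a direct-path dominance test.
    stft: complex array of shape (M, T, F): M microphones, T frames, F bins.
    n_t, n_f: half-widths of the local averaging window (assumed defaults).
    threshold: minimum ratio of largest to second-largest eigenvalue (assumed).
    """
    m, t_len, f_len = stft.shape
    mask = np.zeros((t_len, f_len), dtype=bool)
    for t in range(t_len):
        for f in range(f_len):
            # Neighbourhood of time-frequency bins around (t, f), clipped at edges
            block = stft[:, max(0, t - n_t):t + n_t + 1, max(0, f - n_f):f + n_f + 1]
            x = block.reshape(m, -1)               # M x J matrix of snapshots
            r = x @ x.conj().T / x.shape[1]        # local spatial correlation matrix
            eig = np.linalg.eigvalsh(r)            # real eigenvalues, ascending order
            lam1, lam2 = eig[-1], eig[-2]
            # Accept the bin if one component clearly dominates (near rank one)
            mask[t, f] = lam1 > threshold * max(lam2, 1e-12)
    return mask
```

Bins that pass the test can then be used for direction-of-arrival estimation, since their dominant eigenvector approximates the steering vector of a single source; diffuse or multi-source bins are discarded.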
Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Manoj Chakkaravarthy, Udit Gupta, David Brooks, Carole-Jean Wu
Ilkan Esiyok, Pascal Berrang, Katriel Cohn-Gordon, Robert Künnemann