Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
The paper proposes an efficient, robust, and reconfigurable technique to suppress various types of noises for any sampling rate. The theoretical analyses, subjective and objective test results show that the proposed noise suppression (NS) solution significantly enhances the speech transmission index (STI), speech intelligibility (SI), signal-to-noise ratio (SNR), and subjective listening experience. The STI and SI consists of 5 levels, i.e., bad, poor, fair, good, and excellent. The most common noisy condition is of SNR ranging from -5 to 8 dB. For the input SNR between -5 and 2.5 dB, the proposed NS improves the STI and SI from “fair” to “good”. For the input SNR between 2.5 to 8 dB, the STI and SI are improved from “good” to “excellent” by the proposed NS. The proposed NS can be adopted in both capture and playback paths for voice over internet protocol, voice-trigger, and automatic speech recognition applications.
Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Manoj Chakkaravarthy, Udit Gupta, David Brooks, Carole-Jean Wu
Ilkan Esiyok, Pascal Berrang, Katriel Cohn-Gordon, Robert Künnemann