Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL)
We introduce FAIRSEQ S2T, a FAIRSEQ (Ott et al., 2019) extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows FAIRSEQ’s careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. FAIRSEQ’s machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. FAIRSEQ S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Manoj Chakkaravarthy, Udit Gupta, David Brooks, Carole-Jean Wu
Ilkan Esiyok, Pascal Berrang, Katriel Cohn-Gordon, Robert Künnemann