June 16, 2023
Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu

This paper presents Voicebox, the most versatile text-conditioned speech generative model at scale. Voicebox is trained on a text-guided speech infilling task, where the goal is to generate masked speech given its surrounding audio and text transcript.

June 13, 2023
Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, Kai-Wei Chang

To address these problems and encourage research to develop NLU technologies in the privacy policy domain, we introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, to evaluate the privacy policy language understanding across six tasks, including text classification, question answering, semantic parsing, and named-entity recognition.

June 3, 2023
Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jachym Kolar, Stavros Petridis, Maja Pantic, Christian Fuegen

Visual speech recognition (VSR), also known as lip reading, is the task of recognizing speech content based on visual lip movements. VSR has a wide range of applications in real-world scenarios such as helping the hearing-impaired perceive human speech and improving automatic speech recognition (ASR) in noisy environments.

May 22, 2023
Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

We build a new dataset comprising a moderate amount of labeled data for 1,107 languages and another dataset of unlabeled speech in 3,809 languages (§3). We leverage ....