Publications - Meta Research

June 16, 2023

Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu

Paper

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

This paper presents Voicebox, the most versatile text-conditioned speech generative model at scale. Voicebox is trained on a text-guided speech infilling task, where the goal is to generate masked speech given its surrounding audio and text transcript.

Areas

Artificial Intelligence, Machine Learning, Natural Language Processing & Speech

June 13, 2023

Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, Kai-Wei Chang

Paper

PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English

To address these problems and encourage research to develop NLU technologies in the privacy policy domain, we introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, to evaluate the privacy policy language understanding across six tasks, including text classification, question answering, semantic parsing, and named-entity recognition.

Areas

Natural Language Processing & Speech

June 4, 2023

Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem Kalinli

Paper

Learning ASR Pathways: A Sparse Multilingual ASR Model

Neural network pruning compresses automatic speech recognition (ASR) models effectively. However, in multilingual ASR, language ...

Areas

Machine Learning, Natural Language Processing & Speech

June 4, 2023

Maryam Fazel-Zarandi, Wei-Ning Hsu

Paper

Cocktail HuBERT: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech

This paper presents Cocktail HuBERT, a self-supervised learning framework that generalizes to mixture speech using a masked pseudo source separation objective.

Areas

Natural Language Processing & Speech

June 4, 2023

Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer

Paper

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

End-to-end multilingual ASR has become more appealing because of several reasons such as simplifying the training and deployment process and positive performance ...

Areas

Machine Learning, Natural Language Processing & Speech

June 4, 2023

Yuanbo Hou, Yun Wang, Wenwu Wang, Dick Botteldooren

Paper

GCT: Gated Contextual Transformer For Sequential Audio Tagging

We propose a new neural network architecture for the task of sequential audio tagging. "Sequential audio tagging" means we want to know what types of acoustic events (e.g. dog bark, car engine) occur in an audio recording, and in what order they occur.

Areas

Natural Language Processing & Speech

June 4, 2023

Kuan-Lin Chen, Daniel D. E. Wong, Ke Tan, Buye Xu, Anurag Kumar, Vamsi Krishna Ithapu

Paper

Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

To mitigate undersampling, our approach inflates the uncertainty lower bound and weights each loss component with their uncertainty, effectively compensating severely...

Areas

Natural Language Processing & Speech

June 3, 2023

Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic

Paper

LA-VocE: Low-SNR Audio-visual Speech Enhancement Using Neural Vocoders

In this work, we propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audiovisual speech via a transformer-based architecture, and then converts...

Areas

Computer Vision, Natural Language Processing & Speech

June 3, 2023

Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jachym Kolar, Stavros Petridis, Maja Pantic, Christian Fuegen

Paper

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Visual speech recognition (VSR), also known as lip reading, is the task of recognizing speech content based on visual lip movements. VSR has a wide range of applications in real-world scenarios such as helping the hearing-impaired perceive human speech and improving automatic speech recognition (ASR) in noisy environments.

Areas

Computer Vision, Natural Language Processing & Speech

May 22, 2023

Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Paper

Scaling Speech Technology to 1,000+ Languages

We build a new dataset comprising a moderate amount of labeled data for 1,107 languages and another dataset of unlabeled speech in 3,809 languages (§3). We leverage ...

Areas

Natural Language Processing & Speech
