SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Visual speech recognition (VSR), also known as lip reading, is the task of recognizing speech content based on visual lip movements. VSR has a wide range of applications in real-world scenarios such as helping the hearing-impaired perceive human speech and improving automatic speech recognition (ASR) in noisy environments.


Scaling Speech Technology to 1,000+ Languages

We build a new dataset comprising a moderate amount of labeled data for 1,107 languages and another dataset of unlabeled speech in 3,809 languages (§3). We leverage ....