SUPERB: Speech Understanding and PERformance Benchmark



Pre-training a network on large volumes of unlabeled data with self-supervised learning, then fine-tuning it for multiple downstream tasks, has proven vital for advancing research in natural language representation learning. However, the speech processing community lacks a comparable setup that systematically measures the quality of learned representations across a wide range of downstream speech applications. To bridge this gap, we introduce the Speech Understanding and PERformance Benchmark (SUPERB), a leaderboard that benchmarks learned speech representations on ten speech processing tasks. We present a complete framework for learning and evaluating a specialized prediction head for each task on top of the pre-trained speech representations. Our results on many publicly available self-supervised models show that their representations generalize to multiple speech tasks with limited supervised data and minimal architecture changes. All materials are open-sourced and reproducible in the s3prl toolkit to facilitate future research in speech representation learning. The code is available on GitHub.
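The framework described above trains a lightweight, task-specific prediction head on top of pre-trained speech representations. The following is a minimal, dependency-free Python sketch of that idea under simplifying assumptions: the upstream model's frame-level features are given as plain lists and treated as frozen, they are mean-pooled into one utterance-level vector, and the head is a single linear softmax classifier trained with SGD on cross-entropy. The names `mean_pool`, `LinearHead`, and `train_head`, and the toy data, are hypothetical illustrations, not part of the s3prl API; the actual SUPERB heads differ from task to task.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average frame-level features into a single utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Hypothetical lightweight prediction head: one linear layer plus softmax."""

    def __init__(self, in_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(in_dim)]
                  for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def predict_proba(self, x: Vector) -> Vector:
        scores = [sum(wi * xi for wi, xi in zip(row, x)) + bi
                  for row, bi in zip(self.w, self.b)]
        return softmax(scores)

    def predict(self, x: Vector) -> int:
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)

    def sgd_step(self, x: Vector, label: int, lr: float) -> float:
        """One cross-entropy gradient step; only the head's parameters change."""
        probs = self.predict_proba(x)
        for k, p in enumerate(probs):
            grad = p - (1.0 if k == label else 0.0)  # d(loss) / d(score_k)
            for d, xd in enumerate(x):
                self.w[k][d] -= lr * grad * xd
            self.b[k] -= lr * grad
        return -math.log(probs[label])


def train_head(utterances, labels, num_classes, epochs=50, lr=0.5) -> LinearHead:
    """Train a head on frozen upstream features; the features are never updated."""
    pooled = [mean_pool(frames) for frames in utterances]
    head = LinearHead(len(pooled[0]), num_classes)
    for _ in range(epochs):
        for x, y in zip(pooled, labels):
            head.sgd_step(x, y, lr)
    return head


# Toy "frozen upstream" features: 2 utterances per class, 2 frames each, 2 dims.
utts = [
    [[1.0, 0.1], [0.9, 0.0]],  # class 0
    [[0.8, 0.2], [1.1, 0.1]],  # class 0
    [[0.1, 1.0], [0.0, 0.9]],  # class 1
    [[0.2, 0.8], [0.1, 1.1]],  # class 1
]
labels = [0, 0, 1, 1]
head = train_head(utts, labels, num_classes=2)
print([head.predict(mean_pool(u)) for u in utts])  # [0, 0, 1, 1]
```

Because only `head.w` and `head.b` are updated, the pooled upstream features act as a fixed input. This keeps the head small and cheap to train, so the score mostly reflects how much task-relevant information the frozen representation already carries.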
