Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation

EMNLP Conference on Machine Translation (WMT)


We present the results of the first shared task on Large-Scale Multilingual Machine Translation. The task consists of the many-to-many evaluation of a single model across a variety of source and target languages. This year, the task comprised three settings: (i) SMALL-TASK1 (Central/South-Eastern European languages), (ii) SMALL-TASK2 (South-East Asian languages), and (iii) FULL-TASK (all 101 x 100 language pairs). All the tasks used the FLORES-101 dataset as the evaluation benchmark. To ensure the longevity of the dataset, the test sets were not publicly released and the models were evaluated in a controlled environment on Dynabench. A total of 10 teams participated in the tasks, with 151 intermediate model submissions and 13 final models. This year’s results show a significant improvement over the known baselines, with +17.8 BLEU for SMALL-TASK2, +10.6 for FULL-TASK, and +9.4 for SMALL-TASK1.
