Advances in Pre-Training Distributed Word Representations

Language Resources and Evaluation Conference (LREC)


Many Natural Language Processing applications now rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia, and Web Crawl. In this paper, we show how to train high-quality word vector representations by combining known tricks that are, however, rarely used together. The main result of our work is a new set of publicly available pre-trained models that outperform the current state of the art by a large margin on a number of tasks.
