Popularity Prediction for Social Media over Arbitrary Time Horizons
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
IEEE International Conference on Data Engineering (ICDE)
Presto is an open source distributed query engine that supports much of the SQL analytics workload at Facebook. Presto is designed to be adaptive, flexible, and extensible. It supports a wide variety of use cases with diverse characteristics. These range from user-facing reporting applications with sub-second latency requirements to multi-hour ETL jobs that aggregate or join terabytes of data. Presto’s Connector API allows plugins to provide a high performance I/O interface to dozens of data sources, including Hadoop data warehouses, RDBMSs, NoSQL systems, and stream processing systems. In this paper, we outline a selection of use cases that Presto supports at Facebook. We then describe its architecture and implementation, and call out features and performance optimizations that enable it to support these use cases. Finally, we present performance results that demonstrate the impact of our main design decisions.
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, James Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks, Geeta Chauhan, Benjamin Lee, Hsien-Hsin S. Lee, Bugra Akyildiz, Max Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, Kim Hazelwood
Jungdam Won, Deepak Gopinath, Jessica Hodgins