Popularity Prediction for Social Media over Arbitrary Time Horizons
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
VLDB 2016
This paper describes the architecture and design of Cubrick, a distributed multidimensional in-memory DBMS suited for interactive analytics over highly dynamic datasets. Cubrick has a strictly multidimensional data model composed of cubes, dimensions and metrics, supporting sub-second OLAP operations such as slice and dice, roll-up and drill-down over terabytes of data. All data stored in Cubrick is range partitioned by every dimension and stored within containers called bricks in an unordered and sparse fashion, providing high data ingestion rates and indexed access through any combination of dimensions. In this paper, we describe details about Cubrick’s internal data structures, distributed model, query execution engine and a few details about the current implementation. Finally, we present results from a thorough experimental evaluation that leveraged datasets and queries collected from a few internal Cubrick deployments at Facebook.
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
Liqi Yan, Qifan Wang, Yiming Cu, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu
Patrick Lewis, Barlas Oğuz, Wenhan Xiong, Fabio Petroni, Wen-tau Yih, Sebastian Riedel