Cubrick: A Scalable Distributed MOLAP Database for Fast Analytics

41st International Conference on Very Large Databases (Ph.D Workshop)

Abstract

This paper describes the architecture and design of Cubrick, a distributed multidimensional in-memory database that enables real-time data analysis of large dynamic datasets. Cubrick has a strictly multidimensional data model composed of dimensions, dimensional hierarchies and metrics, supporting sub-second MOLAP operations such as slice and dice, roll-up and drill-down over terabytes of data. All data stored in Cubrick is chunked in every dimension and stored within containers called bricks in an unordered and sparse fashion, providing high data ingestion ratios and indexed access through every dimension. In this paper, we describe details about Cubrick’s internal data structures, distributed model, query execution engine and a few details about the current implementation. Finally, we present some experimental results found in a first Cubrick deployment inside Facebook.

Latest Publications

Boosted Dense Retriever

Patrick Lewis, Barlas Oğuz, Wenhan Xiong, Fabio Petroni, Wen-tau Yih, Sebastian Riedel

NAACL - 2022