Shared Foundations: Modernizing Meta’s Data Lakehouse

Conference on Innovative Data Systems Research (CIDR)

Abstract

Data processing systems have evolved significantly over the last decade, driven by major trends in hardware and software, the exponential growth of data, and new and changing use cases. At Meta (and elsewhere), the various data systems that make up the data lakehouse historically evolved organically and independently. This fragmented the data stack, which in turn caused duplicated work, subpar system performance, and an inconsistent user experience. This paper describes how we transformed Meta’s legacy data lakehouse stack to adapt to these new realities through a large cross-organizational effort called Shared Foundations. The program promotes a compositional approach based on three principles: reusable components, deduplicated systems, and common, consistent APIs. Shared Foundations has resulted in a more modern data architecture at Meta, one that offers better performance, richer features, higher engineering velocity, and a more consistent user experience, and that positions Meta’s data lakehouse stack for faster innovation in the future.