Owl: Scale and Flexibility in Distribution of Hot Content

USENIX Symposium on Operating Systems Design and Implementation (OSDI)

Abstract

Owl provides high-fanout distribution of large data objects to hosts in Meta’s private cloud. Owl combines a decentralized data plane based on ephemeral peer-to-peer distribution trees with a centralized control plane in which tracker services maintain detailed metadata about peers, their cache state, and ongoing downloads. In Owl, peer nodes are simple state machines, and centralized trackers decide where each peer should fetch data from, how it should retry on failure, and which data it should cache and evict. Owl trackers provide a highly flexible and configurable policy interface that customizes and optimizes behavior for widely varying distribution use cases. In contrast to prior assumptions about peer-to-peer distribution, Owl shows that centralizing the control plane is not a barrier to scalability: Owl distributes over 800 petabytes of data per day to millions of client processes. Owl improves download speeds by a factor of 2–3 over both BitTorrent and a prior decentralized static distribution tree used at Meta, while supporting 106 use cases that collectively employ 55 different distribution policies.
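The split described above, simple peers that only report state and a central tracker that decides where each download comes from, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the `Tracker` class, `report_cached`, `choose_source`, `finish`, the `"origin"` fallback label, and the least-loaded selection rule are hypothetical and are not Owl's actual interface or any of its 55 policies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

ORIGIN = "origin"  # hypothetical label for the backing storage system


def least_loaded(holders: List[str], load: Dict[str, int]) -> str:
    """Example policy (an assumption): pick the holder serving the fewest downloads."""
    return min(holders, key=lambda p: (load.get(p, 0), p))


@dataclass
class Tracker:
    """Hypothetical centralized tracker; peers only report state and follow its answers."""
    cache_state: Dict[str, Set[int]] = field(default_factory=dict)  # peer -> cached chunk ids
    active_uploads: Dict[str, int] = field(default_factory=dict)    # peer -> downloads it serves
    policy: Callable[[List[str], Dict[str, int]], str] = least_loaded  # pluggable selection policy

    def report_cached(self, peer: str, chunk: int) -> None:
        """A peer tells the tracker it now caches `chunk`."""
        self.cache_state.setdefault(peer, set()).add(chunk)

    def choose_source(self, requester: str, chunk: int, failed: Iterable[str] = ()) -> str:
        """Tell `requester` where to fetch `chunk`, skipping sources that already failed it."""
        excluded = set(failed) | {requester}
        holders = [p for p, chunks in self.cache_state.items()
                   if chunk in chunks and p not in excluded]
        if not holders:
            return ORIGIN  # no usable peer holds the chunk: fall back to storage
        source = self.policy(holders, self.active_uploads)
        self.active_uploads[source] = self.active_uploads.get(source, 0) + 1
        return source

    def finish(self, source: str) -> None:
        """Record that a download served by `source` has completed or failed."""
        if self.active_uploads.get(source, 0) > 0:
            self.active_uploads[source] -= 1


tracker = Tracker()
tracker.report_cached("b", 0)
tracker.report_cached("c", 0)
print(tracker.choose_source("a", 0))                      # a peer asks where to fetch chunk 0
print(tracker.choose_source("a", 0, failed=["b", "c"]))   # retry after both peers failed
```

In this sketch a retrying peer simply asks again and passes the sources that failed, so retry logic lives in the tracker rather than in the peer. Swapping `policy` changes placement decisions without touching peer code, which mirrors, in a very simplified form, the kind of per-use-case configurability the abstract attributes to centralized trackers.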

Read more about this project in the blog post "Owl: Distributing content at Meta scale": https://engineering.fb.com/2022/07/14/data-infrastructure/owl-distributing-content-at-meta-scale/
