Our Core Systems engineers and researchers design, build, and deploy the foundation of Meta’s private cloud, which powers Meta’s infrastructure and meets our business needs. Our work spans the engineering spectrum of research, development, deployment, and production as we ensure that our systems run efficiently, reliably, and securely across millions of machines in tens of geo-replicated data center regions.
Core Systems performs state-of-the-art R&D in distributed systems and architecture at global scale. Billions of people rely on the services we build and manage to connect and communicate. Throughout the lifecycle of these distributed services, we encounter fundamental technical challenges in multiple areas, including cluster management, serverless computing, configuration management, global routing, deployment, fault tolerance, performance, reliability, scalability, service discovery, and storage systems.
We are hiring infrastructure engineers with 5+ years of experience to tackle exciting hyperscale challenges; please reach out to systemsresearch@fb.com.
Below are some publications that describe our technical work:
- Cluster management: Twine, OSDI 2020; RAS, SOSP 2021; Shard Manager, SOSP 2021
- Serverless: XFaaS, SOSP 2023
- Configuration management: Configerator, SOSP 2015
- Continuous deployment: Conveyor, OSDI 2023
- Service mesh and global routing: ServiceRouter, OSDI 2023
- Consensus protocol: Delos, OSDI 2020 Best Paper Award; Delos, SOSP 2021
- Global capacity management: Flux, OSDI 2023
- Kernel: TMO, ASPLOS 2022 Best Paper Award; IOCost, ASPLOS 2022; Contiguitas, ISCA 2023 Best Paper Award
- Fault tolerance: Kraken, OSDI 2016; Maelstrom, OSDI 2018; Taiji, SOSP 2019
- Tracing: Canopy, SOSP 2017
- Data center power management: Dynamo, ISCA 2016
View our Publications for a list of all our published research.