This week at ACM SIGCOMM 2018 in Budapest, Hungary, we are sharing details on FBOSS, the system running on Facebook-designed network switches powering our global data centers. Our SIGCOMM paper, FBOSS: Building Switch Software at Scale, details how we design, implement and operate one of the world’s largest open source switch platforms.
Every day, people and communities use Facebook’s infrastructure to share information, from messages and News Feed posts to images and videos. Behind these user-facing features are numerous products and research projects that require large amounts of data to be processed and transferred between different groups of machines within Facebook. Over the years, we have built multiple data centers with a large amount of compute and network infrastructure. To give a sense of how fast the network is growing, our FBOSS deployments in our data centers increased by 30x over a period of two years, as seen below:
As the network grew at this pace, we found that traditional methods of building and deploying switch software did not fully address our needs. Therefore, we decided to build and deploy our own switch hardware, Wedge and Backpack. We then built open source switch software to run on this hardware, called FBOSS, which stands for Facebook Open Switching System. Before we dive into the details of FBOSS, let’s first set the context on why we needed our own switching software.
One of the main technical challenges in running large networks is managing the complexity of excess networking features. Most switch vendors understandably try their best to build common software that can meet the needs of their entire customer base; thus their software includes the union of all features requested by all customers over the lifetime of the product. However, more features lead to more code, which can lead to increased bugs, security holes, operational complexity and downtime. We wanted to build software that implements only a carefully selected subset of networking features that we absolutely need.
Further, scaling a large network requires a high rate of innovation while maintaining network stability. Vendors prioritize changes and features by how broadly they are needed across their entire customer base, and we found instances where our feature needs did not align well with those of the vendors’ other customers.
Finally, a third challenge we faced when using vendor switches and switch software was the difficulty of integrating that software into our existing infrastructure. Facebook already has infrastructure in place for managing, monitoring and deploying general software services, but because this infrastructure is built in-house at Facebook, switch vendors do not have full access to the code. As a result, we had to spend additional effort integrating vendor switch software with our existing systems.
These challenges motivated us to build our own open source switching software, one that can evolve incrementally and integrate easily with existing Facebook infrastructure. We have covered some basic concepts in a previous blog post. Now let’s discuss some details of the FBOSS architecture. For more information, please check out the source in the FBOSS GitHub repo.
FBOSS was designed with two design principles in mind: treat the switch software like any other server software in our fleet (Switch-as-a-Server), and deploy a minimal feature set early and iterate on it quickly (Deploy-Early-and-Iterate).
With these design principles in mind, we built FBOSS from the following components, which interact with one another as shown in the following diagram.
Let’s now dive into each component; our paper shares more details on each of them.
Switch software is conventionally developed and released by switch vendors for a large customer base, so a new release can take months, with extended development and QA test cycles. And because update cycles are infrequent, each update usually contains a large number of changes that can introduce new bugs that did not exist previously. In contrast, typical Facebook software deployment processes run much more frequently and thus contain a smaller set of changes per update. Furthermore, feature deployments are coupled with automated and incremental testing mechanisms to quickly catch and fix bugs. Our outage records from a representative month of network operational data, shown in the figure below, indicate that about 40% of the outages were hardware-related and the other 60% were software-related. This led us to develop a suite of software responsible for testing and deploying features in an agile fashion.
Instead of using an existing automated software deployment framework, such as Chef or Jenkins, FBOSS employs its own deployment software called fbossdeploy, which is purpose-built to maintain a tight feedback loop with existing external monitors such as Beringei and Scuba. Using fbossdeploy, FBOSS follows a three-stage deployment practice, which includes testing on a subset of lower-importance production switches. The three stages of deployment are as follows.
We are continually striving to improve our testing and deployment infrastructure. We aim to increase the frequency of our feature deployment, while not negatively affecting our reliability.
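To make this monitor-driven, staged rollout concrete, here is a minimal sketch in Python of how such a deployment loop could be structured. The stage names, switch names, soak time and helper functions (push_to, alarms_firing, all_switches) are hypothetical illustrations, not fbossdeploy’s actual stages or interfaces.

```python
import time

# Hypothetical stages: each one pushes to a progressively larger set of
# lower-importance production switches before a fleet-wide rollout.
STAGES = [
    ("canary", ["rsw001.f01", "rsw002.f01"]),               # a few low-risk switches
    ("sample", ["rsw%03d.f01" % i for i in range(1, 21)]),  # a larger sample
    ("fleet", None),                                        # None => every switch
]

def all_switches():
    """Placeholder: fetch the full switch inventory from the management system."""
    return ["rsw%03d.f01" % i for i in range(1, 101)]

def push_to(switches, build):
    """Placeholder for the actual push mechanism (install the new build, then
    restart or warm boot the switch agent)."""
    for sw in switches if switches is not None else all_switches():
        print("pushing %s to %s" % (build, sw))

def alarms_firing(switches):
    """Placeholder: query external monitors (e.g., dashboards fed by Beringei
    and Scuba) and return True if any pushed switch regressed."""
    return False

def deploy(build, soak_seconds=3600):
    """Push the build stage by stage, halting as soon as the monitors complain."""
    for name, switches in STAGES:
        push_to(switches, build)
        time.sleep(soak_seconds)  # let the change soak under real traffic
        if alarms_firing(switches):
            raise RuntimeError("stage %r regressed; halting rollout of %s" % (name, build))
        print("stage %r healthy, continuing" % name)

if __name__ == "__main__":
    deploy("fboss_agent-2018-08-20", soak_seconds=1)
```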
As mentioned earlier, FBOSS is integrated into our existing network management system, Robotron. We now discuss the details of that integration.
Robotron is Facebook’s main network management system. It is responsible for generating, storing and disseminating configurations for FBOSS. Robotron contains the centralized configuration database, which FBOSS draws its configuration data from. The configuration of network devices is highly standardized in data center environments. Given a specific topology, each device is automatically configured by using templates and auto-generated configuration data. For example, the IP address configuration for a switch is determined by the type of the switch (e.g., ToR or aggregation), and its upstream/downstream neighbors in the cluster.
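As a rough illustration of what template-driven, topology-derived configuration generation can look like, here is a short Python sketch. The role templates, field names and addressing scheme are made up for the example; they are not Robotron’s actual schema or data.

```python
import ipaddress
import json

# Hypothetical templates keyed by switch role; a real config is far richer
# (ports, VLANs, routing policy, QoS, and so on).
ROLE_TEMPLATES = {
    "tor": {"downlink_ports": 48, "uplink_ports": 4},
    "agg": {"downlink_ports": 96, "uplink_ports": 8},
}

def generate_config(name, role, pod_index, uplinks):
    """Derive a switch config purely from its role and position in the topology."""
    template = ROLE_TEMPLATES[role]
    # Example addressing scheme: carve one /26 per pod out of a made-up supernet,
    # so the switch's addresses follow directly from where it sits in the cluster.
    supernet = ipaddress.ip_network("10.128.0.0/16")
    subnet = list(supernet.subnets(new_prefix=26))[pod_index]
    return {
        "name": name,
        "role": role,
        "loopback": str(list(subnet.hosts())[0]),
        "uplinks": uplinks,  # upstream neighbors in the cluster
        "ports": template,
    }

if __name__ == "__main__":
    cfg = generate_config("rsw001.p001", "tor", pod_index=1,
                          uplinks=["fsw001.p001", "fsw002.p001"])
    print(json.dumps(cfg, indent=2))
```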
Once an active configuration has been generated and distributed, Robotron can instruct FBOSS to use different versions of the configuration. To change between configurations quickly and safely, FBOSS stages all prior configurations in its own database; if there is a need to revert to a prior configuration, FBOSS can simply reuse the staged copy. Robotron relies on separate monitoring infrastructure to store device state reported by FBOSS, and uses that state to decide whether a particular version of the configuration should be used.
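The staging-and-revert idea can be sketched in a few lines of Python. The class and method names below are hypothetical, and a real implementation persists versions durably and validates them before activation; the point is only that reverting is a lookup of an already-staged configuration rather than a regeneration.

```python
class StagedConfigStore:
    """Keep every configuration version ever distributed so that switching to a
    prior version is a simple lookup rather than a regeneration."""

    def __init__(self):
        self._versions = {}  # version id -> configuration blob
        self._active = None

    def stage(self, version, config):
        self._versions[version] = config

    def activate(self, version):
        if version not in self._versions:
            raise KeyError("version %r was never staged" % version)
        self._active = version
        return self._versions[version]

    def rollback_to(self, version):
        # Reverting is identical to activating: the old config is already staged.
        return self.activate(version)

if __name__ == "__main__":
    store = StagedConfigStore()
    store.stage("v41", {"interface_mtu": 1500})
    store.stage("v42", {"interface_mtu": 9000})
    store.activate("v42")
    print(store.rollback_to("v41"))  # -> {'interface_mtu': 1500}
```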
In addition to managing configurations, Robotron monitors FBOSS operational states and performance via Thrift interfaces and Linux system logs. Traditionally, data center operators use standardized network management protocols, such as SNMP, to collect switch statistics (CPU and memory utilization, link load, packet loss and miscellaneous system health) from vendor network devices. The Thrift interface on FBOSS, by contrast, allows us to define our own data collection specifications and change them whenever we need to. The Thrift-based monitoring system is also faster, and its collection path can be optimized to reduce collection time. Finally, Linux logs provide detailed lower-level information that our engineers can use to further analyze and improve the system.
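To show what polling a switch over Thrift rather than SNMP might look like, here is a short Python sketch. It assumes Thrift-generated client bindings in a module called fboss_ctrl with a SwitchMonitor service exposing a getCounters() call on port 5909; those names and the port are placeholders for this example, not necessarily the exact FBOSS Thrift API.

```python
# Polling switch counters over a Thrift RPC instead of SNMP.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol

from fboss_ctrl import SwitchMonitor  # hypothetical generated client bindings


def collect_counters(host, port=5909):
    """Open a Thrift connection to the switch agent and pull its counter map."""
    socket = TSocket.TSocket(host, port)
    transport = TTransport.TBufferedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = SwitchMonitor.Client(protocol)

    transport.open()
    try:
        # A single RPC returns a map of counter name -> value. The collector
        # decides which counters to keep, so the set of statistics can change
        # at any time without touching the switch side, unlike fixed SNMP MIBs.
        return client.getCounters()
    finally:
        transport.close()


if __name__ == "__main__":
    counters = collect_counters("rsw001.f01.example.com")
    for name in ("cpu.utilization", "port.eth1/1/1.in_discards"):
        print(name, counters.get(name, "n/a"))
```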
Data center networks are evolving and growing at a rapid rate. Many large data center operators are building their own white-box switches and deploying their own software on them, and FBOSS is one such project. Overall, Facebook has taken a software-centric approach to the future of switch software, and by sharing our design and experiences, we hope to influence upcoming changes in network systems in both industry and academia.
Many people in the networking team at Facebook have contributed to FBOSS over the years and toward this paper. In particular, Adam Simpkins, Tian Fang and Jasmeet Bagga are among the initial team who architected the FBOSS software design. Rob Sherwood, Alex Eckert, Boris Burkov, Saman Kazemkhani and Ying Zhang contributed heavily to the SIGCOMM paper.