This week, in collaboration with Cornell University, we are presenting our paper, Semi-Oblivious Traffic Engineering: The Road Not Taken at the 2018 USENIX Symposium on Networked Systems Design and Implementation (NSDI) conference. The goal of our work was taking a critical look at popular methods for traffic engineering and evaluating a unique, but rarely considered alternative, known as oblivious routing. This blog provides the motivation and an overview of our approach to simple and efficient traffic engineering in large scale networks that we call SMORE.
Typically, most methods used for traffic engineering in real-world are adaptive: they compute the network’s traffic matrix and update routing to satisfy performance goals, such as congestion and latency stretch. Among the downsides of adaptive methods are the high levels of routing update churn, and their reactive nature, which often makes response to sudden traffic changes delayed. Furthermore, depending on the optimization algorithm chosen (e.g. by solving Multi-Commodity Flow formulation), the problem of computing optimum routing may become very complex on large, realistic topologies, especially when the desired goal requires optimizing for multiple constraints – e.g. congestion and delay.
In contrast to adaptive methods, a class of algorithms has been developed in the past that computes the routing scheme without considering active traffic matrix. Such algorithms are known as oblivious, and for specific cases have been demonstrated to be `O(log(n))` competitive as compared to the optimum (MCF) solution, where `n` is the number of nodes in the graph. Best of all, computing such routing schemes requires only polynomial time complexity, which scales very well with network size. The output of an oblivious algorithm is a set of spanning trees in the graph, and the traffic is randomly distributed among all of the trees. Among multiple randomly generated tree, our algorithm can select those that satisfy the basic constraints on latency, which is straightforward to implement.
Clearly, purely oblivious algorithms could not be practical, given that conditions in real networks change constantly while the output of the algorithm is static. Recomputing the decomposition every time a change happens is computationally expensive and disruptive to network traffic flow as well. To handle that, SMORE proposes the approach that could be called semi-oblivious, where the traffic is distributed among the decomposition trees in weighted fashion, satisfying both reachability and congestion constraints. In SMORE, we solve a specific linear-programming (LP) problem to find optimum weight allocation among the trees. While LP formulations might be computationally complex, the fact that we are solving simpler problem as compared to full multi-commodity flow allows us to remain very efficient.
Figure 1: Semi-oblivious TE system model
Even though theoretically, semi-oblivious routing does not improve much over oblivious in terms of performance, in most practical cases and most practical traffic matrices it shows significant improvement. Compared to competing adaptive routing proposals from similar work in the field, SMORE is robust to rapid changes in demands and network failures, while allowing for highly efficient use of network resources.
Figure 9: Expected performance on LBN over half a week.
So why path not taken? Our work on SMORE is still seminal. To our knowledge, there is no practical implementation of semi-oblivious routing at scale. The result of the work, however, motivated and stimulated work on efficient routing schemes for Express Backbone – Facebook’s own long-haul network interconnecting its data-centers together. We are looking forward to leverage the learnings and further improve our production network reliability and efficiency.