Proposals should advance mathematical techniques with a clear, practical application to one area of interest in backend systems. The range of mathematical techniques includes, but is not limited to, the following:
- Convex and nonconvex optimization
- Stochastic control and optimization
- Graph algorithms
- Online algorithms
- Reinforcement learning and adaptive control
- Machine learning
- Dynamic programming
- Decisions under uncertainty
- Scheduling and assignment
- Hidden Markov models
- Monte Carlo and simulation methods
- Multi-agent learning and game theory
- Randomized search heuristics
- Semi-supervised and weakly supervised learning
Areas of interest fall into the following categories:
1. Data center and hardware operations
Complex, large-scale decision-making problems arise throughout Facebook data center operations, including strategic capacity planning, resource allocation, operations and scheduling (staffing, hardware, reactive and proactive maintenance), disaster readiness, and impact control. Solving such problems efficiently and near-optimally is crucial to the efficiency and reliability of Facebook infrastructure. Uncertainty (in service demand, supply chains, and hardware failures, for example) adds a significant layer of difficulty. Facebook is interested in research that combines optimization with probabilistic, statistical, and machine learning techniques to support decision-making under uncertainty in these large-scale settings.
2. Cloud computing and cluster management
Large-scale distributed computing infrastructure poses a range of interesting challenges in resource management and scheduling. From instrumenting interdependent microservices to planning and executing batch data and machine learning pipelines, there are several opportunities to improve the efficiency and scalability of these systems. Coordination among processes or machines, as in distributed training and federated learning, presents complex multi-agent optimization problems. Systems performance tuning tasks, such as regional data placement, traffic routing, and cache admission and eviction policies, can benefit from online learning, reinforcement learning, and related techniques.
3. Application tuning and optimization
Many infrastructure systems make automated decisions in real time. These decisions adapt system configurations to specific user environments to optimize performance, reliability, and efficiency. For example, videos should be played at the highest quality level that a user’s network conditions can sustain without frequent rebuffering. Similarly, detecting when a long-running data pipeline or machine learning workflow is stuck or making slow progress makes it possible to restart the process so that it wastes fewer resources. Such decision-making should accurately predict the environment, select the best alternative under sometimes conflicting objectives, and adapt by learning from past actions and their results.
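To make the video-quality example concrete, the following is a minimal illustrative sketch, not a description of any production system. It assumes a hypothetical bitrate ladder, recent throughput measurements in kbps, and a simple rule: estimate throughput with a harmonic mean (which weights slow samples heavily), apply a safety margin, and pick the highest bitrate that fits. The function name `select_bitrate`, the `safety_factor` parameter, and all numbers are assumptions made for illustration.

```python
from typing import Sequence


def select_bitrate(
    ladder_kbps: Sequence[int],
    throughput_samples_kbps: Sequence[float],
    safety_factor: float = 0.8,  # hypothetical margin against rebuffering
) -> int:
    """Return the highest bitrate in the ladder that a conservative
    throughput estimate can sustain; fall back to the lowest bitrate."""
    if not ladder_kbps:
        raise ValueError("ladder_kbps must not be empty")

    # Ignore non-positive samples so the harmonic mean is well defined.
    samples = [s for s in throughput_samples_kbps if s > 0]
    if not samples:
        return min(ladder_kbps)

    # The harmonic mean is pulled toward the slowest samples, which makes
    # the estimate conservative when the network is unstable.
    harmonic_mean = len(samples) / sum(1.0 / s for s in samples)
    budget = safety_factor * harmonic_mean

    sustainable = [rate for rate in ladder_kbps if rate <= budget]
    return max(sustainable) if sustainable else min(ladder_kbps)


# Hypothetical ladder and measurements, for illustration only.
ladder = [300, 750, 1500, 3000, 6000]
choice = select_bitrate(ladder, [4000.0, 4000.0, 4000.0])
```

A fixed rule like this one does not adapt to past outcomes. The research areas listed above, such as online algorithms and reinforcement learning, would instead tune or replace the rule using observed rebuffering and quality results.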