Areas of interest include, but are not limited to, the following.
1. Learning and evaluation under uncertainty
- To protect people online and provide them with a meaningful experience, Facebook develops predictive models, which typically require training and evaluation data. Obtaining this data is often resource intensive (e.g., manual labeling, surveys, product interactions), and the data is subject to noise and bias. We are interested in practical methodologies that address these challenges: estimating and correcting for various biases, reducing the resources needed to obtain labels, and producing calibrated estimates and predictions across cohorts of varying sizes.
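One way to quantify the calibration concern above is binned expected calibration error (ECE), computed per cohort. A minimal sketch on hypothetical synthetic cohorts (not Facebook data; the label model and distortion are illustrative assumptions):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean gap between predicted confidence
    and observed accuracy within each probability bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece

rng = np.random.default_rng(0)
# Hypothetical cohort: labels drawn with probability equal to the score,
# so the scores are well calibrated by construction.
p = rng.uniform(0, 1, 5000)
y = rng.uniform(0, 1, 5000) < p
ece_good = expected_calibration_error(p, y)
# Same labels, but scores pushed toward the extremes (overconfident).
ece_bad = expected_calibration_error(np.clip(p * 1.5 - 0.25, 0, 1), y)
```

Running the same estimator separately per cohort exposes where small cohorts need shrinkage or more labels before their calibration estimates are trustworthy.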
2. Statistical models of complex social processes
- Facebook’s products help connect billions of people, and we often think of our products and systems as time-varying networks at scale. As these networks of connections evolve, various social processes also unfold on top of them: content spreads, social groups form and dissolve, people leverage their networks to organize events and support charitable causes, etc. Statistical models, both of the evolution of connection networks and of social processes on top of those networks, provide important input into Facebook’s efforts to design better products and to build safer and more meaningful communities. We invite proposals around the development, inference, and validation of such statistical models.
- Design and analysis of experiments – Facebook uses frameworks for randomized experiments to measure the benefits of the improvements we make to our products. We seek to maximize what we learn from these experiments by improving how they are designed and analyzed. We are interested in methodologies that extend or enhance the standard experimentation framework: variance reduction; measuring heterogeneous or time-varying effects; estimating effects of many-valued or continuous-valued treatments; aggregating information across multiple related experiments; and correcting for selection bias when randomization is imperfect. We are also actively interested in research on adaptive experimentation, such as Bayesian optimization and reinforcement learning.
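One widely used variance-reduction technique in this setting is CUPED-style regression adjustment with a pre-experiment covariate. A minimal sketch on synthetic data (the covariate, effect size, and noise model are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# Hypothetical pre-experiment covariate correlated with the outcome.
x = rng.normal(0, 1, n)
treat = rng.integers(0, 2, n)
y = 0.1 * treat + 2.0 * x + rng.normal(0, 1, n)   # true effect = 0.1

def cuped_adjust(y, x):
    """CUPED adjustment: y - theta * (x - mean(x)), theta = cov(x, y) / var(x)."""
    theta = np.cov(x, y)[0, 1] / x.var()
    return y - theta * (x - x.mean())

raw_diff = y[treat == 1].mean() - y[treat == 0].mean()
y_adj = cuped_adjust(y, x)
adj_diff = y_adj[treat == 1].mean() - y_adj[treat == 0].mean()
variance_reduction = 1 - y_adj.var() / y.var()
```

Because the covariate is measured before randomization, the adjustment cannot bias the treatment estimate; it only removes outcome variance the covariate explains, tightening confidence intervals for the same sample size.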
3. Causal inference with observational data
- Researchers at Facebook often want to answer causal questions even when it is not possible to conduct product tests. For instance, we may want to measure the effects of external events or understand the potential causes of anomalies we observe in our data. Proposals in this area should improve our ability to suggest plausible hypotheses for interesting phenomena or to credibly estimate the effects of known causes. Another application of interest is predicting how key app performance and reliability metrics will change when an upgrade rolls out to the entire user base, based on the treatment effect observed on a selected population during the app's test phase. We are also interested in the related field of survey methodology, particularly dealing with non-response and missing data.
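A minimal sketch of one standard observational technique, inverse-propensity weighting, on synthetic data with a single known confounder. The propensity score is known here only for illustration; in practice it would be estimated, e.g., by logistic regression:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
# Hypothetical confounder: heavier users are more likely to receive the
# upgrade and also have better baseline metrics.
u = rng.normal(0, 1, n)
p_treat = 1.0 / (1.0 + np.exp(-u))              # true propensity score
t = rng.uniform(0, 1, n) < p_treat
y = 1.0 + 0.5 * t + u + rng.normal(0, 1, n)     # true effect = 0.5

naive = y[t].mean() - y[~t].mean()              # confounded, biased upward

# Self-normalized inverse-propensity weighting recovers the true effect
# by reweighting each group to look like the full population.
ipw = (np.average(y[t], weights=1.0 / p_treat[t])
       - np.average(y[~t], weights=1.0 / (1.0 - p_treat[~t])))
```

The naive difference in means roughly triples the true effect because the confounder drives both treatment uptake and the outcome; the weighted contrast lands close to 0.5.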
4. Algorithmic Auditing
- Modeling and measuring feedback loop effects in ranking and recommender systems – ML systems make predictions that are reinforced by user feedback, yielding an increasingly accurate model of user preferences. However, this feedback loop can also influence users' decisions and narrow their interests, potentially resulting in suboptimal outcomes. We are interested in research that identifies these feedback loops and models, through theoretical or empirical techniques, the effects they may have on preference amplification.
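The narrowing effect of such a loop can be illustrated with a toy simulation. Everything here (the click model, score updates, topic count) is an illustrative assumption, not a description of any production system:

```python
import numpy as np

rng = np.random.default_rng(3)
n_topics, steps = 10, 2000
true_pref = np.full(n_topics, 1.0 / n_topics)   # user is equally interested in all topics
scores = np.ones(n_topics)                      # model's estimated preferences

shown = np.zeros(n_topics)
for _ in range(steps):
    # Pure exploitation: show the top-scored topic (tiny noise breaks ties).
    topic = int(np.argmax(scores + rng.normal(0, 0.01, n_topics)))
    shown[topic] += 1
    clicked = rng.uniform() < true_pref[topic] * n_topics * 0.5   # click prob = 0.5
    scores[topic] += 1.0 if clicked else -0.1   # reinforce positive feedback

exposure = shown / shown.sum()
entropy = -np.sum(exposure[exposure > 0] * np.log(exposure[exposure > 0]))
max_entropy = np.log(n_topics)   # entropy of perfectly even exposure
```

Although the simulated user is indifferent among topics, exploit-and-reinforce dynamics concentrate exposure on whichever topic is clicked first, so the exposure entropy ends far below the even-exposure maximum; the gap is one simple measure of feedback-loop-driven narrowing.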
- Interpretability techniques for AI models – AI models have become increasingly complex, so it is important for both AI practitioners and business stakeholders to be able to evaluate and understand AI algorithms comprehensively. This can help simplify model development and ensure that AI is leveraged responsibly. We are interested in interpretability techniques in (but not limited to) any of the following areas: feature attributions, aggregate attributions, feature interactions, accumulated local effects, and global/local surrogate models.
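As a concrete instance of one simple global attribution technique, permutation feature importance, a minimal sketch on a hypothetical model (the model and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(0, 1, (5000, 3))

# Hypothetical model: strong dependence on feature 0, weak on 1, none on 2.
def model(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

y = model(X)

def permutation_importance(model, X, y):
    """Increase in MSE when one feature column is shuffled at a time;
    a larger increase means the model relies on that feature more."""
    base = np.mean((model(X) - y) ** 2)
    importances = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = np.random.default_rng(j).permutation(Xp[:, j])
        importances.append(np.mean((model(Xp) - y) ** 2) - base)
    return np.array(importances)

imp = permutation_importance(model, X, y)   # ranks features 0 > 1 > 2
```

The technique treats the model as a black box, which is why it applies equally to the complex models the bullet describes; its main caveat is that permuting correlated features can produce misleading attributions.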
5. Performance Regression Detection and Attribution
- In large-scale distributed systems, we often set up automated alerts to surface endpoints experiencing a sustained loss in performance. We then investigate the underlying root cause by analyzing combinations and sub-partitions of dozens of potentially co-dependent factors gathered from both structured and unstructured data sets. We welcome submissions on improved statistical methods for monitoring and automation in this class of problems, ranging from detecting the origin node of a fault within a networked environment to tools that improve the efficiency of general root cause analysis investigations.
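One classical detector for sustained (as opposed to transient) performance loss is a one-sided CUSUM chart. A minimal sketch on a hypothetical standardized metric (the series, shift size, and thresholds are illustrative assumptions):

```python
import numpy as np

def cusum_alarm(x, target, k=0.5, h=10.0):
    """One-sided CUSUM: returns the first index where the cumulative
    excess over target + k (in std units) crosses threshold h, else None.
    The allowance k makes the statistic ignore transient noise."""
    s = 0.0
    for i, v in enumerate(x):
        s = max(0.0, s + (v - target - k))
        if s > h:
            return i
    return None

rng = np.random.default_rng(5)
# Hypothetical standardized latency metric: a sustained +1.5 sd
# regression begins at t = 150.
x = rng.normal(0, 1, 300)
x[150:] += 1.5
alarm = cusum_alarm(x, target=0.0)   # should fire shortly after t = 150
```

Compared with thresholding single observations, the cumulative statistic trades a short detection delay for far fewer false alarms on noisy, high-volume telemetry.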
6. Forecasting for Aggregated Time Series
- Forecasting is a widely used tool for capacity planning; however, it is frequently desirable to produce and analyze accurate forecasts at several levels of granularity. For example, we might want to forecast inbound traffic at the global, country, and regional levels simultaneously. Emerging techniques in hierarchical forecasting and related domains offer new ways of producing forecasts that are not only consistent across levels of aggregation but also leverage latent information in covariance structures. We welcome submissions that extend our ability to forecast beyond well-known univariate time series methods.
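One standard way to make forecasts consistent across aggregation levels is least-squares (OLS) reconciliation, sketched below on a toy two-region hierarchy with hypothetical, deliberately incoherent base forecasts:

```python
import numpy as np

# Hierarchy: total = region A + region B. The summing matrix S maps the
# bottom-level series (A, B) to every level [total, A, B].
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Hypothetical base forecasts for one horizon step; note 60 + 45 != 102,
# so they are incoherent across aggregation levels.
y_hat = np.array([102.0, 60.0, 45.0])

# OLS reconciliation: project the base forecasts onto the coherent
# subspace spanned by S.
P = S @ np.linalg.inv(S.T @ S) @ S.T
y_tilde = P @ y_hat   # [103, 59, 44]: now total = A + B exactly
```

Richer variants (e.g., trace minimization) replace the plain projection with one weighted by forecast-error covariances, which is how reconciliation can exploit the latent covariance structure the bullet mentions.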
7. Privacy-aware statistics for noisy, distributed data sets
- Statistical practice at Facebook becomes more complex in the context of privacy-enhancing technologies: differential privacy (DP) produces deliberately noisy datasets, and frameworks such as federated analytics generate insights from distributed datasets. Developing appropriate statistical methods for these settings requires careful accounting for the injected noise and can be constrained by requirements for secure and distributed computation. We are interested in research that improves the utility of noisy datasets produced via DP, as well as new statistical methods and algorithms for federated analytics.