Next-generation Data Infrastructure request for proposals

Applications closed


About

All around the world, businesses and organizations are becoming increasingly data-driven. Products and services are built more and more around intelligence derived from data, and the need for reliable and efficient data storage and processing at global scale is becoming ever more critical. Modern data infrastructure architectures have emerged from years of evolution in analytical and transactional data systems, along with a continuous infusion of capabilities stemming from new use cases and new data processing paradigms. Tightly coupled data warehouses are being replaced by more flexible ecosystems built around low-cost, globally available storage and open file formats; data science and machine learning workloads increasingly share the same infrastructure as analytical workloads; and transactional systems and key-value stores are exploring ways to preserve consistency, reliability, and performance while operating efficiently at global scale. Yet despite these efforts and this progress, many challenges remain as the data management community seeks to identify the defining characteristics of next-generation data infrastructure.

Facebook has a long history of contributions to the data management space: Hive, Presto, RocksDB, and MyRocks are all examples of innovative work that started within the company. The scale at which we operate and the unique constraints of our workloads make many existing solutions infeasible, and they provide a perspective that leads to new ideas. As we continue to build and evolve our data infrastructure, we are focused on a number of problems. These range from techniques that optimize CPU usage (and thus power consumption) during large-scale query processing, to strategies that optimize physical layouts and data transfer bandwidth, to techniques that address the challenges arising from storing and processing data across widely separated data centers, to novel approaches for converging data wrangling, machine learning, and analytics. Since guaranteeing correctness is a key requirement for our data storage and processing systems, we also remain focused on systems for testing and verification. Despite the unique constraints of our workloads, many of these problems are common across the industry, and we believe there is much to be gained by collaborating with academia in this area.

To foster further innovation in this area and to deepen our collaboration with academia, Facebook is pleased to invite faculty to respond to this call for research proposals on the topics described above. We anticipate making 10 awards, each in the $50,000 range. Payment will be made to the proposer’s host university as an unrestricted gift. In addition, PIs and Co-PIs on the winning proposals will automatically be granted access to CrowdTangle, a public insights tool from Facebook that makes it easy to follow, analyze, and report on what’s happening with public content on social media.

Award Recipients

University of Maryland, College Park

Daniel Abadi

Swiss Federal Institute of Technology Lausanne

Anastasia Ailamaki

The Ohio State University

Spyros Blanas

University of California, Berkeley

Natacha Crooks

University of Chicago

Haryadi S. Gunawi

ETH Zurich

Ana Klimovic

University of Wisconsin–Madison

Paraschos Koutris

Massachusetts Institute of Technology

Tim Kraska

Technische Universität Dresden

Wolfgang Lehner

University of California, Irvine

Faisal Nawab


Application Timeline

Launch Date

April 19, 2021

Deadline

June 2, 2021

Winners Announced

July or August 2021

Areas of Interest

Areas of interest include, but are not limited to, the following:

1. Large scale query processing

Data processing at scale poses substantial CPU and power challenges for Facebook’s data centers. We are interested in techniques that optimize CPU usage in common data processing pipelines, including, but not limited to, the following:

Advances in vectorized engines, vectorized operators, and fast data decoding and decompression
Code generation techniques to accelerate query execution
Novel query optimization strategies and adaptivity techniques
Innovations in processing time series, semi-structured and unstructured data sets, and graph data

2. Physical layout and IO optimizations

Large-scale decoupled data systems make heavy use of IO when transferring data from storage to compute nodes, and from permanent media to main memory. We are looking for innovative strategies and techniques that reduce the amount of data transferred during data processing pipelines, including, but not limited to, the following:

Micro-layout: innovative file formats, data encoding techniques, column and row reordering, new compression algorithms, and efficient representations for semi-structured and unstructured data
Macro-layout: novel partitioning strategies and indexing structures to improve data pruning, key-value structures that offer innovative ways of balancing read, write, and memory overhead for OLTP workloads, and innovations in materialized views and virtual tables
Caching: new local and remote caching systems for hot blocks, novel eviction strategies, and hierarchical caching techniques
Tuning: advanced systems that optimize the physical layout of data based on changing workloads

3. Data management and processing at a global scale

Data storage and processing across widely separated data centers present a different set of challenges. We are interested in techniques that address problems caused by increased latency, resource constraints such as network bottlenecks, and heterogeneous hardware. Areas include, but are not limited to, the following:

Global replication, transaction management, and consistency for OLTP use cases
Global data placement algorithms that balance cost concerns with latency requirements
Resource management algorithms that balance compute allocation for data processing workloads on a global scale

4. Converged architectures for data wrangling, machine learning, and analytics

Decoupling compute from storage, and using low-cost storage based on open file formats to hold everything from raw to fully curated data for a wide variety of use cases, has created the need to rethink many areas of data management, including, but not limited to, the following:

Data modeling, data lineage, and data governance at scale and for complex workflows
Systems, languages, and APIs for expressing and efficiently executing complex business logic and data transformations
Systems and techniques that bring analytical, data science, and machine learning workloads closer together

5. Advances in testing and verification for storage and processing systems

Guaranteeing correctness is a key requirement for our data storage and processing systems. We are looking for advances in systems that test and verify that these systems perform correctly and within spec when changes (e.g., new code, faults, new hardware) are introduced. Areas include, but are not limited to, the following:

Randomized testing and fuzzing for storage (key-value, relational) and compute (e.g., SQL)
Fault injection and chaos testing
Formal verification techniques for distributed algorithms

Requirements

Proposals should include:

A summary of the project (1–2 pages), in English, explaining the area of focus, a description of techniques, any relevant prior work, and a timeline with milestones and expected outcomes
A draft budget description (1 page) including an approximate cost of the award and explanation of how funds would be spent
Curriculum Vitae for all project participants
Organization details; this will include tax information and administrative contact details

Eligibility

Proposals must comply with applicable U.S. and international laws, regulations, and policies.
Applicants must be current full-time faculty at an accredited academic institution that awards research degrees to PhD students.
Applicants must be the Principal Investigator on any resulting award.
Facebook cannot consider proposals submitted, prepared or to be carried out by individuals residing in, or affiliated with an academic institution located in, a country or territory subject to comprehensive U.S. trade sanctions.
Government officials (excluding faculty and staff of public universities, to the extent they may be considered government officials), political figures, and politically affiliated businesses (all as determined by Facebook in its sole discretion) are not eligible.

Frequently Asked Questions

Do you typically limit the salary of the PI in the gift?
Most RFP awards are unrestricted gifts. Because of this, salary/headcount can be included as part of the budget presented for the RFP. Since the award/gift is paid to the university, the university can allocate the funds to the winning project and has the freedom to use them as needed. All Facebook teams are different and have different expectations concerning deliverables, timing, etc. In short: yes, money for salary/headcount can be included. It is up to the reviewing team to determine whether the percentage spent is reasonable and how that factors into the decision on whether the project is selected.
Should the proposal be double- or single-spaced? Is there any required/expected font?
We are flexible, but ideally proposals submitted are single-spaced, Times New Roman, 12 pt font.
What is the award cycle or when does the funding year begin and end?
Research awards are given year-round and funding years/duration can vary by proposal.
Can award funds be used to cover a researcher's summer salary while conducting research?
Yes, award funds can be used to cover a researcher’s salary.
Can you please explain the budget breakdown in more detail?
Budgets can vary by institution and geography, but overall research funds ideally cover the following: graduate or post-graduate students’ employment/tuition; other research costs (e.g., equipment, laptops, incidental costs); and travel associated with the research (conferences, workshops, summits, etc.). Overhead for research gifts is limited to 5%.
We are working as co-PIs and are at the same institution. Is it possible to list both of our names as PI for an RFP proposal?
One person will need to be the primary PI (i.e., the submitter who will receive all email notifications); however, you’ll be given the opportunity to list collaborators/co-PIs in the submission form. Please note in your budget breakdown how the funds should be distributed among PIs.

Terms & Conditions

Facebook’s decisions will be final in all matters relating to Facebook RFP solicitations, including whether or not to grant an award and the interpretation of Facebook RFP Terms and Conditions. By submitting a proposal, applicants affirm that they have read and agree to these Terms and Conditions.

Facebook is authorized to evaluate proposals submitted under its RFPs, to consult with outside experts, as needed, in evaluating proposals, and to grant or deny awards using criteria determined by Facebook to be appropriate and at Facebook’s sole discretion. Facebook’s decisions will be final in all matters relating to its RFPs, and applicants agree not to challenge any such decisions.
Facebook will not be required to treat any part of a proposal as confidential or protected by copyright, and may use, edit, modify, copy, reproduce and distribute all or a portion of the proposal in any manner for the sole purposes of administering the Facebook RFP website and evaluating the contents of the proposal.
Personal data submitted with a proposal, including name, mailing address, phone number, and email address of the applicant and other named researchers in the proposal may be collected, processed, stored and otherwise used by Facebook for the purposes of administering Facebook’s RFP website, evaluating the contents of the proposal, and as otherwise provided under Facebook’s Privacy Policy.
Neither Facebook nor the applicant is obligated to enter into a business transaction as a result of the proposal submission. Facebook is under no obligation to review or consider the proposal.
Feedback provided in a proposal regarding Facebook products or services will not be treated as confidential or protected by copyright, and Facebook is free to use such feedback on an unrestricted basis with no compensation to the applicant. The submission of a proposal will not result in the transfer of ownership of any IP rights.
Applicants represent and warrant that they have authority to submit a proposal in connection with a Facebook RFP and to grant the rights set forth herein on behalf of their organization. All awards provided by Facebook in connection with this RFP shall be used only in accordance with applicable laws and shall not be used in any way, directly or indirectly, to facilitate any act that would constitute bribery or an illegal kickback, an illegal campaign contribution, or would otherwise violate any applicable anti-corruption or political activities law.
Awards granted in connection with RFP proposals will be subject to terms and conditions contained in the unrestricted gift agreement (or, in some cases, other mechanisms) pursuant to which the award funding will be provided. Applicants understand and acknowledge that they will need to agree to these terms and conditions to receive an award.