Applications closed

2022 Silent Data Corruptions at Scale request for proposals

About

Every day, billions of people connect with each other through Meta using services like Facebook, Instagram, WhatsApp, and Messenger. Meta’s services rely on fleets of servers in data centers across the globe, all running applications and delivering the performance the services need. However, silent data corruption (data errors that go undetected by the larger system) remains a widespread challenge for large-scale infrastructure systems. This type of corruption can propagate across the stack and manifest as application-level problems. It can also result in data loss and take months to debug and resolve. Our teams enable and support hardware testing and large-scale experiments in Meta’s data centers, including detecting and remediating silent data corruptions at a scale of hundreds of thousands of machines.

Within this novel research domain, we identify research opportunities ranging from architectural solutions to data corruption, to fleetwide testing strategies and distributed computing resiliency models, to software and library resiliency, to silicon-level design, simulation, and manufacturing approaches. Solutions may also be cross-layered, combining several of these domains in a single proposal. This RFP is not limited to CPU-specific solutions; it covers all the components typically used in a server infrastructure.

To foster further innovation in this area and to deepen our collaboration with academia, Meta is pleased to invite faculty to respond to this call for research proposals on the topics above. We anticipate granting up to five awards, each in the $50,000 range. Payment will be made to the proposer’s host university as an unrestricted gift.


Application Timeline

Launch Date

February 14, 2022

Deadline

March 21, 2022 at 5:00 pm AoE (Anywhere on Earth)

Winners Announced

June 2022

Areas of Interest

We are soliciting proposals focused on mitigating silent data corruptions in internet applications caused by hardware faults affecting the data center computing stack (from hardware to compilers to applications). Proposals may range from hardware- and architecture-level mitigations and design strategies, to the evolution of test architectures, to software resiliency against silent data corruption.

Example topics include the following:

1. Computer architecture approaches to handle silent data corruptions

  • Architectural solutions to handle and mitigate silent data corruptions, such as enhanced ECC mechanisms for compute blocks
  • Self-test architectural blocks and modes, such as lockstep computing, checkpointing, and redundant computing, with evaluation of compute cost and performance tradeoffs
  • Novel architectural solutions related to compute and memory error handling mechanisms including but not limited to enhancing traditional RAS architectures

2. Distributed computing solutions to silent error propagation

  • Multi-machine computational resiliency models/solutions for silent error containment and propagation
  • Error detection capability across multiple subsystems
  • Distributed/fleet scale error containment and testing mechanisms
  • Distributed system architectures for self-testing against silent data corruption, and associated recovery solutions

3. Service resiliency and software redundancy

  • Software-level solutions for silent error resiliency including redundancy and probabilistic algorithmic fault tolerance
  • Enabling corruption-resilient, general-purpose compute and data movement libraries
  • Real-time software-level strategies for detecting and containing silent corruptions, with evaluation of compute cost and performance
  • Algorithmic solutions for recovering from historical data corruptions

4. Silicon design

  • Silicon design and manufacturing strategies towards mitigation of silent data corruption
  • Advanced simulation, emulation, and testing strategies within silicon fabrication
  • Silicon testing coverage assessment and probabilistic evaluation of fault occurrence within silicon modules
  • Test routine development for manufacturing and fleet use cases for silent error detection
  • Degradation assessment and modeling for silicon modules

Requirements

Proposals should include:

  • A summary of the project (one to two pages), in English, explaining the area of focus, a description of techniques, any relevant prior work, and a timeline with milestones and expected outcomes
  • A draft budget description (one page) including an approximate cost of the award and explanation of how funds would be spent
  • Curriculum vitae for all project participants
  • Organization details, including tax information and administrative contact details

Eligibility

  • The proposal must comply with applicable U.S. and international laws, regulations, and policies.
  • Applicants must be current full-time faculty at an accredited academic institution that awards research degrees to PhD students.
  • Applicants must be the Principal Investigator on any resulting award.
  • Meta cannot consider proposals submitted, prepared, or to be carried out by individuals residing in or affiliated with an academic institution located in a country or territory subject to comprehensive U.S. trade sanctions.
  • Government officials (excluding faculty and staff of public universities, to the extent they may be considered government officials), political figures, and politically affiliated businesses (all as determined by Meta in its sole discretion) are not eligible.

Additional Information

  • Award recipients will be listed on the Meta Research website and will also be invited to a workshop (TBD) to share their findings and insights. Finalists will also be listed in a public blog announcement. We encourage winners to openly publish any findings or insights from their work.
  • For additional questions related to this RFP, please email academicrelations@fb.com.


Terms & Conditions

Meta’s decisions will be final in all matters relating to Meta RFP solicitations, including whether or not to grant an award and the interpretation of Meta RFP Terms and Conditions. By submitting a proposal, applicants affirm that they have read and agree to these Terms and Conditions.

  • Meta is authorized to evaluate proposals submitted under its RFPs, to consult with outside experts, as needed, in evaluating proposals, and to grant or deny awards using criteria determined by Meta to be appropriate and at Meta’s sole discretion. Meta’s decisions will be final in all matters relating to its RFPs, and applicants agree not to challenge any such decisions.
  • Meta will not be required to treat any part of a proposal as confidential or protected by copyright, and may use, edit, modify, copy, reproduce and distribute all or a portion of the proposal in any manner for the sole purposes of administering the Meta RFP website and evaluating the contents of the proposal.
  • Personal data submitted with a proposal, including name, mailing address, phone number, and email address of the applicant and other named researchers in the proposal may be collected, processed, stored and otherwise used by Meta for the purposes of administering Meta’s RFP website, evaluating the contents of the proposal, and as otherwise provided under Meta’s Privacy Policy.
  • Neither Meta nor the applicant is obligated to enter into a business transaction as a result of the proposal submission. Meta is under no obligation to review or consider the proposal.
  • Feedback provided in a proposal regarding Meta products or services will not be treated as confidential or protected by copyright, and Meta is free to use such feedback on an unrestricted basis with no compensation to the applicant. The submission of a proposal will not result in the transfer of ownership of any IP rights.
  • Applicants represent and warrant that they have authority to submit a proposal in connection with a Meta RFP and to grant the rights set forth herein on behalf of their organization. All awards provided by Meta in connection with this RFP shall be used only in accordance with applicable laws and shall not be used in any way, directly or indirectly, to facilitate any act that would constitute bribery or an illegal kickback, an illegal campaign contribution, or would otherwise violate any applicable anti-corruption or political activities law.
  • Awards granted in connection with RFP proposals will be subject to terms and conditions contained in the unrestricted gift agreement (or, in some cases, other mechanisms) pursuant to which the award funding will be provided. Applicants understand and acknowledge that they will need to agree to these terms and conditions to receive an award.