Areas of interest include, but are not limited to, the following.
1. Applied Cryptography
Cryptographic techniques enable us to power existing and new use cases while providing strong levels of privacy protection for user data. We are interested in research that develops novel techniques, improves scalability of existing ones, or makes them easier to adopt. Areas of interest include, but are not limited to, the following:
- Private authentication: How can we leverage techniques such as anonymous credentials to enable authenticated but private communication between clients and servers efficiently and at scale? Are there new techniques for credential-based authentication that provide stronger privacy protections for the credential?
- Secure computation: Secure computation is a powerful primitive, but how can we make it scale? For example, we’re interested in private identity/contact matching protocols that scale to hundreds of millions of users, or private location services that can support features at Facebook scale.
- E2E encryption: How can we use new cryptographic techniques to improve transparency, security, integrity, and reliability of end-to-end encryption technology? Where do current techniques fall short? How should we handle transient errors and how can we make safe retry logic? How can users know that their peers’ encryption keys are correct?
- Practical implementation: We’re interested in tools and protocols that use simple and widely supported primitives. Fewer “new block ciphers,” more “formally verified drop-in library for X.”
- Record linkage/matching: Private set intersection and its extensions are powerful tools for independent parties to join data sets. How can we evaluate the quality of the match without revealing the underlying data sets, especially as matching conditions expand to multiple features and fuzzy logic? What quality metrics are useful, and what information leaks from those metrics?
- Economics of trust models: Can we build trust among a large group of participants in a secure computation while requiring only a subset of non-colluding parties to perform the computation? How do the incentives of different trust models trade off against the computational and operational costs of the corresponding secure computation schemes?
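Several of the questions above center on private set intersection. As a purely illustrative sketch (semi-honest model, toy parameters that are not production-safe, all names hypothetical), the core idea of Diffie-Hellman-style PSI is commutative blinding: two parties each raise hashed items to a secret exponent, and equal double-blinded values reveal membership in the intersection without revealing the rest of either set.

```python
import hashlib
import secrets

# Toy DH-based PSI sketch. Real deployments use vetted elliptic-curve
# groups and audited libraries; this prime and hashing are illustrative only.
P = 2**255 - 19  # prime field modulus (Curve25519's field prime)

def hash_to_group(item: str) -> int:
    """Hash an item to a nonzero element of Z_p."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return h % (P - 1) + 1  # avoid zero

def blind(elements, key):
    """Raise each group element to a secret exponent mod P."""
    return [pow(e, key, P) for e in elements]

# Each party samples a secret exponent.
a = secrets.randbelow(P - 2) + 1
b = secrets.randbelow(P - 2) + 1

alice_set = {"alice@example.com", "carol@example.com"}
bob_set = {"bob@example.com", "carol@example.com"}

# Alice sends H(x)^a; Bob re-blinds it to H(x)^(ab). Symmetrically for Bob.
# Exponentiation commutes, so shared items collide after double blinding.
alice_double = set(blind(blind([hash_to_group(x) for x in alice_set], a), b))
bob_double = set(blind(blind([hash_to_group(y) for y in bob_set], b), a))

print(len(alice_double & bob_double))  # 1 (only carol is in both sets)
```

Scaling this pattern to hundreds of millions of users is exactly where the open questions above begin: communication volume, hashing strategy, and what metadata the protocol leaks.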
2. Data Policies and Compliance
Honoring people’s privacy requires that all communication to consumers about data enables them to make informed decisions, and that all data storage and data usage by developers is restricted to the intended purpose. Areas of interest include, but are not limited to, the following:
- Deletion: How can we ensure that data is deleted correctly? Can we make infrastructure that automatically handles deletion? How about in data warehouses, which often don’t support point deletion queries?
- Automated understanding of privacy policies: What policy languages can express a broad range of regulatory concepts? What happens if the policy changes? Can they be human-readable as well as structured? How can they connect to data at runtime without sacrificing efficiency or developer experience?
- Data flow and lineage: In a general-purpose programming language, how can we build accurate maps of data flow? How can we best apply static analysis, dynamic analysis, symbolic execution, or other tools? How can we link up data flows across different components, languages, or platforms?
- Information flow control (IFC): How can policy and user consents propagate with data at scale in very complex data processing systems? How can we prevent label creep, i.e., data accumulating overly restrictive labels as it flows through the system?
- Programming languages: Can modern, usable programming languages support static information flow control or lineage extraction? Can they be augmented to carry policy information along these flows?
- Privacy economics: How do we evaluate the operational cost of privacy controls?
- Measurement: How do we evaluate the cost of privacy failures? How do we demonstrate technical compliance with data policies?
- Scraping risk: How do we measure the risk of data leakage posed by our products?
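The IFC and lineage questions above can be made concrete with a tiny label-propagation wrapper: every value carries a set of policy labels, and any derived value inherits the union of its inputs' labels. This is a hypothetical sketch (the `Labeled` type and its `combine` method are invented for illustration), and it also shows exactly how label creep arises as derivations combine data.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

@dataclass(frozen=True)
class Labeled:
    """A value tagged with policy labels (e.g. {"location", "needs-consent"})."""
    value: object
    labels: FrozenSet[str]

    def combine(self, other: "Labeled", op: Callable) -> "Labeled":
        """Derive a new value; labels are the union of both inputs' labels."""
        return Labeled(op(self.value, other.value), self.labels | other.labels)

age = Labeled(34, frozenset({"profile"}))
lat = Labeled(40.7, frozenset({"location", "needs-consent"}))

# Any value derived from both inputs carries all of their labels --
# the mechanism behind both sound propagation and label creep.
derived = age.combine(lat, lambda a, b: (a, b))
print(sorted(derived.labels))  # ['location', 'needs-consent', 'profile']
```

A production IFC system would additionally need declassification rules, sink checks, and propagation across process and language boundaries, which is where the research questions above come in.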
3. Differential Privacy
Differential privacy (DP) has emerged as an industry standard in protecting the privacy of user data while enabling useful aggregate information to be derived for usability, reliability, and machine learning needs. We are interested in research to enable new algorithms, new architectures for deployment, and new models for privacy accounting. Areas of interest include, but are not limited to, the following:
- Making differential privacy practical: Can we extend accounting techniques to realistic query workloads on large analytics systems? Can we apply them to time-series data, or longitudinal analyses of privacy over time?
- Measuring risk: How can we measure the risks of privacy loss or identification? What is the real-world impact of correlation or other attacks?
- Extension to database management systems: How can we efficiently incorporate DP into database management systems?
- Efficient combination of DP with other PETs: How can we best combine techniques for protecting data during computation (e.g., MPC) and techniques for minimizing re-identification risk of the computation outcome (e.g., DP)?
- Understanding differentially private releases: Can we generate confidence intervals for DP releases to maximize utility or minimize compute? Can we build tools that clearly demonstrate the trade-offs between privacy and accuracy?
- Differential privacy in deep learning: Can we improve the utility and privacy trade-off in application of DP in machine learning, and in particular, deep learning? Are there new theoretical frameworks that can help with particular threat models?
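To ground the accounting and trade-off questions above: the classic Laplace mechanism adds noise scaled to sensitivity/epsilon, and the simplest accountant tracks spent epsilon under sequential composition. This is a minimal sketch with hypothetical names (`laplace_mechanism`, `BasicAccountant`), not a production DP system; real deployments use tighter accounting (e.g. Rényi DP) and vetted noise samplers.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value plus Laplace(sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_value + noise

class BasicAccountant:
    """Tracks spent epsilon under basic (sequential) composition."""
    def __init__(self, budget):
        self.budget = budget
        self.spent = 0.0

    def spend(self, epsilon):
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

rng = random.Random(7)
acct = BasicAccountant(budget=1.0)

true_count = 1042  # a counting query has sensitivity 1
acct.spend(0.5)
noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5, rng=rng)
print(round(acct.spent, 2))  # 0.5
```

Even this toy version surfaces the open problems listed above: basic composition burns budget quickly on realistic workloads, and communicating the uncertainty in `noisy_count` (e.g. via confidence intervals) is itself a research question.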
4. Privacy in AI
As AI applications and research continue to accelerate, it’s important for AI researchers and ML practitioners to have access to easy-to-use tools that provide mathematically rigorous privacy guarantees while retaining the strong performance and speed of these AI systems. Areas of interest include, but are not limited to, the following:
- Practical advancements for MPC-based model training: How can we extend modern training approaches such as neural architecture search to secure MPC? Is it possible to design cryptographic algorithms for superior performance on 32-bit machines? How should large (TB-sized) data sets be sharded for MPC-based training?
- Extensions to on-device model training: How can we train more performant distributed or federated models without compromising the privacy or security that motivated on-device training in the first place?
- Privacy leakage and attacks in deep learning: For both model training and model scoring in a secure environment (e.g., MPC, on-device FL), what information is leaked from model training and prediction? How can we minimize privacy leakage when integrating multiple (and distinct) cryptographic algorithms? How should we think about private release mechanisms in (honest-but-curious) secure MPC?
- Post-training data deletion: What approaches to removal of data from trained machine learning models are most efficient?
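One building block behind the MPC-based and on-device training questions above is secure aggregation: clients add pairwise random masks that cancel in the sum, so the server learns only the aggregate update, never any individual client's. The sketch below is a toy (integer-encoded updates, a single simulated round, no dropout handling, all names hypothetical); deployed protocols additionally handle client dropouts and derive masks from cryptographic key agreement.

```python
import random

MOD = 2**32  # work in a finite ring so masks wrap cleanly

def masked_updates(updates, rng):
    """Add pairwise masks that cancel in the sum of all clients' updates."""
    n = len(updates)
    masks = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.randrange(MOD)
            masks[i][j] = m    # client i adds the mask ...
            masks[j][i] = -m   # ... client j subtracts it
    return [(u + sum(masks[i])) % MOD for i, u in enumerate(updates)]

rng = random.Random(0)
updates = [3, 5, 9]  # each client's (integer-encoded) model update
masked = masked_updates(updates, rng)

# The server sums the masked values; every pairwise mask cancels,
# leaving exactly the aggregate of the raw updates.
print(sum(masked) % MOD)  # 17
```

Whether such aggregation, combined with DP noise or MPC training, actually closes the leakage channels listed above (membership inference, gradient inversion) is precisely the kind of question this area invites.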