In this monthly interview series, we turn the spotlight on members of the academic community — and the important research they do — as thought leaders, collaborators, and independent contributors.
This month, we’re spotlighting Yufei Ding, an assistant professor in computer science at the University of California, Santa Barbara (UCSB). Her research focuses on high-performance, energy-efficient, and high-fidelity programming frameworks for emerging technologies, such as quantum computing and deep learning.
Ding’s research partnership with Meta has been advanced through a $1.5 million grant to UCSB’s Institute for Energy Efficiency (IEE). This funding aims to accelerate research into energy-efficient data centers and artificial intelligence, and is part of Meta’s commitment to reach net zero emissions across its value chain by 2030. In this interview, Ding talks about how her degrees in physics have influenced her career in computer science, as well as the research that Meta’s partnership with IEE has made possible.
Q: Tell us about your career journey. What brought you to your current research areas?
Yufei Ding: I received my BS and MS in physics, studying condensed matter physics and laser optics. My PhD was in computer science, and in 2017, I joined the computer science department at the University of California, Santa Barbara, as an assistant professor.
My research at UCSB is grounded in programming systems. Our work cuts across multiple programming system technologies, from program verification and testing, to high-level algorithmic optimization and autotuning, to domain-specific programming language design, kernel library implementation, advanced compiler construction, and computer architecture design. We use this as a foundation to benefit other fields, including machine learning and quantum computing.
Q: How has your training in physics translated to your career in computer science?
YD: Studying physics gave me rigorous training in formulating and solving problems analytically. Many of the guiding philosophies in physics are also applicable to computer science.
For example, condensed matter physics can be considered a high-level abstraction compared with lower-level particle physics, and new physical laws can emerge at higher levels of abstraction. In computer science, we work on high-level program optimization, where optimization opportunities that are hard to exploit at the low level emerge naturally at the high level. Computer systems as a whole are likewise built from several layers of abstraction, and different problems are solved at different layers.
Physics, a discipline built on natural objects, and computer science, which is built upon artificial objects, are similar in this way, even if the actual objects and constraints are different.
Q: You’ve partnered with tech companies over the years. Why is it important for a researcher like yourself to work with industry leaders?
YD: Working with industry leaders brings many benefits. First, it creates new funding opportunities. For example, I’ve been a part of joint proposals submitted for federal grants, and these grantmakers typically prefer industry collaborations when selecting which proposals to fund.
Second, industry tech leaders pose cutting-edge research problems that do not naturally arise in a purely academic setting. They often offer data or platforms that are difficult to obtain in academia. All of this broadens the applicability of our research.
Q: What has your experience been, working with Meta?
YD: Today, we are working with Meta through UCSB’s Institute for Energy Efficiency. This collaboration spans several faculty members and labs, and covers a wide range of topics, from high-level natural language processing and computer vision applications to low-level systems design and optimization for deep learning. All these topics center on the grand challenge of deep learning recommendation model (DLRM) training and inference, which plays a vital role in Meta’s core business.
This collaboration also brings to UCSB advanced computing facilities, like Meta’s Zion platform. Through this work, our students have the opportunity to discuss real-world challenges and insights with Meta researchers, like Dheevatsa Mudigere, a Principal Research Scientist on Meta’s AI System Software/Hardware Co-design team, and Bharath Muthiah, Technical Lead in Meta’s Infrastructure Technology Sourcing team.
Q: Can you summarize the work you’re undertaking with Meta through UCSB IEE?
YD: Our work with Meta focuses on DLRMs, one of the most important deep learning–based applications deployed at Meta. Overall, we target key performance bottlenecks in the existing DLRM system design and improve the training efficiency of large-scale DLRMs through an array of technical innovations.
Several projects are carried out under this collaboration. One focuses on the high memory capacity requirement of large-scale DLRMs. We propose EL-Rec, a resource-efficient DLRM training system that democratizes the training of large-scale DLRMs on a single GPU while maintaining high training efficiency. EL-Rec builds on an optimized tensor-train decomposition of the embedding table, the key primitive of DLRMs. This new design allows a much smaller embedding table footprint while keeping high embedding lookup throughput. EL-Rec also integrates a pipelined training paradigm to mitigate the CPU-GPU data communication overhead when offloading large embedding tables to CPU RAM. Our paper — “EL-Rec: Efficient large-scale recommendation model training via tensor-train embedding table” — was published at the 2022 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22).
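To give a sense of the tensor-train idea, here is a minimal, self-contained PyTorch sketch. It is not EL-Rec’s implementation; the class name, table shape, TT rank, and index decomposition below are illustrative assumptions. The module stores an embedding table as three small TT cores and reconstructs only the requested rows at lookup time.

```python
# Minimal sketch of a tensor-train (TT) factorized embedding table.
# Illustrative only: class name, shapes, and rank are assumptions, not EL-Rec code.
import torch
import torch.nn as nn


class TTEmbedding(nn.Module):
    """Embedding table of shape (v0*v1*v2, d0*d1*d2) stored as three TT cores."""

    def __init__(self, v_dims=(200, 220, 250), d_dims=(4, 4, 8), rank=16):
        super().__init__()
        self.v_dims, self.d_dims = v_dims, d_dims
        ranks = (1, rank, rank, 1)
        # Core k has shape (r_k, v_k, d_k, r_{k+1}).
        self.cores = nn.ParameterList(
            nn.Parameter(0.1 * torch.randn(ranks[k], v_dims[k], d_dims[k], ranks[k + 1]))
            for k in range(3)
        )

    def forward(self, ids):
        b = ids.shape[0]
        # Split each row id into one index per core (mixed-radix digits).
        i2 = ids % self.v_dims[2]
        rest = ids // self.v_dims[2]
        i1 = rest % self.v_dims[1]
        i0 = rest // self.v_dims[1]
        idx = (i0, i1, i2)

        # Slice of the first core for each id: (batch, 1, d0, r1).
        out = self.cores[0][:, idx[0]].permute(1, 0, 2, 3)
        for k in (1, 2):
            # Slice of core k for each id: (batch, r_k, d_k, r_{k+1}).
            core = self.cores[k][:, idx[k]].permute(1, 0, 2, 3)
            # Contract over the shared TT rank and append the new embedding dims.
            out = torch.einsum('bamr,brns->bamns', out, core)
            out = out.reshape(b, 1, -1, core.shape[-1])
        return out.reshape(b, -1)  # (batch, d0*d1*d2)


emb = TTEmbedding()
ids = torch.randint(0, 200 * 220 * 250, (32,))
vecs = emb(ids)  # (32, 128) embedding vectors
```

In this toy configuration, a dense table over 200 × 220 × 250 = 11 million rows with 128-dimensional vectors would hold roughly 1.4 billion parameters, while the three TT cores hold about 270 thousand, which is the kind of footprint reduction that makes single-GPU training of large embedding tables plausible, at the cost of extra computation at lookup time.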
We also have ongoing projects that tackle other critical performance bottlenecks, such as the high inter-GPU communication overhead in large-scale distributed DLRMs and the data stalls caused by online preprocessing in the DLRM data storage and ingestion pipeline.
Q: What are you working on next?
YD: I believe there is a lot of work that we can do together on the recommendation model, including model architecture innovation toward efficiency (e.g., compression, model pruning, and adaptive sparsity), large-scale distributed recommendation model deployment (e.g., data center planning and scheduling, high-performance and fault-tolerant communication), and hardware/software co-design (e.g., energy-efficient hardware architecture design, in-storage computing, and in-network computing).
I would also love to explore the challenges of other emerging technologies, such as AR/VR, autonomous vehicles, and robotic systems.