Facebook Workshop at MICRO 2021

Architecture, Compiler, and System Support for Multi-model DNN Workloads Workshop

Half-day virtual workshop at MICRO 2021

Friday, October 22, 2021
10:00 am – 1:00 pm (EDT)

Workshop Video: https://youtu.be/aXLqyqpLJpo

Many platforms, from data centers to edge devices, deploy multiple DNNs to provide high-quality results for diverse applications in computer vision, speech, language, and other domains. In addition, because some applications rely on multi-DNN pipelines in which each model performs a different task, DNN workloads are becoming more diverse and heterogeneous. Supporting inference for such diverse and heterogeneous DNN workloads with high energy efficiency and low latency is a growing challenge, because the dominant approach to inference efficiency is to specialize the architecture, compiler, and system for a small set of target DNN models within a single domain. Beyond performance and efficiency, multi-model DNN workloads also introduce new challenges such as security and privacy. This workshop will therefore explore innovations in each of three areas (architecture, compiler, and system) that support multi-model DNN workloads efficiently and safely. We solicit papers in the research areas listed below.

Invited Speakers

Vijay Janapa Reddi
Associate Professor, Harvard University
VP and founding member, MLCommons

Title: Democratizing TinyML: Generalization, Standardization and Automation

Tiny machine learning (TinyML) is a fast-growing field at the intersection of ML algorithms and low-cost embedded systems. TinyML enables on-device analysis of sensor data (vision, audio, IMU, etc.) at ultra-low power consumption (<1 mW). Processing data close to the sensor enables a wide new range of always-on ML use cases that save bandwidth, latency, and energy while improving responsiveness and preserving privacy. This talk introduces the vision behind TinyML and showcases some of the exciting applications TinyML is enabling in the field, from wildlife conservation to supporting public health initiatives. Yet numerous technical challenges remain. Tight memory and storage constraints, MCU heterogeneity, software fragmentation, and a lack of relevant large-scale datasets pose a substantial barrier to developing TinyML applications. To this end, the talk touches upon some of the research opportunities for unlocking the full potential of TinyML.

Vijay Janapa Reddi is an Associate Professor at Harvard University and VP and a founding member of MLCommons, a nonprofit organization that aims to accelerate machine learning (ML) innovation for everyone. He also serves on the MLCommons board of directors and is a Co-Chair of the MLCommons Research organization. He co-chaired the MLPerf Inference ML benchmark for datacenter, edge, mobile, and IoT systems. Before joining Harvard, he was an Associate Professor in the Electrical and Computer Engineering department at The University of Texas at Austin. His research sits at the intersection of machine learning, computer architecture, and runtime software. He specializes in building computing systems for tiny IoT devices, as well as for mobile and edge computing. Dr. Janapa Reddi is a recipient of multiple honors and awards, including the National Academy of Engineering (NAE) Gilbreth Lecturer Honor (2016), the IEEE TCCA Young Computer Architect Award (2016), the Intel Early Career Award (2013), Google Faculty Research Awards (2012, 2013, 2015, 2017, 2020), Best Paper awards at the 2020 Design Automation Conference (DAC), the 2005 International Symposium on Microarchitecture (MICRO), and the 2009 International Symposium on High-Performance Computer Architecture (HPCA), and IEEE's Top Picks in Computer Architecture awards (2006, 2010, 2011, 2016, 2017, 2021). He has been inducted into the MICRO and HPCA Halls of Fame (in 2018 and 2019, respectively). He is passionate about widening access to applied machine learning for STEM and diversity, and about using AI for social good. He designed the Tiny Machine Learning (TinyML) series on edX, a massive open online course (MOOC) at the intersection of embedded systems and ML that thousands of learners worldwide can access and audit free of charge. He was also responsible for Austin Hands-on Computer Science (HaCS), deployed in the Austin Independent School District for K-12 CS education. Dr. Janapa Reddi received a Ph.D. in computer science from Harvard University, an M.S. from the University of Colorado at Boulder, and a B.S. from Santa Clara University.

Bita Darvish Rouhani
Principal Research Manager, Microsoft

Title: The Rise and Future of AI Supercomputing

AI is becoming a key enabler in our ever-growing data-driven world. We are already witnessing how AI at scale is helping to improve agricultural sustainability and answer lingering questions in protein folding and drug discovery. Building a sustainable AI ecosystem and democratizing AI, however, requires challenging the status quo in computing and reinventing the full stack, from algorithms and software to power management and hardware. In this talk, I will discuss Project Brainwave, a production-scale system for real-time, low-cost inference of deep neural networks. Project Brainwave is built on a balanced co-optimization of the AI algorithm, software, and hardware stacks. It is used today to serve millions of users, powering major online scenarios such as web search, question answering, and image processing. I will conclude my talk by highlighting the efficiency gap between the current AI paradigm and the human neocortex, and the opportunities for co-evolving hardware, software, and algorithms to narrow that gap.

Bita Rouhani is a Principal Research Manager at Microsoft Azure Cloud Accelerated Systems & Technologies. Bita received her Ph.D. in Computer Engineering from the University of California San Diego. Her research interests include algorithm, hardware, and software co-design for succinct and assured deep learning. Bita has co-authored more than 40 major patents and publications. Her work has been published at top-tier machine learning, computer architecture, and security venues, including NeurIPS, ISCA, ASPLOS, ISLPED, DAC, ICCAD, FPGA, FCCM, SIGMETRICS, and S&P magazine.

Program

All times are in EDT.

Friday, Oct. 22

10:00 - 10:20 AM

  • Opening Keynote: On-Device AI team @ Facebook Reality Labs

10:20 - 10:50 AM

Paper Session I: Embedded devices

  • Mingoo Ji (Kookmin Univ), Saehanseul Yi (UCI), Jong-Chan Kim (Kookmin Univ), Nikil Dutt (UCI), “Demand Layering for Accommodating Multiple Neural Networks in Memory-Constrained Embedded Systems”
  • Sanghoon Kang, Hoi-Jun Yoo (KAIST), “Resource-Aware Spatial Co-Location for Multi-DNN Acceleration on Mobile Devices”

10:50 - 11:25 AM

Keynote I: Vijay Janapa Reddi (Harvard)

  • Democratizing TinyML: Generalization, Standardization and Automation

11:25 - 11:35 AM

Break

11:35 AM - 12:20 PM

Paper Session II: Data centers

  • Sungyeob Yoo, Jung-Hoon Kim, Joo-Young Kim (KAIST), “A Heterogeneous Vector-Array Architecture with Resource Scheduling for Multi-User/Multi-DNN Workloads”
  • Shulin Zeng, Guohao Dai, Niansong Zhang, Yu Wang, “Enabling Fast Deployment and Efficient Scheduling for Multi-Node and Multi-Tenant DNN Accelerators in the Cloud”
  • Soroush Ghodrati, Hadi Esmaeilzadeh (UCSD), “Multi-Tenancy: The Next Step in DNN Acceleration”

12:20 - 12:55 PM

Keynote II: Bita Darvish Rouhani (Microsoft)

  • The Rise and Future of AI Supercomputing

12:55 - 1:00 PM

Closing Remarks

Topics of Interest

  • Heterogeneous hardware architectures for concurrent execution of multiple DNN models
  • Reconfigurable accelerator architectures (CGRA-style, FPGA, etc.) to adapt to different DNN models
  • Compiler and runtime support for multi-DNN workloads on heterogeneous or reconfigurable hardware architectures
  • End-to-end compilation flow and optimization techniques targeting multi-DNN workloads on heterogeneous or reconfigurable hardware architectures
  • Design automation tools for heterogeneous or reconfigurable hardware architectures targeting multi-DNN workloads
  • Techniques in the architecture, compiler, and system domains that enhance security and privacy when a platform runs a multi-model DNN workload in a multi-tenant fashion

Author Information

  • Template: Please use the MICRO 2021 template for submissions
  • Page limit: 4 pages (not including references)

Organizers

  • Tushar Krishna (Georgia Institute of Technology)
  • Liangzhen Lai (Facebook)
  • Yu-Hsin Chen (Facebook)
  • Hyoukjun Kwon (Facebook)

Important Dates

  • Submission deadline: September 22, 2021 (AOE)
  • Extended submission deadline: September 25, 2021 (AOE)
  • Notification: October 13, 2021 (AOE)
  • Camera-ready: October 20, 2021 (AOE)
  • Workshop date: October 22, 2021 (10:00 am – 1:00 pm EDT)

Program Committee

  • Hsien-Hsin Sean Lee (Facebook)
  • Jangwoo Kim (SNU)
  • Zhiru Zhang (Cornell)
  • Amir Yazdanbakhsh (Google)
  • Minsoo Rhu (KAIST)
  • Jongse Park (KAIST)
  • Hardik Sharma (Google)
  • Jie Wang (AWS)
  • Joel Hestness (Cerebras Systems)
  • Divya Mahajan (Microsoft Research)

For further information, please email Hyoukjun Kwon (hyoukjunkwon@fb.com).
