Half-day workshop at ISCA 2022
Saturday, June 18, 2022
8:45 AM - 12:00 PM (EDT)
Many platforms, from data centers to edge devices, deploy multiple DNNs to provide high-quality results for diverse applications in computer vision, speech, language, and other domains. Moreover, because some applications rely on multi-DNN pipelines performing different tasks, DNN workloads are becoming increasingly diverse and heterogeneous. Supporting inference for such diverse and heterogeneous DNN workloads with high energy efficiency and low latency is a great challenge, since the dominant approach to inference efficiency is to specialize the architecture, compiler, and system for a small set of target DNN models within the same domain. Beyond performance and efficiency, multi-model DNN workloads also introduce new challenges such as security and privacy. This workshop therefore explores innovations for efficiently and safely supporting multi-model DNN workloads across three areas: architecture, compilers, and systems. We solicit papers in the research fields listed below.
Radu Marculescu
The University of Texas at Austin
Title: Multi-model and Multi-modal Learning for EdgeAI
EdgeAI aims at the widespread deployment of AI on edge devices. With an estimated one trillion IoT devices expected by 2035, such rapid growth necessitates new breakthroughs in ML/AI to fully exploit the compute power of these devices. Indeed, a critical requirement of future ML systems is to enable on-device automated training and inference in distributed settings, where and when data, devices, or users are present, without sending (possibly sensitive) training data to the cloud or incurring long response times.
Starting from these overarching considerations, we focus on exploiting the multi-model and multi-modal capabilities of deep neural networks to learn on resource-constrained devices targeting IoT applications. To this end, we discuss dynamic and heterogeneous models for distributed learning that must account for hardware constraints (e.g., latency, power, memory, bandwidth) in addition to data and system heterogeneity. We use real-life applications (e.g., depth estimation, semantic segmentation) to demonstrate these ideas in practice. We hope to capture the excitement this problem space brings to various topics in ML/AI, as well as optimization, communications, and application-hardware (co-)design.
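The abstract does not name a specific distributed-learning algorithm; as one hypothetical illustration of learning across heterogeneous edge devices, the sketch below runs federated averaging (FedAvg) on a toy linear-regression task, where devices with tighter compute budgets perform fewer local steps per round. All names, data sizes, and step counts are our own illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])

# Three edge devices with heterogeneous data sizes (system/data heterogeneity).
data = []
for n in (40, 80, 20):
    X = rng.standard_normal((n, 2))
    y = X @ w_true + 0.01 * rng.standard_normal(n)
    data.append((X, y))

# Slower devices run fewer local gradient steps per round (compute heterogeneity).
local_steps = [1, 5, 2]

w = np.zeros(2)
for _ in range(20):  # communication rounds
    updates, sizes = [], []
    for (X, y), steps in zip(data, local_steps):
        w_loc = w.copy()
        for _ in range(steps):
            grad = 2 * X.T @ (X @ w_loc - y) / len(y)  # mean-squared-error gradient
            w_loc -= 0.1 * grad
        updates.append(w_loc)
        sizes.append(len(y))
    # Server aggregation: data-size-weighted average of local models (FedAvg).
    w = np.average(updates, axis=0, weights=sizes)

print(w)  # should be close to w_true
```

The raw (possibly sensitive) data never leaves a device; only model updates are communicated, which is the property the paragraph above emphasizes.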
Radu Marculescu is a Professor and the Laura Jennings Turner Chair in Engineering in the Department of Electrical and Computer Engineering at The University of Texas at Austin. From 2000 to 2019, he was a Professor in the Electrical and Computer Engineering department at Carnegie Mellon University. His current research focuses on developing ML/AI methods and tools for modeling, analysis, and optimization of embedded systems, cyber-physical systems, social networks, and the Internet of Things. He received the 2019 IEEE Computer Society Edward J. McCluskey Technical Achievement Award for seminal contributions to the science of network-on-chip design, analysis, and optimization. Most recently, he received the 2020 ESWEEK Test-of-Time Award from The International Conference on Hardware/Software Co-Design and System Synthesis (CODES). He is a Fellow of the IEEE.
Stylianos I. Venieris
Title: Multi-DNN Accelerators: Architecting the Next-Generation AI Systems
Multi-deep neural network (DNN) workloads have started to emerge in various forms: whether one looks at multi-user cloud AI services, complex multi-model pipelines, or concurrent AI-powered tasks running on a mobile robot or a smartphone, the next generation of AI systems will have multi-DNN workloads at their core. In this talk, we argue that new breakthroughs on the computer architecture front are needed, giving rise to the topic of multi-DNN accelerator design. We'll start by drawing the line between single- and multi-DNN accelerators, examining the new workloads that challenge our current hardware and presenting the disparate objectives of multi-DNN accelerators. Next, we'll survey the landscape of multi-DNN hardware, with special focus on multi-DNN-specific trade-offs such as time-multiplexing vs spatial co-location, static vs dynamic scheduling, and customisation to the workload vs programmability. Finally, we'll discuss how techniques such as cross-DNN weight sharing, dynamic DNNs, and NAS-based multi-DNN model-hardware co-design are key drivers towards performant and efficient multi-DNN accelerators.
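To make the time-multiplexing vs spatial co-location trade-off concrete, here is a deliberately simplified latency model. It assumes idealized linear scaling with processing-element (PE) count and is our own illustration, not from the talk: two DNN jobs share a 64-PE array either by taking turns on the full array or by each occupying a static partition sized in proportion to its work.

```python
# Toy model: two DNN inference jobs on a 64-PE array.
# All numbers are illustrative "PE-time units", not measurements.
PES = 64
work_a, work_b = 100.0, 300.0

# Time-multiplexing: each job gets the whole array, one after the other.
lat_a_tm = work_a / PES                      # job A finishes first
lat_b_tm = lat_a_tm + work_b / PES           # job B waits for A
makespan_tm = lat_b_tm

# Spatial co-location: statically partition PEs in proportion to work,
# so both jobs run concurrently and finish together.
pes_a = PES * work_a / (work_a + work_b)     # 16 PEs for the small job
pes_b = PES - pes_a                          # 48 PEs for the large job
lat_a_sp = work_a / pes_a
lat_b_sp = work_b / pes_b
makespan_sp = max(lat_a_sp, lat_b_sp)

print(lat_a_tm, lat_b_tm, makespan_tm)       # 1.5625 6.25 6.25
print(lat_a_sp, lat_b_sp, makespan_sp)       # 6.25 6.25 6.25
```

Under this ideal model the two makespans coincide, yet the small job's latency is 4x lower under time-multiplexing; real accelerators add reconfiguration overheads, contention, and sub-linear scaling, which is exactly what makes the static-vs-dynamic and time-vs-space design space non-trivial.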
Stylianos I. Venieris is currently a Senior Research Scientist at Samsung AI, Cambridge, UK, where he leads the Distributed AI group. He received his PhD in Reconfigurable Computing and Deep Learning from Imperial College London in 2018 and his MEng in EEE from Imperial College London in 2014. His research interests include principled methodologies for the mapping of deep learning algorithms on distributed and mobile platforms, the design of novel end-to-end deep learning systems that robustly meet multi-objective performance requirements, and the design of next-generation hardware accelerators for the high-performance, energy-efficient deployment of deep neural networks.
Yingyan (Celine) Lin
Title: Alleviating DNN Workload Diversity by Merging Compact Convolutional Layers into Dense Ones
As deep neural network (DNN) workloads become more diverse and heterogeneous, driven by various emerging application needs, accelerating DNNs with satisfactory efficiency has become a major challenge, because such diversity makes specialization harder. In this talk, we will introduce an algorithmic technique that alleviates DNN workload diversity by merging the compact convolutional layers typically adopted in efficient DNN models into dense ones without hurting model accuracy. Interestingly, we observe that while some DNN layers' activation functions help DNNs' training optimization and achievable accuracy, they can be properly removed after training without compromising model accuracy. Inspired by this observation, we propose a framework dubbed DepthShrinker, which develops hardware-friendly compact networks by shrinking the basic building blocks of existing efficient DNNs, which feature irregular computation patterns, into dense ones with much-improved hardware utilization and thus real-hardware efficiency, even on general computing platforms such as GPUs. Excitingly, our DepthShrinker framework delivers hardware-friendly compact networks that outperform both state-of-the-art efficient DNNs and compression techniques, e.g., 3.06% higher accuracy and 1.53x higher throughput on a Tesla V100 over the state-of-the-art channel-wise pruning method MetaPruning.
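The merging step rests on a simple linear-algebra fact: once the intermediate activation function is removed, a depthwise convolution followed by a pointwise (1x1) convolution is a composition of linear maps, so the pair collapses into a single dense convolution. A minimal NumPy sketch of this identity (the naive convolution routine and all variable names are our own illustration, not the DepthShrinker code):

```python
import numpy as np

def conv2d(x, w):
    """Naive dense convolution. x: (C_in, H, W); w: (C_out, C_in, k, k).
    Valid padding, stride 1."""
    c_out, _, k, _ = w.shape
    h, wid = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h, wid))
    for o in range(c_out):
        for i in range(h):
            for j in range(wid):
                y[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return y

def depthwise(x, k_dw):
    """Depthwise convolution: one (k, k) filter per input channel."""
    c, k = x.shape[0], k_dw.shape[-1]
    out = np.zeros((c, x.shape[1] - k + 1, x.shape[2] - k + 1))
    for ch in range(c):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * k_dw[ch])
    return out

rng = np.random.default_rng(0)
c_in, c_out, k = 4, 6, 3
x = rng.standard_normal((c_in, 8, 8))
k_dw = rng.standard_normal((c_in, k, k))   # depthwise kernels
k_pw = rng.standard_normal((c_out, c_in))  # pointwise (1x1) weights

# Two-step compact block, with NO activation between the two convolutions.
y_two_step = np.einsum('oc,chw->ohw', k_pw, depthwise(x, k_dw))

# Merged dense kernel: w_dense[o, c] = k_pw[o, c] * k_dw[c].
w_dense = np.einsum('oc,ckl->ockl', k_pw, k_dw)
y_merged = conv2d(x, w_dense)

assert np.allclose(y_two_step, y_merged)
```

The merged dense convolution has a regular computation pattern that maps well onto GPUs and other general platforms, whereas the original depthwise stage is memory-bound and underutilizes the hardware; this is the utilization gain the abstract refers to.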
Yingyan (Celine) Lin is an Assistant Professor in the Department of Electrical and Computer Engineering at Rice University. She leads the Efficient and Intelligent Computing (EIC) Lab at Rice, which focuses on developing cross-layer techniques, from algorithms to algorithm-hardware co-design/search down to chip design and integration, to promote efficient machine learning systems towards green AI and ubiquitous machine learning-powered intelligence.
She received a Ph.D. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign (UIUC) in 2017, and has received an NSF CAREER Award, an IBM Faculty Award, and the ACM SIGDA Outstanding Young Faculty Award.