Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
IEEE Micro - Special Issue on Commercial Products
We present a system to collectively optimize efficiency in a very large scale deployment of GPU servers for machine learning workloads at Facebook. Our system (a) measures and stores system-wide efficiency metrics for every executed workflow, (b) aggregates data from across the execution stack to identify optimization opportunities that maximize fleet-wide efficiency improvements, (c) provides periodic and on-demand whole-system profiling for workflows and, (d) automatically analyzes traces for common anti-patterns. We present each component of the stack and show case studies demonstrating the use of the tools to significantly improve performance. To our knowledge, our system is the most complete and effective solution for identifying and addressing efficiency problems in datacenter-scale GPU deployments.
Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Manoj Chakkaravarthy, Udit Gupta, David Brooks, Carole-Jean Wu
Ilkan Esiyok, Pascal Berrang, Katriel Cohn-Gordon, Robert Künnemann