Embedding table sharding is a significant design challenge in the distributed training of deep recommendation models. Optimizing embedding table sharding can greatly boost training throughput, since embedding computation and communication are often the bottlenecks. Researchers and practitioners working on efficiency problems in recommendation models should find DreamShard interesting and useful.
This work also provides a concrete example of how RL can be used to improve machine learning system design. The idea of combining learned neural cost models with reinforcement learning could be applied to many combinatorial optimization problems in system design, as sketched below.
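To make the idea concrete, here is a minimal, hypothetical sketch: a small neural cost model predicts the cost of the tables placed on one device, and a simple greedy policy uses those predictions to assign tables to devices. This is an illustration of the general recipe, not DreamShard's actual architecture or training procedure (DreamShard learns an RL placement policy against the cost model), and all names, features, and dimensions below are assumptions.

```python
# Illustrative sketch only: a learned cost model guiding a greedy table-to-device
# assignment. Not DreamShard's implementation; features and dimensions are made up.
import torch
import torch.nn as nn

NUM_DEVICES = 4
TABLE_FEAT_DIM = 3  # e.g., hash size, embedding dim, pooling factor (illustrative)

class CostModel(nn.Module):
    """Predicts the cost of the set of tables placed on a single device."""
    def __init__(self):
        super().__init__()
        self.table_encoder = nn.Sequential(nn.Linear(TABLE_FEAT_DIM, 32), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, table_feats):
        # Sum-pool per-table embeddings so the model handles variable table counts.
        pooled = self.table_encoder(table_feats).sum(dim=0)
        return self.head(pooled).squeeze(-1)

def greedy_shard(table_feats, cost_model):
    """Assign each table to the device with the smallest predicted cost increase."""
    placements = [[] for _ in range(NUM_DEVICES)]
    for t in range(table_feats.shape[0]):
        best_dev, best_cost = None, None
        for d in range(NUM_DEVICES):
            candidate = placements[d] + [t]
            with torch.no_grad():
                cost = cost_model(table_feats[candidate]).item()
            if best_cost is None or cost < best_cost:
                best_dev, best_cost = d, cost
        placements[best_dev].append(t)
    return placements

if __name__ == "__main__":
    # Random features stand in for real table statistics; in practice the cost
    # model would first be trained on measured kernel latencies.
    tables = torch.rand(16, TABLE_FEAT_DIM)
    model = CostModel()
    print(greedy_shard(tables, model))
```

The same pattern, with the greedy heuristic replaced by a trained RL policy and the cost model fit to real hardware measurements, generalizes to other placement and scheduling problems in systems.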
Read the full paper
Thanks to the many people who provided technical insights, discussions, and feedback: Dhruv Choudhary, Chris Cummins, Xizhou Feng, Aaron Ferber, Yuchen Hao, Pavani Panakanti, Soohee Lee, Zhongyi Lin, Zirui Liu, Geet Sethi, Srinivas Sridharan, Zhou Wang, Justin Wong, Carole-Jean Wu, and Yufei Zhu.
We also deeply appreciate the support from our leadership team: Leo Cazares, Binu John, Sukwoo Kang, Richard Kaufmann, Arun Kejariwal, Max Leung, Parth Malani, Martin Patterson, and Rishi Sinha.