PyTorch Distributed: Experiences on Accelerating Data Parallel Training
Shen Li, Yanli Zhao, Rohan Verma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala
Software Engineer
I am a Software Engineer on the PyTorch Distributed team in Facebook AI Infrastructure, where I focus on large-scale distributed training for a variety of models, from ranking to computer vision (CV) and natural language processing (NLP). Specifically, I am building data-, model-, and hybrid-parallel techniques in PyTorch to support training complex models across both commodity and high-performance computing clusters.
Prior to Facebook, I studied Electrical Engineering and Computer Science at UC Berkeley, where I worked on a distributed data analytics library based on Ray at the RISELab.
Machine learning, distributed systems