Speeding up Distributed SGD via Communication-Efficient Model Aggregation

Seminar
Friday, March 06, 2020
11:00 AM to 12:00 PM
EER 1.516
Free and open to the public

Large-scale machine learning training, and distributed stochastic gradient descent (SGD) in particular, needs to be robust to inherent system variability such as unpredictable computation and communication delays. This work considers a distributed SGD framework in which each worker node performs local model updates and the resulting models are averaged periodically. Our goal is to analyze and improve the true speed of error convergence with respect to wall-clock time rather than the number of iterations. For centralized model averaging, we propose a strategy called AdaComm that gradually increases the model-averaging frequency in order to strike the best error-runtime trade-off. For decentralized model averaging, we propose MATCHA, which uses matching decomposition sampling of the base graph to parallelize inter-worker information exchange and reduce communication delay. Experiments on training deep neural networks show that AdaComm and MATCHA can reach the same final training loss in about one-third of the time taken by fully synchronous SGD and vanilla decentralized SGD, respectively.
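To make the local-update idea in the abstract concrete, here is a minimal single-process sketch of periodic-averaging SGD on a toy 1-D least-squares problem. It is an illustration under stated assumptions, not the speaker's implementation: the names local_sgd, make_shards, and tau_schedule are hypothetical, the workers are simulated sequentially, and the halving schedule at the end is a made-up stand-in for "averaging more often over time", not the adaptive rule from the AdaComm paper.

```python
import random


def make_shards(num_workers=4, points_per_worker=50, true_w=3.0, seed=1):
    """Build one toy dataset per worker: noiseless pairs (x, true_w * x)."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        shard = []
        for _ in range(points_per_worker):
            x = rng.uniform(-1.0, 1.0)
            shard.append((x, true_w * x))
        shards.append(shard)
    return shards


def local_sgd(shards, tau_schedule, lr=0.1, seed=0):
    """Periodic-averaging SGD.

    shards:       list of per-worker datasets, each a list of (x, y) pairs.
    tau_schedule: one entry per communication round; entry r is the number
                  of local SGD steps each worker takes before round r's
                  averaging step (assumed schedule, chosen by the caller).
    """
    rng = random.Random(seed)
    w_global = 0.0
    for tau in tau_schedule:
        local_models = []
        for data in shards:
            w = w_global  # every worker starts from the last averaged model
            for _ in range(tau):
                x, y = rng.choice(data)
                grad = 2.0 * (w * x - y) * x  # gradient of (w*x - y)**2 w.r.t. w
                w -= lr * grad
            local_models.append(w)
        # Communication step: replace all local models by their average.
        w_global = sum(local_models) / len(local_models)
    return w_global


shards = make_shards()

# Fixed averaging period: 10 local steps between averages, 50 rounds.
w_fixed = local_sgd(shards, tau_schedule=[10] * 50)

# Hypothetical schedule that averages more and more frequently over time.
decaying = [16] * 5 + [8] * 5 + [4] * 5 + [2] * 5 + [1] * 5
w_decay = local_sgd(shards, tau_schedule=decaying)

print(w_fixed, w_decay)  # both should end up close to the true slope 3.0
```

A larger entry in tau_schedule means fewer averaging steps per gradient step, which saves communication time in a real cluster at the cost of letting the local models drift apart; the sketch does not model wall-clock delays, so it only shows the update pattern, not the error-runtime trade-off itself.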

Based on joint work with Jianyu Wang, Anit Sahu, Zhouyi Yang, and Soummya Kar. Links to papers: https://arxiv.org/abs/1808.07576, https://arxiv.org/pdf/1810.08313.pdf, https://arxiv.org/abs/1905.09435

Speaker

Gauri Joshi

Carnegie Mellon University

Gauri Joshi has been an assistant professor in the ECE department at Carnegie Mellon University since September 2017. Previously, she worked as a Research Staff Member at the IBM T. J. Watson Research Center. Gauri completed her Ph.D. in EECS at MIT in June 2016, advised by Prof. Gregory Wornell. She received her B.Tech and M.Tech in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay in 2010. Her awards and honors include the NSF CRII Award (2018), IBM Faculty Research Award (2017), Best Thesis Prize in Computer Science at MIT (2012), Institute Gold Medal of IIT Bombay (2010), and the Claude Shannon Research Assistantship (2015-16).