
A paper published by students and faculty from the Chandra Family Department of Electrical Engineering in collaboration with Meta has received an Outstanding Paper Honorable Mention at MLSys 2025, the premier conference on machine learning and systems.
The paper, "APOLLO: SGD-like Memory, AdamW-level Performance," was co-authored by Hanqing Zhu* (UT), Zhenyu Zhang* (UT), Wenyan Cong (UT), Xi Liu (Meta), Sem Park (Meta), Vikas Chandra (Meta), Bo Long (Meta), Prof. David Pan (UT), Prof. Atlas Wang (UT), and Jinwon Lee (Meta).
In this paper, the authors propose APOLLO, a memory-efficient optimizer for training large language models that achieves SGD-level memory usage while matching AdamW-level performance. The method delivers significant system-level improvements, including reduced memory overhead and increased training throughput, and has been adopted by several popular open-source platforms, including Hugging Face and LLaMA-Factory.