Abstract: Modern deep learning methods, most prominently language models, have achieved tremendous empirical success, yet a theoretical understanding of how neural networks learn from data remains incomplete. While reasoning directly about these systems is often intractable, formalizing core empirical phenomena through minimal “sandbox” tasks offers a promising path toward principled theory. In this talk, I will demonstrate how proving end-to-end learning guarantees for such tasks yields a practical understanding of how the network architecture, optimization algorithm, and data distribution jointly give rise to key behaviors. First, I will show how neural scaling laws arise from the dynamics of stochastic gradient descent in shallow neural networks. Next, I will examine how and under what conditions transformers trained via gradient descent can learn different reasoning behaviors, including in-context learning and multi-step reasoning. Altogether, this approach builds theories that provide concrete insight into the behavior of modern AI systems.
Bio: Eshaan is a final-year Ph.D. student in the Electrical and Computer Engineering (ECE) department at Princeton University, jointly advised by Jason D. Lee and Yuxin Chen. His research focuses on the theory of deep learning, ranging from characterizing the fundamental limits of shallow neural networks to understanding how LLM phenomena emerge during training. He is a recipient of the IBM PhD Fellowship and the NDSEG Fellowship, and was selected as a 2025 Rising Star in Data Science.