Building Capable Models with Adaptive Supervision

ECE Seminar

Location: EER 3.646
Speaker: Abhishek Panigrahi, Princeton University

Abstract: Training capable small language models is a central challenge, yet existing distillation methods treat teachers as static sources of supervision. I argue that effective learning depends on how a small model learns from a larger one and when it learns it. I show that intermediate teacher checkpoints reveal implicit learning trajectories, and that aligning students to these trajectories yields provable sample-complexity benefits. Building on this, I develop GRACES, which predicts teacher-student compatibility from gradients, and STAT, which adapts supervision to student weaknesses. These principles extend beyond distillation to context-enhanced learning using privileged information and progressive random training. I conclude by outlining a vision for autonomous supervision systems that adapt to learner characteristics without manual curriculum design, and the challenges that remain.


Bio: Abhishek is a final-year graduate student in the Computer Science department at Princeton University, advised by Prof. Sanjeev Arora. His research focuses on understanding and improving generalization in deep learning models, with an emphasis on principled training algorithms that offer theoretical or interpretable guarantees. He is an Apple AI/ML Scholar and a Siebel Scholar for 2025-26. Prior to his PhD, he was a resident at Microsoft Research India and studied computer science as an undergraduate at IIT Kharagpur.