Prof. Atlas Wang and his research team have received a Science of Trustworthy AI grant from Schmidt Sciences for their work on “Unlearning Sensitive Knowledge in Reasoning Models: From Final Answers to Reasoning Traces.”
The Schmidt Sciences Science of Trustworthy AI program supports technical research that improves our ability to understand, predict, and control risks from frontier AI systems while enabling their trustworthy deployment.
The project aims to close the “unlearning gap”: sensitive information can persist inside large AI models even after it is removed from their final answers. The team is developing methods to scrub unsafe content not only from final outputs but also from the model’s reasoning process itself, with the goal of building safer, more trustworthy AI systems in which both what models say and how they think remain aligned with safety requirements.
The project is led by Prof. Sijia Liu of Michigan State University along with Prof. Wang and Prof. Huan Zhang of the University of Illinois Urbana-Champaign.
“Trustworthy AI is essential because modern AI systems increasingly act as decision-making infrastructure in high-stakes domains such as finance, autonomy, and critical logistics, where reliability, interpretability, and verifiable behavior are necessary to ensure that learned models remain aligned with real-world constraints and do not silently fail under distribution shifts or adversarial conditions,” said Wang.
In the future, Wang plans to develop principled AI architectures that combine learning with structured reasoning and formal constraints, enabling systems whose behavior can be audited, stress-tested, and governed through explicit mechanisms rather than relying solely on empirical performance.