Resilience at Scale: Insights and Opportunities

Seminar
Tuesday, November 10, 2015
3:30 PM to 4:30 PM
POB 2.402
Free and open to the public

Designing warehouse-scale computers to be resilient and reliable is difficult. Technology scaling, large component counts, and the use of emerging technologies all pose serious reliability challenges, and solutions to these problems must be efficient and cost-effective. This talk will use insights gained from field data analyses of supercomputers and cloud data centers to highlight certain reliability issues in warehouse-scale computers, from the perspective of both server designers and data center operators. Opportunities for innovation will also be discussed.

Speaker

Sudhanva Gurumurthi

Senior Data Center Engineer
IBM Cloud Innovation Lab

Sudhanva Gurumurthi is a Senior Data Center Engineer at the IBM Cloud Innovation Lab and a Visiting Associate Professor at the University of Virginia. He was previously a Senior Researcher at AMD and a tenured Associate Professor at the University of Virginia. Sudhanva's areas of interest are computer architecture and cloud computing. He is a recipient of the NSF CAREER award. He received his PhD from Penn State and his bachelor's degree from the College of Engineering Guindy, Anna University, India. He is a Senior Member of the IEEE and the ACM.