Teaching Computers to See Using Big 3D Data

Seminar
Wednesday, March 04, 2015
5:00 PM to 6:00 PM
POB 2.402
Free and open to the public

On your one-minute walk from the coffee machine to your desk each morning, you pass dozens of scenes (a kitchen, an elevator, your office), and you effortlessly recognize them and perceive their 3D structure. Yet this one-minute scene-understanding problem has remained an open challenge in computer vision for decades. Researchers have recently come to realize that large amounts of image data are key to several major breakthroughs in image recognition, as exemplified by face detection and deep feature learning. However, while an image is a 2D array, the world is 3D, and 3D reasoning cannot be bypassed in scene understanding.

In this talk, I will advocate the use of big 3D data in all major steps of scene understanding. I will share my experience using big 3D data for bottom-up object detection, top-down context reasoning, 3D feature learning, and shape representation. To demonstrate the power of big 3D data, I will present three of our recent projects: Sliding Shapes, a 3D object detector trained on a large number of depth maps rendered from CAD models; PanoContext, a data-driven, non-parametric context model for panoramic scene parsing; and 3D ShapeNets, a Convolutional Deep Belief Network learned from CAD models on the Internet. Finally, I will discuss several remaining open challenges for big 3D data.

Speaker

Jianxiong Xiao

Assistant Professor
Princeton University

Jianxiong Xiao is an Assistant Professor in the Department of Computer Science at Princeton University. He received his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT). His research interests are in computer vision, with a focus on data-driven scene understanding. His work is motivated by the goal of building computer systems that automatically understand visual scenes, both by inferring their semantics (e.g., the SUN Database) and by extracting their 3D structure (e.g., Big Museum). His work received the Best Student Paper Award at the European Conference on Computer Vision (ECCV) in 2012 and the Google Research Best Papers Award for 2012, and it has appeared in the popular press in the United States. Jianxiong was awarded the Google U.S./Canada Fellowship in Computer Vision in 2012, the MIT CSW Best Research Award in 2011, and Google Research Awards in 2014.