On your one-minute walk from the coffee machine to your desk each morning, you pass by dozens of scenes (a kitchen, an elevator, your office), and you effortlessly recognize them and perceive their 3D structure. Yet this one-minute scene-understanding problem has remained an open challenge in computer vision for decades. Recently, researchers have come to realize that large amounts of image data are key to several major breakthroughs in image recognition, as exemplified by face detection and deep feature learning. However, while an image is a 2D array, the world is 3D, and scene understanding cannot bypass 3D reasoning.
In this talk, I will advocate the use of big 3D data in all major steps of scene understanding. I will share my experience using big 3D data for bottom-up object detection, top-down context reasoning, 3D feature learning, and shape representation. To demonstrate the power of big 3D data, I will present three of our recent works: Sliding Shapes, a 3D object detector trained on a large number of depth maps rendered from CAD models; PanoContext, a data-driven, non-parametric context model for panoramic scene parsing; and 3D ShapeNets, a Convolutional Deep Belief Network trained on CAD models collected from the Internet. Finally, I will discuss several remaining open challenges for big 3D data.