Deep object-centric 3D perception
- Li Yi.
- [Stanford, California] : [Stanford University], 2019.
- Physical description: 1 online resource.
- Yi, Li, author.
- Guibas, Leonidas J., degree supervisor.
- Girod, Bernd, degree committee member.
- Savarese, Silvio, degree committee member.
- Stanford University. Department of Electrical Engineering.
- Teaching machines to perceive visual content in a 3D environment as humans do is a central topic in Artificial Intelligence. The goal is to process different types of 3D sensory inputs and generate symbolic or numerical descriptions of the environment to support decision making. In this thesis, we advocate an object-centric way of generating such descriptions, in which we represent an environment as a collection of 3D objects equipped with various attributes important for specific tasks. To generate such a representation, we focus on deep object-centric 3D perception, a class of approaches built upon 3D deep learning techniques. This thesis covers three critical components of deep object-centric 3D perception: constructing large-scale 3D model repositories, designing 3D deep learning frameworks to consume various formats of 3D data, and applying big data and deep learning techniques to real perception tasks. We start by providing an overview of each component. Following this, we show how we can accelerate the label acquisition process to scale up 3D model repositories so that data-hungry deep learning approaches can be applied. 3D data can usually be represented in different formats. Some of the prevalent geometric formats, such as point clouds and polygon meshes, pose a significant challenge to deep learning framework design, since traditional deep networks designed for regular data forms, e.g., images, cannot be directly applied. We then investigate how to build deep learning frameworks capable of consuming 3D shape meshes, an irregular graph-structured data format. Next, we provide two real perception applications as case studies to show how big data and 3D deep learning help the field evolve.
In particular, we study instance segmentation in 3D point clouds and develop a novel 3D object proposal network named GSPN as well as a 3D instance segmentation framework named R-PointNet, which boosts state-of-the-art instance segmentation performance by a large margin on existing benchmarks. In the second application, we go one step further and tackle detailed part-level perception. We study the problem of articulation-based object part segmentation and show how to modularize deep network design by disentangling complex perception problems into subproblems. We conclude by summarizing our efforts and discussing the challenges and open questions in the field.
- Submitted to the Department of Electrical Engineering.
- Thesis (Ph.D.)--Stanford University, 2019.