### Librarian View

Last updated in SearchWorks on August 4, 2024 3:30am

## Metadata

LEADER 05970nam a22003973i 4500

001
a13520369

003
SIRSI

005
20240417015614.5

006
m d

007
cr un

008
200323t20202020cau om 000 0 eng d

035

a| (Sirsi) dorfg022dx0979

040

a| CSt
b| eng
e| rda
c| CSt
d| UtOrBLW

100

1

a| Choy, Christopher Bongsoo,
e| author.

245

1

0

a| High-dimensional convolutional neural networks for 3D perception /
c| Christopher B. Choy

264

1

a| [Stanford, California] :
b| [Stanford University],
c| 2020

264

4

c| ©2020

300

a| 1 online resource

336

a| text
2| rdacontent

337

a| computer
2| rdamedia

338

a| online resource
2| rdacarrier

500

a| Submitted to the Department of Electrical Engineering

502

g| Thesis
b| Ph.D.
c| Stanford University
d| 2020

520

3

a| The automation of mechanical tasks brought the modern world unprecedented prosperity and comfort. However, the majority of automated tasks have been simple mechanical tasks that require only repetitive motion; tasks that require visual perception and high-level cognition remain the last frontiers of automation. Many such tasks depend on visual perception: automated warehouses, where robots package items lying in disarray, and autonomous driving, where agents localize themselves and identify and track other dynamic objects in the 3D world. This ability to represent, identify, and interpret three-dimensional visual data in order to understand the underlying three-dimensional structure of the real world is known as 3D perception. In this dissertation, we propose learning-based approaches to challenges in 3D perception. Specifically, we propose a set of high-dimensional convolutional neural networks for three categories of problems in 3D perception: reconstruction, representation learning, and registration. Reconstruction is the first step, generating 3D point clouds or meshes from a set of sensory inputs. We present supervised reconstruction methods using 3D convolutional neural networks that take a set of images as input and generate 3D occupancy patterns in a grid as output. We train the networks with a large-scale 3D shape dataset and a set of images rendered from various viewpoints, and validate the approach on real image datasets. However, supervised reconstruction requires 3D shapes as labels for all images, which are expensive to generate. Instead, we propose using a set of foreground masks and unlabeled real 3D shapes as weaker supervision for training the reconstruction network. Combined with the learned constraint, we train the reconstruction system with as few as one image and show that the proposed model reconstructs shapes without direct 3D supervision. 
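The "3D occupancy patterns in a grid" that the reconstruction networks predict can be illustrated with a minimal sketch of the target encoding itself: quantizing a point cloud into a binary voxel grid. This is a generic illustration, not code from the dissertation; the function name and parameters are ours.

```python
import numpy as np

def voxelize(points, res, bounds):
    """Quantize an (n, 3) point cloud into a res^3 binary occupancy grid.

    bounds: (lo, hi) corners of the axis-aligned region to voxelize.
    Points on or past the upper bound are simply dropped.
    """
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    grid = np.zeros((res, res, res), dtype=bool)
    # Map each point to an integer cell index by truncation.
    idx = ((points - lo) / (hi - lo) * res).astype(int)
    idx = idx[(idx >= 0).all(axis=1) & (idx < res).all(axis=1)]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

A 3D convolutional network trained for reconstruction would regress (a probabilistic relaxation of) such a grid from image features.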
In the second part of the dissertation, we present sparse tensor networks, neural networks for spatially sparse tensors. As the spatial dimension increases, the density of the input data drops drastically, since the volume of the space grows exponentially. Sparse tensor networks exploit this inherent sparsity to process such data efficiently. With sparse tensor networks, we create a 4-dimensional convolutional network for spatio-temporal perception of 3D scans or sequences of 3D scans (3D video). We show that 4-dimensional convolutional neural networks can effectively make use of temporal consistency and improve the accuracy of segmentation. Next, we use sparse tensor networks for geometric representation learning, capturing both local and global 3D structure accurately for correspondence and registration. We propose fully convolutional networks and new types of metric learning losses that allow neurons to capture large context while preserving local spatial geometry. We experimentally validate our approach on both indoor and outdoor datasets and show that the network outperforms the state-of-the-art method while being a few orders of magnitude faster. In the third and last part of the dissertation, we discuss high-dimensional pattern recognition problems in image and 3D registration. We first propose high-dimensional convolutional networks for 4- to 32-dimensional spaces and analyze the geometric pattern recognition capacity of these networks on linear regression problems. Next, we show that 3D correspondences form a hyper-surface in a 6-dimensional space and 2D correspondences form a 4-dimensional hyper-conic section, which we detect using high-dimensional convolutional networks. 
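The core idea of a sparse tensor network — storing only occupied coordinates and convolving only over them — can be sketched in plain Python. This is an illustrative toy, assuming a coordinate-to-feature map representation; it is not the dissertation's optimized implementation, and all names here are ours.

```python
import numpy as np

def sparse_conv(coords, feats, kernel, offsets):
    """Toy sparse convolution over a coordinate-indexed tensor.

    coords:  list of integer coordinate tuples (the occupied sites)
    feats:   dict coord -> feature vector, shape (c_in,)
    kernel:  dict offset -> weight matrix, shape (c_out, c_in)
    offsets: integer offset tuples defining the kernel support
    """
    index = set(coords)
    out = {}
    for x in coords:                      # outputs only at occupied sites
        acc = None
        for off in offsets:
            nb = tuple(a + b for a, b in zip(x, off))
            if nb in index:               # empty space is skipped entirely
                contrib = kernel[off] @ feats[nb]
                acc = contrib if acc is None else acc + contrib
        if acc is not None:
            out[x] = acc
    return out
```

Because both the loop over sites and the neighbor lookups touch only occupied coordinates, the cost scales with the number of non-empty sites rather than with the exponentially growing volume of the ambient space — the property that makes 4-dimensional (and higher) convolutions tractable.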
We extend the proposed high-dimensional convolutional networks to differentiable 3D registration with three core modules: a 6-dimensional convolutional neural network for correspondence confidence prediction; a differentiable Weighted Procrustes method for closed-form pose estimation; and a robust gradient-based 3D rigid transformation optimizer for pose refinement. Experiments demonstrate that our approach outperforms state-of-the-art learning-based and classical methods on real-world data while maintaining efficiency.
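The closed-form pose estimation step mentioned above can be sketched with the standard weighted Procrustes (Kabsch) solution: given weighted correspondences, the optimal rotation comes from an SVD of the weighted covariance. This is a minimal NumPy sketch of that classical formula, assuming unit-sum weights from a confidence predictor; the function name and interface are ours, and the dissertation's differentiable formulation is not reproduced here.

```python
import numpy as np

def weighted_procrustes(x, y, w):
    """Closed-form argmin over R, t of sum_i w_i * ||R x_i + t - y_i||^2.

    x, y: (n, 3) corresponding points; w: (n,) non-negative weights.
    Returns rotation R (3, 3) and translation t (3,).
    """
    w = w / w.sum()
    cx, cy = w @ x, w @ y                   # weighted centroids
    xc, yc = x - cx, y - cy
    cov = (xc * w[:, None]).T @ yc          # 3x3 weighted covariance
    u, _, vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, cy - r @ cx
```

Because every step (weighted means, matrix products, SVD) is differentiable almost everywhere, gradients can flow from a registration loss back into the weights, which is what makes a learned correspondence-confidence predictor trainable end to end.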

700

1

a| Savarese, Silvio
e| degree supervisor.
4| ths
0| http://id.loc.gov/authorities/names/no2011143935

700

1

a| Guibas, Leonidas J.
e| degree committee member.
4| ths
0| http://id.loc.gov/authorities/names/n86109709

700

1

a| Wetzstein, Gordon
e| degree committee member.
4| ths
0| http://id.loc.gov/authorities/names/no2015078928

710

2

a| Stanford University.
b| Department of Electrical Engineering.
0| http://id.loc.gov/authorities/names/nr2002030762

596

a| 22

035

a| (OCoLC-M)1146047095

035

a| (Sirsi) dorfg022dx0979

999

f

f

i| 6c52c7d4-5d68-5897-9751-a9c6117d168c
s| 3fc10bdd-b705-5485-8233-3ad4bf9686b8

856

4

0

u| https://purl.stanford.edu/fg022dx0979
x| SDR-PURL
x| item
x| rights:world

## Holdings JSON

{ "holdings": [ { "id": "c77ed89a-3016-554d-9304-f61dacbf9aa5", "hrid": "ah13520369_1", "notes": [ ], "_version": 1, "metadata": { "createdDate": "2023-08-21T20:13:10.850Z", "updatedDate": "2023-08-21T20:13:10.850Z", "createdByUserId": "58d0aaf6-dcda-4d5e-92da-012e6b7dd766", "updatedByUserId": "58d0aaf6-dcda-4d5e-92da-012e6b7dd766" }, "sourceId": "f32d531e-df79-46b3-8932-cdd35f7a2264", "boundWith": null, "formerIds": [ ], "illPolicy": null, "instanceId": "6c52c7d4-5d68-5897-9751-a9c6117d168c", "holdingsType": { "id": "996f93e2-5b5e-4cf2-9168-33ced1f95eed", "name": "Electronic", "source": "folio" }, "holdingsItems": [ ], "callNumberType": null, "holdingsTypeId": "996f93e2-5b5e-4cf2-9168-33ced1f95eed", "electronicAccess": [ ], "bareHoldingsItems": [ ], "holdingsStatements": [ ], "statisticalCodeIds": [ ], "administrativeNotes": [ ], "effectiveLocationId": "1b14e21c-8d47-45c7-bc49-456a0086422b", "permanentLocationId": "1b14e21c-8d47-45c7-bc49-456a0086422b", "suppressFromDiscovery": false, "holdingsStatementsForIndexes": [ ], "holdingsStatementsForSupplements": [ ], "location": { "effectiveLocation": { "id": "1b14e21c-8d47-45c7-bc49-456a0086422b", "code": "SUL-SDR", "name": "Stanford Digital Repository", "campus": { "id": "c365047a-51f2-45ce-8601-e421ca3615c5", "code": "SUL", "name": "Stanford Libraries" }, "details": { }, "library": { "id": "c1a86906-ced0-46cb-8f5b-8cef542bdd00", "code": "SUL", "name": "SUL" }, "isActive": true, "institution": { "id": "8d433cdd-4e8f-4dc1-aa24-8a4ddb7dc929", "code": "SU", "name": "Stanford University" } }, "permanentLocation": { "id": "1b14e21c-8d47-45c7-bc49-456a0086422b", "code": "SUL-SDR", "name": "Stanford Digital Repository", "campus": { "id": "c365047a-51f2-45ce-8601-e421ca3615c5", "code": "SUL", "name": "Stanford Libraries" }, "details": { }, "library": { "id": "c1a86906-ced0-46cb-8f5b-8cef542bdd00", "code": "SUL", "name": "SUL" }, "isActive": true, "institution": { "id": "8d433cdd-4e8f-4dc1-aa24-8a4ddb7dc929", "code": "SU", 
"name": "Stanford University" } } } } ], "items": [ ] }

## FOLIO JSON

{ "pieces": [ null ], "instance": { "id": "6c52c7d4-5d68-5897-9751-a9c6117d168c", "hrid": "a13520369", "tags": { "tagList": [ ] }, "notes": [ { "note": "Submitted to the Department of Electrical Engineering", "staffOnly": false, "instanceNoteTypeId": "6a2533a7-4de2-4e64-8466-074c2fa9308c" }, { "note": "Thesis Ph.D. Stanford University 2020", "staffOnly": false, "instanceNoteTypeId": "b73cc9c2-c9fa-49aa-964f-5ae1aa754ecd" }, { "note": "The automation of mechanical tasks brought the modern world unprecedented prosperity and comfort. However, the majority of automated tasks have been simple mechanical tasks that only require repetitive motion. Tasks that require visual perception and high-level cognition still have become the last frontiers of automation. Many of these tasks require visual perception such as automated warehouses where robots package items in disarray, autonomous driving where autonomous agents localize themselves, identify and track other dynamic objects in the 3D world. This ability to represent, identify, and interpret visual three-dimensional data to understand the underlying three-dimensional structure in the real world is known as 3D perception. In this dissertation, we propose learning-based approaches to tackle challenges in 3D perception. Specifically, we propose a set of high-dimensional convolutional neural networks for three categories of problems in 3D perception: reconstruction, representation learning, and registration. Reconstruction is the first step that generates 3D point clouds or meshes from a set of sensory inputs. We present supervised reconstruction methods using 3D convolutional neural networks that take a set of images as input and generate 3D occupancy patterns in a grid as output. We train the networks with a large-scale 3D shape dataset to generate a set of images rendered from various viewpoints validate the approach on real image datasets. 
However, supervised reconstruction requires 3D shapes as labels for all images, which are expensive to generate. Instead, we propose using a set of foreground masks and unlabeled real 3D shapes to train the reconstruction network as weaker supervision. Combined with the learned constraint, we train the reconstruction system with as few as 1 image and show that the proposed model without direct 3D supervision. In the second part of the dissertation, we present sparse tensor networks, neural networks for spatially sparse tensors. As we increase the spatial dimension, the sparsity of input data decreases drastically as the volume of the space increases exponentially. Sparse tensor networks exploit such inherent sparsity in the input data and efficiently process them. With the sparse tensor network, we create a 4-dimensional convolutional network for spatio-temporal perception for 3D scans or a sequence of 3D scans (3D video). We show that 4-dimensional convolutional neural networks can effectively make use of temporal consistency and improve the accuracy of segmentation. Next, we use the sparse tensor networks for geometric representation learning to capture both local and global 3D structures accurately for correspondences and registration. We propose fully convolutional networks and new types of metric learning losses that allow neurons to capture large context while capturing local spatial geometry. We experimentally validate our approach on both indoor and outdoor datasets and show that the network outperforms the state-of-the-art method while being a few orders of magnitude faster. In the third and the last part of the dissertation, we discuss high-dimensional pattern recognition problems in image and 3D registration. We first propose high-dimensional convolutional networks from 4 to 32-dimensional spaces and analyze the geometric pattern recognition capacity of these high-dimensional convolutional networks for linear regression problems. 
Next, we show that the 3D correspondences form a hyper-surface in 6-dimensional space; and 2D correspondences form a 4-dimensional hyper-conic section, which we detect using high-dimensional convolutional networks. We extend the proposed high-dimensional convolutional networks for differentiable 3D registration and propose three core modules for this: a 6-dimensional convolutional neural network for correspondence confidence prediction; a differentiable Weighted Procrustes method for closed-form pose estimation; and a robust gradient-based 3D rigid transformation optimizer for pose refinement. Experiments demonstrate that our approach outperforms state-of-the-art learning-based and classical methods on real-world data while maintaining efficiency", "staffOnly": false, "instanceNoteTypeId": "10e2e11b-450f-45c8-b09b-0f819999966e" } ], "title": "High-dimensional convolutional neural networks for 3D perception / Christopher B. Choy", "series": [ ], "source": "MARC", "_version": 3, "editions": [ ], "metadata": { "createdDate": "2023-08-21T20:10:00.953Z", "updatedDate": "2024-04-17T01:56:22.301Z", "createdByUserId": "58d0aaf6-dcda-4d5e-92da-012e6b7dd766", "updatedByUserId": "709fdac6-d3f3-5784-8839-fe36ad6ed0b3" }, "statusId": "9634a5ab-9228-4703-baf2-4d12ebc77d56", "subjects": [ ], "languages": [ "eng" ], "indexTitle": "High-dimensional convolutional neural networks for 3D perception /", "identifiers": [ { "value": "(Sirsi) dorfg022dx0979", "identifierTypeId": "7e591197-f335-4afb-bc6d-a6d76ca3bace" }, { "value": "(OCoLC-M)1146047095", "identifierTypeId": "439bfbae-75bc-4f74-9fc7-b2a2d47ce3ef" }, { "value": "(Sirsi) dorfg022dx0979", "identifierTypeId": "7e591197-f335-4afb-bc6d-a6d76ca3bace" } ], "publication": [ { "role": "Publication", "place": "[Stanford, California]", "publisher": "[Stanford University]", "dateOfPublication": "2020" }, { "dateOfPublication": "©2020" } ], "contributors": [ { "name": "Choy, Christopher Bongsoo,", "primary": true, "contributorTypeText": 
"author.", "contributorNameTypeId": "2b94c631-fca9-4892-a730-03ee529ffe2a" }, { "name": "Savarese, Silvio", "primary": false, "contributorTypeId": "cce475f7-ccfa-4e15-adf8-39f907788515", "contributorTypeText": "degree supervisor.", "contributorNameTypeId": "2b94c631-fca9-4892-a730-03ee529ffe2a" }, { "name": "Guibas, Leonidas J", "primary": false, "contributorTypeId": "cce475f7-ccfa-4e15-adf8-39f907788515", "contributorTypeText": "degree committee member.", "contributorNameTypeId": "2b94c631-fca9-4892-a730-03ee529ffe2a" }, { "name": "Wetzstein, Gordon", "primary": false, "contributorTypeId": "cce475f7-ccfa-4e15-adf8-39f907788515", "contributorTypeText": "degree committee member.", "contributorNameTypeId": "2b94c631-fca9-4892-a730-03ee529ffe2a" }, { "name": "Stanford University. Department of Electrical Engineering", "primary": false, "contributorNameTypeId": "2e48e713-17f3-4c13-a9f8-23845bb210aa" } ], "catalogedDate": "2020-03-24", "staffSuppress": false, "instanceTypeId": "6312d172-f0cf-40f6-b27d-9fa8feaf332f", "previouslyHeld": false, "classifications": [ ], "instanceFormats": [ ], "electronicAccess": [ { "uri": "https://purl.stanford.edu/fg022dx0979", "name": "Resource", "relationshipId": "f5d0068e-6272-458e-8a81-b85e7b9a14aa" } ], "holdingsRecords2": [ ], "modeOfIssuanceId": "9d18a02f-5897-4c31-9106-c9abb5c7ae8b", "publicationRange": [ ], "statisticalCodes": [ ], "alternativeTitles": [ ], "discoverySuppress": false, "instanceFormatIds": [ ], "statusUpdatedDate": "2023-08-21T20:10:00.867+0000", "statisticalCodeIds": [ "0f328803-cd6a-47c0-8e76-f3a775d56884" ], "administrativeNotes": [ ], "physicalDescriptions": [ "1 online resource" ], "publicationFrequency": [ ], "suppressFromDiscovery": false, "natureOfContentTermIds": [ ] }, "holdingSummaries": [ { "poLineId": null, "orderType": null, "orderStatus": null, "poLineNumber": null, "orderSentDate": null, "orderCloseReason": null, "polReceiptStatus": null } ] }