Accelerating chemical similarity search using GPUs and metric embeddings [electronic resource]
- Imran Saeedul Haque.
- Physical description
- 1 online resource.
At the library
All items must be viewed on site
- 3781 2011 H (in-library use)
- Fifteen years ago, the advent of modern high-throughput sequencing revolutionized computational genetics with a flood of data. Today, high-throughput biochemical assays promise to make biochemistry the next data-rich domain for machine learning. However, existing computational methods, built for small analyses of about 1,000 molecules, do not scale to emerging multi-million molecule datasets. For many algorithms, pairwise similarity comparisons between molecules are a critical bottleneck, presenting a 1,000x-1,000,000x scaling barrier. In this dissertation, I describe the design of SIML and PAPER, our GPU implementations of 2D and 3D chemical similarities, as well as SCISSORS, our metric embedding algorithm. On a model problem of interest, combining these techniques allows up to 274,000x speedup in time and up to 2.8 million-fold reduction in space while retaining excellent accuracy. I further discuss how these high-speed techniques have allowed insight into chemical shape similarity and the behavior of machine learning kernel methods in the presence of noise.
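Comparing all pairs among n molecules takes n(n-1)/2 similarity evaluations. As a result, a 1,000-fold larger dataset needs about 1,000-fold more work for a single query against it, but about 1,000,000-fold more work for a full all-pairs comparison. This is one possible reading of the quoted 1,000x-1,000,000x range. The Python sketch below assumes the Tanimoto coefficient on binary fingerprints, a common 2D similarity measure, with each fingerprint stored as a Python int bitset. It is only an illustration, not the SIML or PAPER algorithm, and the function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as int bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention: two empty fingerprints count as identical
    return bin(a & b).count("1") / union


def all_pairs(fps: list[int]) -> dict[tuple[int, int], float]:
    """Naive all-pairs comparison; the number of pairs grows quadratically in len(fps)."""
    return {
        (i, j): tanimoto(fps[i], fps[j])
        for i, j in combinations(range(len(fps)), 2)
    }


def pair_count(n: int) -> int:
    """Number of unordered pairs among n molecules: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(all_pairs([0b1011, 0b1001, 0b0110]))
    # Growing from 1,000 to 1,000,000 molecules multiplies the pair count ~10^6-fold.
    print(pair_count(1_000_000) // pair_count(1_000))
```

Each pair is scored independently of the others, so the loop parallelizes naturally. GPU implementations rely on exactly this property.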
- Publication date
- 2011.
- Submitted to the Department of Computer Science.
- Thesis (Ph.D.)--Stanford University, 2011.