Fifteen years ago, the advent of modern high-throughput sequencing revolutionized computational genetics with a flood of data. Today, high-throughput biochemical assays promise to make biochemistry the next data-rich domain for machine learning. However, existing computational methods, built for small analyses of about 1,000 molecules, do not scale to emerging multi-million molecule datasets. For many algorithms, pairwise similarity comparisons between molecules are a critical bottleneck, presenting a 1,000x-1,000,000x scaling barrier. In this dissertation, I describe the design of SIML and PAPER, our GPU implementations of 2D and 3D chemical similarities, as well as SCISSORS, our metric embedding algorithm. On a model problem of interest, combining these techniques allows up to 274,000x speedup in time and up to 2.8 million-fold reduction in space while retaining excellent accuracy. I further discuss how these high-speed techniques have allowed insight into chemical shape similarity and the behavior of machine learning kernel methods in the presence of noise.