Algorithms and lower bounds for parameter-free online learning
- Ashok Cutkosky.
- [Stanford, California] : [Stanford University], 2018.
- Copyright notice
- Physical description
- 1 online resource.
Also available at
At the library
All items must be viewed on site
Request items at least 2 days before you visit to allow retrieval from off-site storage. You can request at most 5 items per day.
|3781 2018 C||In-library use|
- Cutkosky, Ashok, author.
- Boahen, Kwabena (Kwabena Adu), degree supervisor.
- Liang, Percy, degree supervisor.
- Boneh, Dan, 1969- degree committee member.
- Duchi, John, degree committee member.
- Sidford, Aaron, degree committee member.
- Stanford University. Computer Science Department.
- ["Training a machine learning model today involves minimizing a loss function on datasets that are often gigantic, and so almost all practically relevant training algorithms operate in an online manner by reading in small chunks of the data at a time and making updates to the model on-the-fly. As a result, online learning, a popular way to analyze optimization algorithms operating on datastreams, is at the heart of modern machine learning pipelines. In order to converge to the optimal model as quickly as possible, online learning algorithms all require some user-specified parameters that reflect the shape of the loss or statistics of the input data. Examples of such parameters include the size of the gradients of the losses, the distance from some initial model to the optimal model, and the amount of variance in the data, among others. Since the true values for these parameters are often unknown, the practical implementation of online learning algorithms usually involves simply guessing (called ``tuning''), which is both inefficient and inelegant. This motivates the search for parameter-free algorithms that can adapt to these unknown values. Prior algorithms have achieved adaptivity to many different unknown parameters individually - for example one may adapt to an unknown gradient sizes given a known distance to the optimal model, or adapt to the unknown distance given a known bound on gradient size. However, no algorithm could adapt to both parameters simultaneously. This work introduces new lower bounds, algorithms, and analysis techniques for adapting to many parameters at once. We begin by proving a lower bound showing that adapting to both the size of the gradients and distance to optimal model simultaneously is fundamentally much harder than adapting to either individually, and proceed to develop the first algorithm to meet this lower bound, obtaining optimal adaptivity to both parameters at once. We then expand upon this result to design algorithms that adapt to more unknown parameters, including the variance of the data, different methods for measuring distances, and upper or lower bounds on the second derivative of the loss. We obtain these results by developing new techniques that convert non-parameter-free optimization algorithms into parameter-free algorithms. In addition to providing new and more adaptive algorithms, the relative simplicity of non-parameter-free algorithms allows these techniques to significantly reduce the complexity of many prior analyses."]
- Publication date
- Copyright date
- Submitted to the Department of Computer Science.
- Thesis Ph.D. Stanford University 2018.
Browse related items
Start at call number: