Efficient reinforcement learning with value function generalization [electronic resource]
- Responsibility
- Zheng Wen.
- Imprint
- 2014.
- Physical description
- 1 online resource.
At the library

Special Collections
On-site access
Call number | Note | Status |
---|---|---|
3781 2014 W | In-library use |
Description
Creators/Contributors
- Author/Creator
- Wen, Zheng, 1983-
- Contributor
- Van Roy, Benjamin, primary thesis advisor
- Boyd, Stephen P., thesis advisor
- Johari, Ramesh, 1976-, thesis advisor
- O'Neill, Daniel, thesis advisor
- Stanford University. Department of Electrical Engineering.
Contents/Summary
- Summary
- Reinforcement learning (RL) is concerned with how an agent should learn to make decisions over time while interacting with an environment. A growing body of work has produced RL algorithms with sample and computational efficiency guarantees. However, most of this work focuses on "tabula rasa" learning; i.e., algorithms aim to learn with little or no prior knowledge about the environment. Such algorithms exhibit sample complexities that grow at least linearly in the number of states, and they are of limited practical import since state spaces in most relevant contexts are enormous. There is a need for algorithms that generalize in order to learn how to make effective decisions at states beyond the scope of past experience. This dissertation focuses on the open issue of developing efficient RL algorithms that leverage value function generalization (VFG). It consists of two parts. In the first part, we present sample complexity results for two classes of RL problems: deterministic systems with general forms of VFG, and Markov decision processes (MDPs) with a finite hypothesis class. The results provide upper bounds that are independent of state and action space cardinalities and polynomial in other problem parameters. In the second part, building on insights from our sample complexity analyses, we propose randomized least-squares value iteration (RLSVI), an RL algorithm for MDPs with VFG via linear hypothesis classes. The algorithm is based on a new notion of exploration via randomized value functions. Through computational studies, we compare the performance of RLSVI against least-squares value iteration (LSVI) with Boltzmann exploration or epsilon-greedy exploration, which are widely used in RL with VFG. The results demonstrate that RLSVI is orders of magnitude more efficient.
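The summary describes RLSVI only at a high level. Below is a minimal sketch of the idea of exploration via randomized value functions, assuming an episodic finite-horizon setting, a toy deterministic chain environment, a one-hot (tabular) linear feature map, and illustrative values for the noise scale `sigma`, prior scale `lam`, horizon `H`, and episode count; none of these details are taken from the dissertation itself. Each episode, a value function parameter is drawn from a Gaussian posterior at every step of a backward least-squares pass, and the agent then acts greedily with respect to the sampled values.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy episodic chain MDP (assumed for illustration only) ----------------
N_STATES, N_ACTIONS, H = 6, 2, 6          # states 0..5, actions {left, right}

def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left.
    Reward 1 whenever the next state is the right end of the chain."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def phi(s, a):
    """One-hot feature vector over (state, action) pairs: the simplest
    linear hypothesis class."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

D = N_STATES * N_ACTIONS                  # feature dimension
sigma, lam = 0.1, 1.0                     # noise / prior scales (assumed values)
buffer = [[] for _ in range(H)]           # per-step replay of (s, a, r, s') tuples

def sample_value_functions():
    """Backward pass: at each step h, fit a Bayesian linear regression to the
    replayed transitions and draw theta_h from its Gaussian posterior."""
    thetas = [np.zeros(D) for _ in range(H + 1)]         # terminal value is zero
    for h in reversed(range(H)):
        data = buffer[h]
        if not data:
            thetas[h] = rng.normal(0.0, np.sqrt(lam), D)  # draw from the prior
            continue
        A = np.array([phi(s, a) for s, a, _, _ in data])
        # Regression targets bootstrap from the sampled step-(h+1) parameters.
        y = np.array([r + max(phi(s2, b) @ thetas[h + 1] for b in range(N_ACTIONS))
                      for _, _, r, s2 in data])
        cov = np.linalg.inv(A.T @ A / sigma**2 + np.eye(D) / lam)
        cov = (cov + cov.T) / 2.0                         # keep it symmetric
        mean = cov @ (A.T @ y) / sigma**2
        thetas[h] = rng.multivariate_normal(mean, cov)
    return thetas

returns = []
for episode in range(200):
    thetas = sample_value_functions()
    s, total = 0, 0.0
    for h in range(H):
        # Act greedily w.r.t. the *sampled* value function; the randomness in
        # theta_h is the only source of exploration.
        a = int(np.argmax([phi(s, b) @ thetas[h] for b in range(N_ACTIONS)]))
        s_next, r = step(s, a)
        buffer[h].append((s, a, r, s_next))
        total += r
        s = s_next
    returns.append(total)

print("mean return over last 50 episodes:", np.mean(returns[-50:]))
```

In contrast to the LSVI baselines mentioned in the summary, no epsilon or Boltzmann temperature is injected at action-selection time; all exploration comes from the randomness of the sampled parameters.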
Bibliographic information
- Publication date
- 2014
- Note
- Submitted to the Department of Electrical Engineering.
- Note
- Ph.D., Stanford University, 2014.