Efficient reinforcement learning with value function generalization [electronic resource]
- Zheng Wen.
- Physical description
- 1 online resource.
Also available at
At the library
Limited on-site access
Researchers in the Stanford community can request to view these materials in the Special Collections Reading Room. Entry to the Reading Room is by appointment only.
|3781 2014 W||In-library use|
- Reinforcement learning (RL) is concerned with how an agent should learn to make decisions over time while interacting with an environment. A growing body of work has produced RL algorithms with sample and computational efficiency guarantees. However, most of this work focuses on "tabula rasa" learning; i.e. algorithms aim to learn with little or no prior knowledge about the environment. Such algorithms exhibit sample complexities that grow at least linearly in the number of states, and they are of limited practical import since state spaces in most relevant contexts are enormous. There is a need for algorithms that generalize in order to learn how to make effective decisions at states beyond the scope of past experience. This dissertation focuses on the open issue of developing efficient RL algorithms that leverage value function generalization (VFG). It consists of two parts. In the first part, we present sample complexity results for two classes of RL problems -- deterministic systems with general forms of VFG and Markov decision processes (MDPs) with a finite hypothesis class. The results provide upper bounds that are independent of state and action space cardinalities and polynomial in other problem parameters. In the second part, building on insights from our sample complexity analyses, we propose randomized least-square value iteration (RLSVI), a RL algorithm for MDPs with VFG via linear hypothesis classes. The algorithm is based on a new notion of randomized value function exploration. We compare through computational studies the performance of RLSVI against least-square value iterations (LSVI) with Boltzmann exploration or epsilon-greedy exploration, which are widely used in RL with VFG. Results demonstrate that RLSVI is orders of magnitude more efficient.
- Publication date
- Submitted to the Department of Electrical Engineering.
- Ph.D. Stanford University 2014
Browse related items
Start at call number: