Journal of Machine Learning Research 5 (2003) 1-25    Submitted 4/02; Revised 9/02; Published 12/03

Learning Rates for Q-learning

Eyal Even-Dar    EVEND@CS.TAU.AC.IL
Yishay Mansour    MANSOUR@CS.TAU.AC.IL
School of Computer Science
Tel-Aviv University
Tel-Aviv, 69978, Israel

Editor: Peter Bartlett

Abstract

In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one which is 1/t^ω at time t, where ω ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1−γ), where γ is the discount factor. In contrast, we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1−γ). In addition, we show a simple example that proves this exponential behavior is inherent for linear learning rates.

Keywords: Reinforcement Learning, Q-Learning, Stochastic Processes, Convergence Bounds, Learning Rates

1. Introduction

In Reinforcement Learning, an agent wanders in an unknown environment and tries to maximize its long-term return by performing actions and receiving rewards. The challenge is to understand how a current action will affect future rewards. A good way to model this task is with Markov Decision Processes (MDP), which have become the dominant approach in Reinforcement Learning (Sutton and Barto, 1998; Bertsekas and Tsitsiklis, 1996). An MDP includes states (which abstract the environment), actions (which are the actions available to the agent), and, for each state-action pair, a distribution over next states.
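To make the two learning-rate schedules in the abstract concrete, the following sketch runs tabular Q-learning on a small synthetic MDP with either the polynomial rate 1/t^ω (here ω = 0.6, inside the (1/2, 1) range the paper analyzes) or the linear rate 1/t, where t counts how often the given state-action pair has been updated. The MDP, the uniform exploration policy, and all numerical parameters are illustrative assumptions, not taken from the paper.

    # Minimal sketch (not the paper's code): tabular Q-learning on a toy
    # 2-state MDP, comparing a polynomial and a linear learning rate.
    import numpy as np

    n_states, n_actions = 2, 2
    gamma = 0.9                      # discount factor (illustrative)
    rng = np.random.default_rng(0)

    # Hypothetical transition probabilities P[s, a, s'] and rewards R[s, a].
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

    def q_learning(schedule, omega=0.6, steps=50_000):
        """Run Q-learning with a polynomial (1/t^omega) or linear (1/t)
        learning rate; t is the per-pair update count."""
        Q = np.zeros((n_states, n_actions))
        visits = np.zeros((n_states, n_actions))
        s = 0
        for _ in range(steps):
            a = rng.integers(n_actions)             # uniform exploration
            s_next = rng.choice(n_states, p=P[s, a])
            visits[s, a] += 1
            t = visits[s, a]
            alpha = 1.0 / t**omega if schedule == "polynomial" else 1.0 / t
            # Standard Q-learning update toward the Bellman target.
            Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
        return Q

    print("polynomial 1/t^0.6:", q_learning("polynomial"))
    print("linear 1/t:        ", q_learning("linear"))

With the polynomial schedule the step sizes shrink slowly enough for the estimates to keep moving toward the fixed point, while the linear schedule decays so fast that early errors are corrected only very slowly; the paper's bounds make this contrast precise as polynomial versus exponential dependence on 1/(1−γ).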