Journal of Machine Learning Research 5 (2003) 1-25 Submitted 4/02; Revised 9/02; Published 12/03
Learning Rates for Q-learning
Eyal Even-Dar EVEND@CS.TAU.AC.IL
Yishay Mansour MANSOUR@CS.TAU.AC.IL
School of Computer Science
Tel-Aviv University
Tel-Aviv, 69978, Israel
Editor: Peter Bartlett
Abstract
In this paper we derive convergence rates for Q-learning. We show an interesting relationship
between the convergence rate and the learning rate used in Q-learning. For a polynomial learning
rate, one which is $1/t^\omega$ at time $t$, where $\omega \in (1/2, 1)$, we show that the convergence rate is
polynomial in $1/(1-\gamma)$, where $\gamma$ is the discount factor. In contrast, we show that for a linear
learning rate, one which is $1/t$ at time $t$, the convergence rate has an exponential dependence on
$1/(1-\gamma)$. In addition we show a simple example that proves this exponential behavior is inherent
for linear learning rates.
Keywords: Reinforcement Learning, Q-Learning, Stochastic Processes, Convergence Bounds,
Learning Rates.
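To make the two step-size schedules concrete, the following is a minimal Python sketch of a tabular, asynchronous Q-learning update under either learning rate. The function and variable names are illustrative, not from the paper; following the paper's setup, the step count fed to the learning rate is the number of visits to the particular state-action pair, not global time.

import numpy as np

def polynomial_lr(t, omega=0.85):
    # Polynomial learning rate 1/t^omega with omega in (1/2, 1);
    # the paper shows convergence polynomial in 1/(1 - gamma) here.
    return 1.0 / t ** omega

def linear_lr(t):
    # Linear learning rate 1/t; the paper shows the convergence rate
    # can depend exponentially on 1/(1 - gamma) under this schedule.
    return 1.0 / t

def q_update(Q, visits, s, a, r, s_next, gamma=0.9, lr=polynomial_lr):
    # One asynchronous Q-learning update:
    #   Q(s, a) <- (1 - alpha) Q(s, a) + alpha (r + gamma * max_b Q(s', b)),
    # where alpha is the learning rate at the current visit count of (s, a).
    visits[s, a] += 1
    alpha = lr(visits[s, a])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q[s_next].max())

Starting from Q = np.zeros((n_states, n_actions)) and visits = np.zeros((n_states, n_actions), dtype=int), calling q_update along a trajectory performs the standard asynchronous iteration; passing lr=linear_lr changes only the step-size schedule, which is exactly the contrast the paper analyzes.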
1. Introduction
In Reinforcement Learning, an agent wanders in an unknown environment and tries to maximize its
long-term return by performing actions and receiving rewards. The challenge is to understand how
a current action will affect future rewards. A good way to model this task is with Markov Decision
Processes (MDPs), which have become the dominant approach in Reinforcement Learning (Sutton
and Barto, 1998; Bertsekas and Tsitsiklis, 1996).
An MDP includes states (which abstract the environment), actions (which are the actions available
to the agent), and, for each state-action pair, a distribution over next states (the states reached after
performing the action in the given state).
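As an illustration only (the encoding below is not from the paper), a small MDP can be written down in Python as a transition table mapping each state-action pair to a distribution over next states, together with a reward table; all states, probabilities, and rewards here are hypothetical.

# Hypothetical two-state, two-action MDP. P[s][a] lists
# (next_state, probability) pairs; R[s][a] is the reward for
# taking action a in state s.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(0, 0.5), (1, 0.5)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}

import random

def sample_next(P, s, a):
    # Draw a next state from the distribution attached to (s, a).
    states, probs = zip(*P[s][a])
    return random.choices(states, weights=probs)[0]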