Football Table RL
Reinforcement Learning Basics
The football table employs on-line value iteration, namely Greedy-GQ(<math>\lambda</math>) and Approximate-Q(<math>\lambda</math>). This page does not explain reinforcement learning theory; it only touches on the usage and implementation of the provided library (libvfa, located on the SVN). To get a basic understanding of reinforcement learning, I suggest reading the book by Sutton & Barto [1]. For a slightly more in-depth, hands-on book on using RL with function approximation, I suggest the book by Lucian Busoniu (TU Delft) et al. [2], which is freely available as an e-book from within the TU/e network. Unfortunately the two books use very different notations; here we use the notation of the former.
From here on out this section assumes familiarity with reinforcement learning and discusses how the provided libvfa library works and is used. Note that the library can be used in any project, provided the user supplies transition samples correctly; a minimal sketch of such a per-sample update follows below.
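To make the expected usage concrete, the following is a minimal sketch of one on-line approximate Q(<math>\lambda</math>) step fed with a single transition sample. It is a hypothetical illustration, not the libvfa API: all names (QLambdaLearner, update, and so on) are made up, a discrete action set is assumed, and the trace reset that Watkins' Q(<math>\lambda</math>) performs after exploratory actions is omitted for brevity.
<syntaxhighlight lang="cpp">
// Minimal sketch of an on-line approximate Q(lambda) update with linear
// function approximation, Q(s,a) = theta . phi(s,a). All names here are
// hypothetical illustrations; this is not the libvfa API.
#include <algorithm>
#include <cassert>
#include <vector>

struct QLambdaLearner {
    std::vector<double> theta;  // parameter (weight) vector, size n
    std::vector<double> e;      // eligibility trace, size n
    double alpha, gamma, lambda;

    QLambdaLearner(std::size_t n, double alpha, double gamma, double lambda)
        : theta(n, 0.0), e(n, 0.0), alpha(alpha), gamma(gamma), lambda(lambda) {}

    // Q(s,a) = theta . phi(s,a)
    double q(const std::vector<double>& phi) const {
        double v = 0.0;
        for (std::size_t i = 0; i < theta.size(); ++i) v += theta[i] * phi[i];
        return v;
    }

    // One transition sample: phi(s,a), the reward, and phi(s',a') for every
    // action a' in the next state (needed for the max over next actions).
    void update(const std::vector<double>& phi_sa, double reward,
                const std::vector<std::vector<double>>& phi_next, bool terminal) {
        double q_next = 0.0;  // greedy value of the next state: max_a' Q(s',a')
        if (!terminal) {
            assert(!phi_next.empty());
            q_next = q(phi_next.front());
            for (const auto& p : phi_next) q_next = std::max(q_next, q(p));
        }
        // TD error: delta = r + gamma * max_a' Q(s',a') - Q(s,a)
        const double delta = reward + gamma * q_next - q(phi_sa);
        // Accumulating trace: e <- gamma * lambda * e + phi(s,a)
        for (std::size_t i = 0; i < e.size(); ++i)
            e[i] = gamma * lambda * e[i] + phi_sa[i];
        // Gradient step: theta <- theta + alpha * delta * e
        for (std::size_t i = 0; i < theta.size(); ++i)
            theta[i] += alpha * delta * e[i];
    }
};
</syntaxhighlight>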
Value function approximation
For this project a Gaussian radial basis function (RBF) network was chosen to approximate the Q-function; a short explanation of the method can be found in [3]. Currently, this is the only method of function approximation available in the library, but other forms of linear parametric function approximation should be easy to incorporate (this requires changing only the membership function).
An RBF network is a linear ''parametric function approximation'', where the Q-function is estimated as:
<math>Q = \theta\cdot\phi(s,a)^{T}</math>
Here <math>\phi(s,a)~(1 \times n)</math> denotes the so-called feature vector and <math>\theta~(1 \times n)</math> denotes the parameter (or weight) vector.
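As an illustration of what such a feature vector can look like (an assumption for the sketch, not necessarily how libvfa builds it), a common construction for discrete actions computes a Gaussian activation per RBF centre and places it in the block belonging to the chosen action, so that <math>n</math> equals the number of actions times the number of centres. Swapping the Gaussian for another membership function yields a different linear approximator, as noted above.
<syntaxhighlight lang="cpp">
// Sketch of a Gaussian RBF feature vector phi(s,a) for discrete actions.
// Hypothetical illustration, not the libvfa implementation. Assumes every
// centre has the same dimension as the state.
#include <cmath>
#include <vector>

std::vector<double> rbfFeatures(const std::vector<double>& state, int action,
                                const std::vector<std::vector<double>>& centres,
                                double sigma, int numActions) {
    // One block of centres.size() features per action; only the block of the
    // chosen action is non-zero, so n = numActions * centres.size().
    std::vector<double> phi(numActions * centres.size(), 0.0);
    for (std::size_t k = 0; k < centres.size(); ++k) {
        double d2 = 0.0;  // squared distance from the state to centre k
        for (std::size_t i = 0; i < state.size(); ++i) {
            const double diff = state[i] - centres[k][i];
            d2 += diff * diff;
        }
        // Gaussian membership: exp(-||s - c_k||^2 / (2 sigma^2)); replace this
        // line to obtain another linear parametric function approximation.
        phi[action * centres.size() + k] = std::exp(-d2 / (2.0 * sigma * sigma));
    }
    return phi;
}
</syntaxhighlight>
Evaluating the Q-function then reduces to the inner product <math>\theta\cdot\phi(s,a)^{T}</math> from the equation above.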
Local updates
Because Gaussian RBFs are local function approximators (each basis function has appreciable activation only near its centre), it suffices to update only the local nodes, i.e. the parameters whose feature activation is non-negligible for the current sample; the remaining entries of the gradient are practically zero, so skipping them makes each update cheap without noticeably changing the result.
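A sketch of what such a local update can look like (hypothetical, assuming a small activation threshold below which a node is treated as inactive; not the libvfa code):
<syntaxhighlight lang="cpp">
// Local gradient step: a Gaussian RBF is practically zero away from its
// centre, so weights whose (trace-weighted) activation is negligible are
// skipped. Hypothetical illustration; the threshold value is an assumption.
#include <cmath>
#include <vector>

void localGradientStep(std::vector<double>& theta, const std::vector<double>& e,
                       double alpha, double delta, double threshold = 1e-6) {
    for (std::size_t i = 0; i < theta.size(); ++i)
        if (std::fabs(e[i]) > threshold)      // skip (near-)inactive nodes
            theta[i] += alpha * delta * e[i];
}
</syntaxhighlight>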
- ↑ R. S. Sutton & A. G. Barto, Reinforcement Learning: An Introduction, http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
- ↑ L. Busoniu, R. Babuska, B. De Schutter & D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010, http://www.crcnetbase.com/isbn/9781439821091
- ↑ R. M. Kretchmar & C. W. Anderson, Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning, ICNN 1997, http://www.cs.colostate.edu/pubserv/pubs/Kretchmar-anderson-res-rl-matt-icnn97.pdf