Stimulus -> Action <-> Reward Network (SARN)

by Ondrej Pacovsky

An on-line reinforcement learning algorithm for real-time applications with real-valued inputs.
A prototype was implemented in the Unreal 2004 environment, where a SARN-controlled bot learns to evade missiles during the game.

Screenshot: a rocket fired at the SARN bot

The thesis

Online learning in Real-time environments

Abstract

In this work, a novel reinforcement learning algorithm, SARN, is developed. It is targeted at real-time domains where the inputs are usually continuous and adaptation must proceed on-line, without separate training periods. Another objective is to minimise the amount of problem-specific teacher (human) input needed for successful application of the algorithm. The SARN architecture combines a connectionist network and scalar reinforcement feedback by employing Hebbian principles. By adapting the network weights, connections are established between stimuli and actions that lead to positive feedback. Since the links between the input stimuli and the actions are formed quite rapidly, it is possible to use a large number of stimuli. This leads to the idea of using a random recurrent network (Echo State Network) as a pre-processing layer. A prototype implementation is tested in the Unreal 2004 game environment. The comparison with Q-learning shows that on the time scale of tens of seconds to minutes, SARN typically achieves better performance. When coupled with an Echo State Network, SARN requires a uniquely low amount of problem-specific information supplied by the teacher. These features make SARN useful for domains such as autonomous robot control and game AI.
Full text - PDF
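
To make the idea in the abstract more concrete, here is a rough Python sketch of a reward-modulated Hebbian link between stimuli and actions, fed by a fixed random recurrent layer. It is only an illustration and not the SARN algorithm from the thesis: the class names (Reservoir, HebbianActionLayer), the update rule (weight change = learning rate x reward x stimulus), the exploration scheme and all parameter values are assumptions made for this example.

```python
import math
import random


class Reservoir:
    """Fixed random recurrent layer in the style of an Echo State Network.

    Hypothetical helper for illustration; its weights are never trained.
    """

    def __init__(self, n_inputs, n_nodes=80, scale=0.1, seed=0):
        rng = random.Random(seed)
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_nodes)]
        # Assumption: small recurrent weights so that old inputs fade out.
        self.w_rec = [[rng.uniform(-scale, scale) for _ in range(n_nodes)]
                      for _ in range(n_nodes)]
        self.state = [0.0] * n_nodes

    def step(self, inputs):
        """Mix the new inputs with the previous state and return the new state."""
        new_state = []
        for w_in_row, w_rec_row in zip(self.w_in, self.w_rec):
            total = sum(w * x for w, x in zip(w_in_row, inputs))
            total += sum(w * s for w, s in zip(w_rec_row, self.state))
            new_state.append(math.tanh(total))
        self.state = new_state
        return new_state


class HebbianActionLayer:
    """Stimulus-to-action weights adapted by a scalar reward (assumed rule)."""

    def __init__(self, n_stimuli, n_actions, learning_rate=0.1,
                 explore=0.1, seed=0):
        self.rng = random.Random(seed)
        self.weights = [[0.0] * n_stimuli for _ in range(n_actions)]
        self.learning_rate = learning_rate
        self.explore = explore

    def choose(self, stimuli):
        """Pick the action with the highest weighted stimulus sum."""
        if self.rng.random() < self.explore:
            # Occasionally try a random action so new links can form.
            return self.rng.randrange(len(self.weights))
        scores = [sum(w * s for w, s in zip(row, stimuli))
                  for row in self.weights]
        return max(range(len(scores)), key=scores.__getitem__)

    def reinforce(self, stimuli, action, reward):
        """Strengthen (or weaken) links from active stimuli to the action taken."""
        row = self.weights[action]
        for i, s in enumerate(stimuli):
            row[i] += self.learning_rate * reward * s
```

In a control loop, one would call Reservoir.step on the raw real-valued inputs each tick, pass the resulting state to HebbianActionLayer.choose, execute the chosen action, and then call reinforce with whatever scalar feedback the environment provides (for example a negative value when the bot is hit). The 80-node default mirrors the SARN-80 configuration mentioned on this page; everything else is a placeholder.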

Movies

The movies are in low quality for now and will be updated soon.
All were recorded with the SARN-80 controller (SARN coupled with an Echo State Network of 80 nodes).

Source Code

The source code is available here. Feel free to have a look, but compiling and running it will be tricky, as I have only tested it in my own environment.
The high-level programmer documentation is contained in the thesis (slightly outdated); you can also look at the auto-generated doxygen pages.
