Reinforcement Learning with Recurrence: Application to Games and Finance

John Moody
Algorithms Group
International Computer Science Institute, Berkeley


Wednesday, November 9, 2005
4:30 - 5:45 PM
Terman Engineering Center, Room 453


Abstract:

The dominant approach to reinforcement learning (RL) over the past 20 years has been based on Markov decision processes and dynamic programming, whereby RL agents learn an abstract value function (VF). An alternative approach, direct reinforcement (DR), traces its origins to control engineering and has recently been revisited. In contrast to VF methods, DR agents learn policies directly, without the need to learn a value function.

I present Stochastic Direct Reinforcement (SDR), a policy gradient algorithm for learning recurrent policies with discrete actions. For many problems of real-world interest, I argue that non-Markovian policy gradient algorithms such as SDR can enable simpler, more natural problem representations, offer advantages in learning efficiency, and discover better policies. Empirical illustrations of the recurrent DR approach include competitive and cooperative games, and trading in financial markets with risk aversion and transaction costs.
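
For readers unfamiliar with recurrent direct reinforcement, the following is a minimal, hypothetical Python/NumPy sketch of a REINFORCE-style policy gradient update for a recurrent policy with discrete actions on a toy two-action task. It illustrates only the general idea of optimizing a recurrent policy directly from rewards; it is not the SDR algorithm presented in the talk, and the task, network sizes, and the simplification of updating only the output weights are assumptions made for brevity.

    # Hypothetical sketch: recurrent policy + REINFORCE-style gradient (not SDR itself).
    import numpy as np

    rng = np.random.default_rng(0)
    n_obs, n_hidden, n_actions = 3, 8, 2

    # Policy parameters: input, recurrent, and output weights.
    W_in  = rng.normal(scale=0.1, size=(n_hidden, n_obs))
    W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    W_out = rng.normal(scale=0.1, size=(n_actions, n_hidden))

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def episode(T=20):
        """Roll out the recurrent policy on a toy task where the rewarding
        action alternates each step, so the hidden state (memory) is useful."""
        h = np.zeros(n_hidden)
        obs = np.zeros(n_obs)
        hs, ps, acts, rews = [], [], [], []
        for t in range(T):
            h = np.tanh(W_in @ obs + W_rec @ h)     # recurrent state update
            p = softmax(W_out @ h)                  # discrete action probabilities
            a = rng.choice(n_actions, p=p)
            r = 1.0 if a == (t % 2) else 0.0        # reward depends on time parity
            obs = np.zeros(n_obs); obs[a] = 1.0     # next observation encodes last action
            hs.append(h.copy()); ps.append(p); acts.append(a); rews.append(r)
        return hs, ps, acts, rews

    def reinforce_step(lr=0.05):
        """One REINFORCE update on W_out only (gradients through the recurrence
        are ignored here for brevity); returns the episode return."""
        global W_out
        hs, ps, acts, rews = episode()
        G = sum(rews)
        grad = np.zeros_like(W_out)
        for h, p, a in zip(hs, ps, acts):
            one_hot = np.zeros(n_actions); one_hot[a] = 1.0
            grad += np.outer(one_hot - p, h)        # d log pi(a|h) / d W_out
        W_out += lr * G * grad                      # ascend the expected return
        return G

    for _ in range(200):
        reinforce_step()
    print("final episode return:", reinforce_step())

A full recurrent DR method would also propagate gradients through the recurrent state (and, in the trading setting, through path-dependent quantities such as transaction costs), which this sketch omits.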

This talk includes joint work with Matthew Saffell, Yufeng Liu and Kyoungju Youn.




Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html