Reinforcement Learning with Recurrence: Application to Games and Finance
John Moody
Algorithms Group
International Computer Science Institute, Berkeley
Wednesday, November 9, 2005
4:30 - 5:45 PM
Terman Engineering Center, Room 453
Abstract:
The dominant approach to RL over the past 20 years has been based on
Markov decision processes and dynamic programming, whereby RL agents
learn an abstract value function (VF). An alternative approach,
direct reinforcement (DR), traces its origins to control engineering,
and has recently been revisited. In contrast to VF methods, DR agents
learn policies directly without the need to learn a value function.
I present Stochastic Direct Reinforcement (SDR), a policy gradient
algorithm for learning recurrent policies with discrete actions. For
many problems of real-world interest, I argue that non-Markovian
policy gradient algorithms such as SDR can enable simpler, more
natural problem representations, offer advantages in learning
efficiency, and discover better policies. Empirical illustrations of
the recurrent DR approach include competitive and cooperative games
and trading in financial markets with risk aversion and transaction
costs.
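The abstract gives no implementation details, but the general flavor of a recurrent, discrete-action policy-gradient method can be sketched with a plain REINFORCE estimator. This is not Moody's SDR algorithm itself; the toy environment, reward function, and parameter names below are all illustrative assumptions. The policy is "recurrent" in the SDR sense that the action probability depends on the previous action as well as the current observation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_episode(theta, T=20):
    """Roll out one episode of a toy task. The policy is recurrent:
    P(a_t = 1) depends on observation x_t AND previous action a_{t-1}."""
    w_x, w_a, b = theta
    a_prev = 0.0
    grad_log = np.zeros(3)  # accumulated score function, d/dtheta log pi
    ret = 0.0
    for _ in range(T):
        x = rng.normal()                          # toy observation
        p = sigmoid(w_x * x + w_a * a_prev + b)   # P(a_t = 1)
        a = 1.0 if rng.random() < p else 0.0
        # Score function for a Bernoulli policy with logistic link:
        # d/dtheta log pi(a | x, a_prev) = (a - p) * [x, a_prev, 1]
        grad_log += (a - p) * np.array([x, a_prev, 1.0])
        # Toy reward: acting (a = 1) pays off when x > 0, costs otherwise
        ret += a * x
        a_prev = a
    return ret, grad_log

def reinforce_step(theta, episodes=200, lr=0.05):
    """One REINFORCE policy-gradient update with a mean-return baseline."""
    rets, grads = [], []
    for _ in range(episodes):
        r, g = run_episode(theta)
        rets.append(r)
        grads.append(g)
    rets = np.array(rets)
    baseline = rets.mean()  # variance-reduction baseline
    grad = np.mean([(r - baseline) * g for r, g in zip(rets, grads)], axis=0)
    return theta + lr * grad, baseline

theta = np.zeros(3)
for step in range(30):
    theta, avg_return = reinforce_step(theta)
print("learned weights:", theta, "average return:", avg_return)
```

After training, the policy should learn a positive weight on x (act when the observation is favorable), illustrating how a policy can be improved by gradient ascent on expected return without ever estimating a value function.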
This talk includes joint work with Matthew Saffell, Yufeng Liu and
Kyoungju Youn.
Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html