The Art of Sequential Optimization via Simulations
Rahul Jain
USC Department of Electrical Engineering
Wednesday, Oct. 8, 2014
4:15 - 5:15 PM
Huang 305
Abstract:
I will talk about a natural framework for simulation-based optimization and control of Markov decision
process (MDP) models. The idea is very simple: replace the Bellman operator by its 'empirical' variant, wherein the expectation
is replaced by a sample average approximation. This leads to a random Bellman operator in the dynamic programming
equations. We introduce several notions of probabilistic fixed points of such random operators, and show their asymptotic equivalence.
We establish convergence of empirical Value and Policy Iteration algorithms by a stochastic dominance argument. The mathematical
technique introduced is useful for analyzing other iterated random operators (than just the empirical Bellman operator), and may also
be useful in random matrix theory. The idea generalizes to asynchronous dynamic programming, and is also useful for computing equilibria of zero-sum stochastic games. Preliminary numerical results show better convergence rates and runtime performance than stochastic approximation/reinforcement learning and other commonly used schemes.
This is joint work with Dileep Kalathil (UC Berkeley) and William Haskell (NUS).
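As a rough illustration of the idea in the abstract, the sketch below runs value iteration with the expectation in the Bellman operator replaced by a sample average over simulated next states. This is a minimal sketch under assumed conventions, not the speakers' implementation; all function and variable names (and the tabular MDP setup) are illustrative.

```python
import numpy as np

def empirical_value_iteration(P, r, gamma=0.9, n_samples=50, n_iters=200, seed=0):
    """Value iteration using the 'empirical' Bellman operator: the expectation
    over next states is replaced by a sample average from the simulator.

    P: (S, A, S) transition kernel, r: (S, A) one-step rewards.
    (Illustrative names; not from the talk.)
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    v = np.zeros(S)
    for _ in range(n_iters):
        q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Draw n_samples simulated next states instead of
                # computing the exact expectation under P[s, a].
                next_states = rng.choice(S, size=n_samples, p=P[s, a])
                q[s, a] = r[s, a] + gamma * v[next_states].mean()
        # One application of the random (empirical) Bellman operator.
        v = q.max(axis=1)
    return v
```

Because the operator is random, each run produces a slightly different iterate sequence; the talk's point is that the iterates converge to a probabilistic fixed point as the number of samples grows.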
Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html