The Art of Sequential Optimization via Simulations


Rahul Jain
USC Department of Electrical Engineering

Wednesday, Oct. 8, 2014
4:15 - 5:15 PM
Huang 305


Abstract:

I will talk about a natural framework for simulation-based optimization and control of Markov decision process (MDP) models. The idea is very simple: replace the Bellman operator by its 'empirical' variant, in which the expectation is replaced by a sample-average approximation. This leads to a random Bellman operator in the dynamic programming equations. We introduce several notions of probabilistic fixed points of such random operators and show their asymptotic equivalence. We establish convergence of empirical Value and Policy Iteration algorithms by a stochastic dominance argument. The mathematical technique introduced is useful for analyzing iterated random operators other than the empirical Bellman operator, and may also be useful in random matrix theory. The idea generalizes to asynchronous dynamic programming and is also useful for computing equilibria of zero-sum stochastic games. Preliminary numerical results show better convergence rates and runtime performance than stochastic approximation/reinforcement learning and other commonly used schemes. This is joint work with Dileep Kalathil (UC Berkeley) and William Haskell (NUS).
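To illustrate the core idea, here is a minimal sketch of empirical value iteration in Python/NumPy. It is not the speaker's implementation: the MDP, the function name, and all parameters (number of samples per state-action pair, number of iterations) are illustrative assumptions. The only point it demonstrates is the substitution in the abstract: the expectation E[V(s')] inside the Bellman operator is replaced by a sample average over next states drawn from a simulator.

```python
import numpy as np

def empirical_value_iteration(P, r, gamma=0.9, n_samples=100, n_iter=300, seed=0):
    """Value iteration with an 'empirical' Bellman operator.

    P : (S, A, S) array of transition probabilities, used here as a simulator
        we can sample next states from.
    r : (S, A) array of one-step rewards.
    The expectation over next states is replaced by a sample-average
    approximation, making each Bellman update a random operator.
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Draw n_samples next states from the simulator P(. | s, a).
                nxt = rng.choice(S, size=n_samples, p=P[s, a])
                # Sample-average approximation of E[V(s')].
                Q[s, a] = r[s, a] + gamma * V[nxt].mean()
        V = Q.max(axis=1)
    return V

# Toy 2-state, 2-action MDP (deterministic transitions, chosen for illustration):
# action 0 moves to state 0, action 1 moves to state 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
r = np.array([[0.0, 1.0],
              [2.0, 0.0]])
V = empirical_value_iteration(P, r)
```

With deterministic transitions the sample average is exact, so this toy run reproduces classical value iteration (the fixed point here is roughly V = [14.74, 15.26]); with genuinely stochastic transitions the iterates only concentrate near the fixed point as n_samples grows, which is where the probabilistic fixed-point notions in the abstract come in.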




Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html