From Reinforcement Learning to Stochastic Optimization: A Universal Framework for Sequential Decision Analytics

Warren Powell
Princeton University

Wednesday, Jan 29, 2020
4:30 - 5:30 PM
Location: Shriram 262



Abstract:

Reinforcement Learning has attracted considerable attention with its successes in mastering advanced games such as chess and Go. This attention has overshadowed major successes such as landing SpaceX rockets using tools of optimal control, or optimizing large fleets of trucks and trains using tools from operations research and approximate dynamic programming. In fact, there are 15 different communities all contributing to the vast range of sequential problems that arise in energy, finance, transportation, health, engineering and the sciences. As each community has evolved to address a broader range of problems, there has been a consistent pattern of discovery of tools that sometimes differ only in name or in modest implementation details.

I will represent all of these communities using a single, canonical framework that mirrors the widely used modeling style from deterministic math programming. The key difference when introducing uncertainty is the need to optimize over policies. I will show that all the solution strategies (that is, policies) suggested by the research literature, in addition to some that are widely used in practice, can be organized into four fundamental classes. One of these classes, which we call "parametric cost function approximations," is widely used in practice, but has been largely overlooked by the academic community, with the notable exception of the bandit community. These ideas will be illustrated using applications drawn from transportation, energy, emergency response and materials science.
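To give a flavor of the "optimize over policies" idea, here is a minimal sketch (not taken from the talk) of a parametric cost function approximation for a simple inventory problem. The policy, demand model, and cost parameters are all illustrative assumptions: the policy is "order up to theta," and the tunable parameter theta is chosen by simulating the resulting cost.

```python
import random

def simulate_policy(theta, horizon=200, seed=0,
                    hold_cost=1.0, shortage_cost=5.0):
    """Average per-period cost of the order-up-to-theta policy.

    All problem data (uniform demand, cost coefficients, lost sales)
    are illustrative assumptions, not from the talk.
    """
    rng = random.Random(seed)
    inventory, total = 0.0, 0.0
    for _ in range(horizon):
        # The policy: raise inventory to the target level theta.
        inventory = max(inventory, theta)
        demand = rng.uniform(0, 10)  # assumed demand model
        inventory -= demand
        if inventory >= 0:
            total += hold_cost * inventory           # holding cost
        else:
            total += shortage_cost * (-inventory)    # shortage penalty
            inventory = 0.0                          # lost sales
    return total / horizon

# "Optimizing over policies" within this parametric class reduces to
# tuning theta -- here by a simple grid search over simulated cost.
candidates = [2.0, 4.0, 6.0, 8.0, 10.0]
best_theta = min(candidates, key=simulate_policy)
print(best_theta)
```

The point of the sketch is that the policy itself is a simple parametric rule; the stochastic optimization problem lives in the choice of theta, which can be tuned offline by simulation or online as the system runs.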



Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html