Multi-Agent Online Learning under Imperfect Information: Algorithms, Theory and Applications


Zhengyuan Zhou
Stanford University

Friday, October 6, 2017
1:15 - 2:15 PM
Location: Packard 202


Abstract:

We consider a model of multi-agent online learning under imperfect information, where the reward structures of the agents are given by a general continuous game. After introducing a general equilibrium stability notion for continuous games, called variational stability, we examine the well-known online mirror descent (OMD) learning algorithm and show that the "last iterate" (that is, the actual sequence of actions) of OMD converges to variationally stable Nash equilibria, provided that the feedback delays faced by the agents are synchronous and bounded. We then extend this result to almost sure convergence to variationally stable Nash equilibria under both unbiased noise and synchronous, bounded delays. Subsequently, to tackle fully decentralized, asynchronous environments with unbounded feedback delays, we propose a variant of OMD, which we call delayed mirror descent (DMD), that repeatedly leverages past information. With this modification, the algorithm converges to variationally stable Nash equilibria with no feedback synchronicity assumptions, even when the delays grow super-linearly relative to the game's horizon. Finally, we extend this result to the setting with both delays and noise.
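
For readers unfamiliar with mirror descent, the following is a minimal single-agent sketch of an OMD update with the entropic regularizer on the probability simplex (the multiplicative-weights instance). The names (omd_entropic, grad_fn, eta) are illustrative, and the sketch deliberately omits the delay and noise mechanics analyzed in the talk.

    import numpy as np

    def omd_entropic(grad_fn, x0, eta, num_steps):
        # Minimal OMD sketch on the probability simplex with the entropic
        # regularizer (multiplicative weights); illustrative only, not the
        # talk's exact multi-agent formulation. grad_fn(t, x) stands in for
        # the (possibly delayed or noisy) payoff-gradient feedback.
        x = np.asarray(x0, dtype=float)
        for t in range(num_steps):
            v = grad_fn(t, x)            # feedback received at step t
            x = x * np.exp(eta * v)      # exponentiated (mirror) step
            x /= x.sum()                 # normalize back onto the simplex
        return x                         # the "last iterate"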

In the second part of the talk, we present two applications of the multi-agent online learning framework. The first application concerns non-convex stochastic optimization, where we characterize almost sure convergence of the well-known stochastic mirror descent algorithm to global optima for a large class of non-convex stochastic optimization problems (strictly including convex, quasi-convex and star-convex problems). Going a step further, our results also cover, as a special case, large-scale stochastic optimization, where stochastic mirror descent is applied in a distributed, asynchronous manner across multiple machines/processors. Time permitting, we will discuss how these results help (at least in part) clarify and affirm the recent successes of mirror-descent-type algorithms in large-scale machine learning. The second application concerns power management on random wireless networks, where we use a game-design approach to derive robust power control algorithms that converge (almost surely) to the optimal power allocation in the presence of randomly fluctuating networks.
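
For concreteness, here is a minimal sketch of stochastic mirror descent with the Euclidean mirror map, in which case it reduces to stochastic gradient descent; the names (smd_euclidean, stoch_grad, step_size) are illustrative, and the global-optimum convergence discussed in the talk applies to the stated problem classes under suitable conditions, not to arbitrary non-convex objectives.

    import numpy as np

    def smd_euclidean(stoch_grad, x0, step_size, num_steps, seed=0):
        # Minimal stochastic mirror descent sketch with the Euclidean
        # mirror map (equivalently, SGD). stoch_grad(x, rng) should return
        # an unbiased estimate of the gradient at x, and step_size(t) is a
        # vanishing step-size schedule. Illustrative sketch only.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for t in range(1, num_steps + 1):
            g = stoch_grad(x, rng)       # noisy gradient sample
            x = x - step_size(t) * g     # Euclidean mirror step
        return x

    # Example: a convex (hence star-convex) quadratic with Gaussian noise.
    x_opt = smd_euclidean(
        stoch_grad=lambda x, rng: 2 * x + rng.normal(size=x.shape),
        x0=np.ones(5), step_size=lambda t: 1.0 / t, num_steps=10000)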

This is joint work with Nick Bambos, Stephen Boyd, Peter Glynn, Panayotis Mertikopoulos and Claire Tomlin.



Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html