Approximation Algorithms for Budgeted Learning Problems

Kamesh Munagala
Department of Computer Science
Duke University


Wednesday, May 9, 2007
4:30 - 5:30 PM
Terman Engineering Center, Room 453


Abstract:

We present the first approximation algorithms for a large class of budgeted learning problems. One classic example is the budgeted multi-armed bandit problem. In this problem, each arm of the bandit has an unknown reward distribution on which a prior is specified as input. The knowledge about the underlying distribution can be refined in an "exploration" phase by playing the arms and observing the rewards. However, there is a budget on the total number of plays allowed during exploration. After this exploration phase, the arm with the highest (posterior) expected reward is chosen for "exploitation". The goal is to design the adaptive exploration policy so as to maximize the expected reward of the arm chosen for exploitation. This is one of the simplest non-trivial examples of "active learning" or "design of experiments".
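To make the setup concrete, the following is a minimal sketch of the explore-then-exploit structure with Bernoulli arms and Beta priors. The round-robin exploration policy and all names here are illustrative assumptions for exposition, not the algorithm from the talk:

```python
import random

def budgeted_exploration(priors, true_means, budget, rng):
    """Spend `budget` plays across the arms, updating Beta posteriors from
    observed Bernoulli rewards, then exploit the arm with the highest
    posterior mean. Round-robin exploration is a placeholder policy."""
    # priors: list of (alpha, beta) Beta-prior parameters, one per arm
    posts = [list(p) for p in priors]
    n = len(posts)
    for t in range(budget):
        arm = t % n  # simple round-robin exploration (not the optimal adaptive policy)
        reward = 1 if rng.random() < true_means[arm] else 0
        # Beta-Bernoulli conjugate update: Beta(alpha + r, beta + 1 - r)
        posts[arm][0] += reward
        posts[arm][1] += 1 - reward
    # exploitation: pick the arm with highest posterior mean alpha / (alpha + beta)
    best = max(range(n), key=lambda i: posts[i][0] / (posts[i][0] + posts[i][1]))
    return best, posts

rng = random.Random(0)
arm, posts = budgeted_exploration([(1, 1)] * 3, [0.2, 0.5, 0.8], 30, rng)
```

The interesting difficulty, which this sketch sidesteps, is that the optimal exploration policy is adaptive: which arm to play next may depend on all rewards observed so far, and the budget makes computing such a policy hard.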

While this problem is reasonably well understood in the infinite-horizon setting, the budgeted version is NP-hard. For this problem and several of its generalizations, we provide approximation policies that achieve reward within a constant factor of the optimal reward. Our algorithms use a novel linear-program rounding technique based on stochastic packing. In addition to providing theoretical guarantees, the policies generated are intuitive and simple to implement.

Joint work with Sudipto Guha (University of Pennsylvania).




Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html