Approximation Algorithms for Budgeted Learning Problems
Kamesh Munagala
Department of Computer Science
Duke University
Wednesday, May 9, 2007
4:30 - 5:30 PM
Terman Engineering Center, Room 453
Abstract:
We present the first approximation algorithms for a large class of
budgeted learning problems. One classic example is the budgeted
multi-armed bandit problem. In this problem each arm of the bandit
has an unknown reward distribution on which a prior is specified as
input. The knowledge about the underlying distribution can be refined
in an "exploration" phase by playing the arms and observing the
rewards. However, there is a budget on the total number of plays
allowed during exploration. After this exploration phase, the arm with
the highest (posterior) expected reward is chosen for "exploitation".
The goal is to design the adaptive exploration phase subject to a
budget constraint on the number of plays, in order to maximize the
expected reward of the arm chosen for exploitation. This is one of the
simplest non-trivial examples of "active learning" or "design of
experiments".
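The setup above can be illustrated with a minimal sketch. This is not the paper's LP-rounding policy — just a toy version of the problem, assuming Bernoulli arms with Beta(1,1) priors and a naive round-robin exploration schedule, to make the exploration/exploitation split concrete.

```python
import random

def budgeted_bandit(true_probs, budget, seed=0):
    """Toy budgeted Bernoulli bandit: explore round-robin under a play
    budget, maintain Beta posteriors, then exploit the arm with the
    highest posterior expected reward. (Illustration only; the talk's
    policies are adaptive and come from an LP rounding technique.)"""
    rng = random.Random(seed)
    n = len(true_probs)
    # Beta(alpha, beta) posterior per arm, starting from a uniform prior.
    alpha = [1] * n
    beta = [1] * n
    # Exploration phase: budget on the total number of plays.
    for t in range(budget):
        arm = t % n  # naive round-robin schedule, not the optimal policy
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
    # Exploitation: pick the arm with the highest posterior mean.
    post_means = [a / (a + b) for a, b in zip(alpha, beta)]
    best = max(range(n), key=lambda i: post_means[i])
    return best, post_means
```

Here `true_probs`, `budget`, and the round-robin schedule are hypothetical choices for illustration; the hard part the talk addresses is choosing the exploration plays adaptively so that the expected reward of the exploited arm is maximized.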
While this problem is reasonably well understood in the infinite time
horizon setting, the budgeted version is NP-hard. For this problem
and several of its generalizations, we provide approximate policies
that achieve expected reward within a constant factor of the optimal.
Our algorithms use a novel linear program rounding technique based on
stochastic packing. In addition to providing theoretical guarantees,
the resulting policies are intuitive and simple to implement.
Joint work with Sudipto Guha (University of Pennsylvania).
Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html