Irrevocable Multi-Armed Bandit Problem

Ritesh Madan
Qualcomm-Flarion Technologies


Wednesday, May 27, 2009
4:30 - 5:30 PM
Terman Engineering Center, Room 453


Abstract:

We consider the multi-armed bandit problem with multiple simultaneous pulls with the following restriction: An arm that is pulled in the past and discarded may not be pulled again. This "irrevocable" property is highly desirable from an operational perspective in applications such as fast fashion retailing, drug trials, and call center staffing. By constructing a packing heuristic, we show that the price of irrevocability is uniformly bounded for a large class of multi-armed bandit problems. This class includes a large class of learning problems widely used for modeling the above applications. Computational results demonstrate that the packing heuristic offers excellent performance. We also compare the performance of the packing heuristic to Whittle's heuristic, and a modification of Whittle's heuristic to make it irrevocable. Finally, using dual decomposition, we derive a fast combinatorial algorithm for computing the packing heuristic. The algorithm has a complexity of O(nTAS^2 log(kT)) for n bandits, k simultaneous pulls, A actions in each state, S states, time horizon T.

This is joint work with Vivek Farias (MIT).





Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html