Linearly Parameterized Bandits

Paat Rusmevichientong
Operations Research and Information Engineering
Cornell University


Wednesday, February 11, 2009
4:30 - 5:30 PM
Terman Engineering Center, Room 453


Abstract:

Motivated by applications in e-commerce and revenue management, we consider multiarmed bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an unknown univariate or multivariate random variable. Our proposed model also has potential applications in product design and health care. The objective is to choose a sequence of arms that minimizes the cumulative regret and the Bayes risk. In the univariate case, we find that a greedy policy suffices. In the multivariate case, when the set of arms is strongly convex, we establish a lower bound and describe a policy that achieves a matching upper bound. In the absence of strong convexity, we describe a policy whose performance is within a polylogarithmic factor of the optimal.
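
One way to formalize the setup sketched in the abstract (the notation below is illustrative, not taken from the talk): each arm is a vector u in a set U of r-dimensional vectors, the unknown parameter is a random vector Z, and pulling arm U_t in period t yields an expected reward that is linear in Z. Under these assumptions, the cumulative regret over T periods and the Bayes risk (the expectation of the regret under a prior on Z) can be written as

\[
\mathrm{Regret}(Z, T) \;=\; \sum_{t=1}^{T} \Big( \max_{u \in U} u^{\top} Z \;-\; \mathbb{E}\big[ U_t^{\top} Z \big] \Big),
\qquad
\mathrm{Risk}(T) \;=\; \mathbb{E}_{Z}\big[ \mathrm{Regret}(Z, T) \big].
\]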

This is joint work with John Tsitsiklis and Adam Mersereau.


Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html