Queueing with Redundant Requests: Exact Analysis


Mor Harchol-Balter
Carnegie Mellon University

Wednesday, April 6, 2016
4:15 - 5:15 PM
Location: Spilker 143


Abstract:

Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to replicate a request so that it joins the queue at multiple servers, where the request is considered complete as soon as any one copy of the request completes. The performance analysis of systems with redundant requests is very hard, and only approximations exist. We present the first exact analysis of systems with redundancy under exponential service times. We allow for any number of classes of redundant requests, any degree of redundancy, and any number of heterogeneous servers. We derive the limiting distribution on the state of the system. For small redundancy systems, this leads to very simple distributions for response time. We then turn to larger systems with k servers, where we analyze a d-way redundancy scheme that makes d replicas of each request. We derive the exact mean response time as a function of k and d. We then derive the response time distribution in the asymptotic limit as k goes to infinity. Joint work with: Kristen Gardner, Sam Zbarsky, and Alan Scheller-Wolf.




Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html