Lower bounds for Howard's algorithm for finding Minimum Mean-Cost Cycles
Thomas Dueholm Hansen
Department of Computer Science, Aarhus University
Thursday, June 9, 2011
4:30 - 5:30 PM
Y2E2 105
Abstract:
Howard's policy iteration algorithm is one of the most widely used
algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to weighted directed graphs,
which may be viewed as deterministic MDPs (DMDPs), Howard's algorithm
can be used to find minimum mean-cost cycles (MMCCs). Experimental
studies suggest that Howard's algorithm works very well in this
context, yet its theoretical complexity for finding MMCCs remains
open: no polynomial bound on its running time is known. Prior to this
work, only linear lower bounds were known on the number of iterations
performed by Howard's algorithm. We provide the
first weighted graphs on which Howard's algorithm performs a number of
iterations that is quadratic in the number of vertices in the graph.
We also show how this result can be extended to discounted DMDPs.
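For concreteness, the policy-iteration scheme in question can be sketched as follows. This is a minimal illustration of Howard's algorithm in the MMCC setting, not the construction from the talk; the graph representation and function names are assumptions made for the example.

```python
# Illustrative sketch of Howard's policy iteration for minimum mean-cost
# cycles. A policy picks one outgoing edge per vertex; evaluation computes,
# for each vertex, the gain g (mean cost of the cycle it reaches under the
# policy) and a bias x; improvement greedily switches edges until stable.
# Assumes every vertex has at least one outgoing edge.

def evaluate(adj, policy):
    """Return (gain g, bias x) of the current policy's functional graph."""
    n = len(adj)
    g = [None] * n
    x = [None] * n
    state = [0] * n  # 0 = unvisited, 1 = on current path, 2 = evaluated
    for s in range(n):
        if state[s]:
            continue
        path, v = [], s
        while state[v] == 0:
            state[v] = 1
            path.append(v)
            v = policy[v][0]
        if state[v] == 1:  # a new cycle was found; v is its entry point
            cycle = path[path.index(v):]
            mean = sum(policy[u][1] for u in cycle) / len(cycle)
            g[v], x[v], state[v] = mean, 0.0, 2  # anchor the bias at v
            for u in reversed(cycle[1:]):
                succ, cost = policy[u]
                g[u], x[u], state[u] = mean, cost - mean + x[succ], 2
        for u in reversed(path):  # tree vertices hanging off a cycle
            if state[u] == 2:
                continue
            succ, cost = policy[u]
            g[u], x[u], state[u] = g[succ], cost - g[succ] + x[succ], 2
    return g, x

def improve(adj, policy, g, x, eps=1e-9):
    """Switch each vertex to an edge with a strictly better (gain, bias)."""
    changed = False
    for v in range(len(adj)):
        best = (g[v], x[v])
        for u, cost in adj[v]:
            key = (g[u], cost - g[u] + x[u])
            if key[0] < best[0] - eps or (
                abs(key[0] - best[0]) <= eps and key[1] < best[1] - eps
            ):
                best, policy[v] = key, (u, cost)
                changed = True
    return changed

def howard_min_mean_cycle(adj):
    """adj[v] lists (successor, cost) edges; returns the minimum cycle mean."""
    policy = [edges[0] for edges in adj]  # arbitrary initial policy
    while True:
        g, x = evaluate(adj, policy)
        if not improve(adj, policy, g, x):
            return min(g)
```

For example, on the two-vertex graph `adj = [[(0, 5.0), (1, 1.0)], [(0, 1.0)]]` the initial policy selects the self-loop of mean cost 5.0, and one improvement step switches to the two-cycle of mean cost 1.0. The lower-bound results of the talk concern how many such iterations can be forced in the worst case.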
Joint work with Uri Zwick.
Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html