Lower bounds for Howard's algorithm for finding Minimum Mean-Cost Cycles

Thomas Dueholm Hansen
Department of Computer Science, Aarhus University


Thursday, June 9, 2011
4:30 - 5:30 PM
Y2E2 105

Abstract:

Howard's policy iteration algorithm is one of the most widely used algorithms for computing optimal policies for Markov Decision Processes (MDPs). When applied to weighted directed graphs, which may be viewed as Deterministic MDPs (DMDPs), Howard's algorithm can be used to find Minimum Mean-Cost Cycles (MMCCs). Experimental studies suggest that it works very well in this context, yet its theoretical complexity for finding MMCCs remains a mystery: no polynomial bound on its running time is known. Prior to this work, only linear lower bounds were known on the number of iterations it performs. We provide the first weighted graphs on which Howard's algorithm performs a number of iterations that is quadratic in the number of vertices, and we show how this result extends to discounted DMDPs.
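For readers unfamiliar with the method, a minimal sketch of Howard's policy iteration specialized to deterministic MDPs (i.e., computing the minimum cycle mean of a weighted digraph) might look as follows. The function name and graph encoding are illustrative, not from the talk; exact rational arithmetic is used to sidestep floating-point tie-breaking.

```python
from fractions import Fraction

def howard_mmcc(n, edges):
    """Minimum cycle mean of a directed graph via Howard's policy iteration.

    Vertices are 0..n-1; `edges` is a list of (u, v, cost) triples, and
    every vertex must have at least one outgoing edge.
    """
    out = [[] for _ in range(n)]
    for u, v, c in edges:
        out[u].append((v, Fraction(c)))

    # A policy picks one outgoing edge per vertex; start arbitrarily.
    pi = [out[v][0] for v in range(n)]

    while True:
        # Evaluate the policy.  The policy graph is functional, so every
        # vertex reaches a unique cycle.  gain[v] is that cycle's mean cost;
        # bias[v] solves bias[v] = cost(v, pi(v)) - gain[v] + bias[pi(v)].
        gain, bias = [None] * n, [None] * n
        for s in range(n):
            if gain[s] is not None:
                continue
            path, pos, v = [], {}, s
            while v not in pos and gain[v] is None:
                pos[v] = len(path)
                path.append(v)
                v = pi[v][0]
            if gain[v] is None:                    # walk closed a new cycle
                cyc = path[pos[v]:]
                g = sum(pi[u][1] for u in cyc) / len(cyc)
                gain[cyc[0]], bias[cyc[0]] = g, Fraction(0)   # anchor bias
                for i in range(len(cyc) - 1, 0, -1):
                    u = cyc[i]
                    gain[u] = g
                    bias[u] = pi[u][1] - g + bias[cyc[(i + 1) % len(cyc)]]
                path = path[:pos[v]]               # vertices off the cycle
            for u in reversed(path):               # attach tree vertices
                w = pi[u][0]
                gain[u] = gain[w]
                bias[u] = pi[u][1] - gain[w] + bias[w]

        # Improve the policy: first try to lower gains; only if no gain
        # improves anywhere, lower biases among edges with equal gain.
        improved = False
        for v in range(n):
            best = min(gain[u] for (u, c) in out[v])
            if best < gain[v]:
                pi[v] = next((u, c) for (u, c) in out[v] if gain[u] == best)
                improved = True
        if not improved:
            for v in range(n):
                cand = [(c - gain[v] + bias[u], (u, c))
                        for (u, c) in out[v] if gain[u] == gain[v]]
                b2, e2 = min(cand)
                if b2 < bias[v]:
                    pi[v] = e2
                    improved = True
        if not improved:                           # policy is optimal
            return min(gain)
```

The iteration count of this loop is exactly the quantity the talk's lower bound concerns: each pass evaluates the current policy in linear time, and the construction in the talk forces a quadratic number of such passes.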

Joint work with Uri Zwick


Operations Research Colloquia: http://or.stanford.edu/oras_seminars.html