test Browse by Author Names Browse by Titles of Works Browse by Subjects of Works Browse by Issue Dates of Works
       

Advanced Search
Home   
 
Browse   
Communities
& Collections
  
Issue Date   
Author   
Title   
Subject   
 
Sign on to:   
Receive email
updates
  
My Account
authorized users
  
Edit Profile   
 
Help   
About T-Space   

T-Space at The University of Toronto Libraries >
School of Graduate Studies - Theses >
Doctoral >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/24381

Title: Acceleration of Iterative Methods for Markov Decision Processes
Authors: Shlakhter, Oleksandr
Advisor: Lee, Chi-Guhn
Department: Mechanical and Industrial Engineering
Keywords: Markov Decision Processes
Issue Date: 21-Apr-2010
Abstract: This research focuses on Markov Decision Processes (MDP). MDP is one of the most important and challenging areas of Operations Research. Every day people make many decisions: today's decisions impact tomorrow's and tomorrow's will impact the ones made the day after. Problems in Engineering, Science, and Business often pose similar challenges: a large number of options and uncertainty about the future. MDP is one of the most powerful tools for solving such problems. There are several standard methods for finding optimal or approximately optimal policies for MDP. Approaches widely employed to solve MDP problems include value iteration and policy iteration. Although simple to implement, these approaches are, nevertheless, limited in the size of problems that can be solved, due to excessive computation required to find close-to-optimal solutions. My thesis proposes a new value iteration and modified policy iteration methods for classes of the expected discounted MDPs and average cost MDPs. We establish a class of operators that can be integrated into value iteration and modified policy iteration algorithms for Markov Decision Processes, so as to speed up the convergence of the iterative search. Application of these operators requires a little additional computation per iteration but reduces the number of iterations significantly. The development of the acceleration operators relies on two key properties of Markov operator, namely contraction mapping and monotonicity in a restricted region. Since Markov operators of the classical value iteration and modified policy iteration methods for average cost MDPs do not possess the contraction mapping property, for these models we restrict our study to average cost problems that can be formulated as the stochastic shortest path problem. The performance improvement is significant, while the implementation of the operator into the value iteration is trivial. Numerical studies show that the accelerated methods can be hundreds of times more efficient for solving MDP problems than the other known approaches. The computational savings can be significant especially when the discount factor approaches 1 and the transition probability matrix becomes dense, in which case the standard iterative algorithms suffer from slow convergence.
URI: http://hdl.handle.net/1807/24381
Appears in Collections:Doctoral
Department of Mechanical & Industrial Engineering - Doctoral theses

Files in This Item:

File Description SizeFormat
Shlakhter_Oleksandr_201003_PhD_thesis.pdfPhD thesis538.19 kBAdobe PDF
View/Open

This item is licensed under a Creative Commons License
Creative Commons

Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

uoft