On the Importance of Inherent Structural Properties for Learning in Markov Decision Processes.
Material Type  
 Thesis (Dissertation)
Control Number  
0017162876
International Standard Book Number  
9798382741130
Dewey Decimal Classification Number  
519
Main Entry-Personal Name  
Adler, Saghar.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of Michigan., 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
158 p.
General Note  
Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
General Note  
Advisor: Subramanian, Vijay Gautam.
Dissertation Note  
Thesis (Ph.D.)--University of Michigan, 2024.
Summary, Etc.  
Recently, reinforcement learning methodologies have been applied to solve sequential decision-making problems in various fields, such as robotics and autonomous control, communication and networking, and resource allocation and scheduling. Despite great practical success, there has been less progress in developing theoretical performance guarantees for such complex systems. This dissertation aims to address the limitations of current theoretical frameworks and extend the applicability of learning-based control methods to more complex, real-life domains such as those discussed above. This objective is achieved in two different settings using the inherent structural properties of the Markov decision processes used to model such systems. In the first setting, for admission control in systems modeled by the Erlang-B blocking model with unknown arrival and service rates, we use model knowledge to compensate for the lack of reward signals. Here, we propose a learning algorithm based on self-tuning adaptive control, and we not only prove that our algorithm is asymptotically optimal but also provide finite-time regret guarantees. The second setting develops a framework to address the challenge of applying reinforcement learning methods to Markov decision processes with countably infinite state spaces and unbounded cost functions. An existing learning algorithm based on Thompson sampling with dynamically sized episodes is extended to countably infinite state spaces using the ergodicity properties of Markov decision processes. We establish asymptotic optimality of our learning-based control policy by providing a sub-linear (in time-horizon) regret guarantee. Our framework is focused on models that arise in queueing models of communication networks, computing systems, and processing networks. Hence, to demonstrate the applicability of our method, we also apply it to the problem of controlling two queueing systems with unknown dynamics.
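The Thompson-sampling approach summarized above can be illustrated with a minimal sketch of posterior-sampling reinforcement learning with dynamically sized episodes on a toy tabular MDP. This is not the dissertation's algorithm or its queueing models: the two-state MDP, the cost values, and the discounted value-iteration planner are all illustrative assumptions; the dissertation's setting involves countably infinite state spaces, unbounded costs, and average-cost optimality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (assumed example): costs known, transitions unknown.
S, A = 2, 2
true_P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.05, 0.95]]])  # true_P[s, a] = next-state dist
cost = np.array([[0.0, 0.5], [1.0, 0.2]])        # cost[s, a]

def greedy_policy(P, cost, gamma=0.95, iters=200):
    """Value iteration on a sampled model; returns the cost-minimizing policy."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = cost + gamma * P @ V   # Q[s, a]
        V = Q.min(axis=1)
    return Q.argmin(axis=1)

# Dirichlet posterior counts over transitions, one vector per (s, a).
counts = np.ones((S, A, S))
visits = np.zeros((S, A))
s, T = 0, 2000
t, ep_len, prev_ep_len = 0, 0, 0
visits_at_ep_start = visits.copy()
policy = greedy_policy(rng.dirichlet(np.ones(S), size=(S, A)), cost)

while t < T:
    # Dynamic episode stopping (TSDE-style): start a new episode when the
    # current one exceeds the previous episode's length, or when some (s, a)
    # visit count has doubled since the episode began.
    doubled = np.any(visits >= 2 * np.maximum(visits_at_ep_start, 1))
    if ep_len > prev_ep_len or doubled:
        P_sample = np.array([[rng.dirichlet(counts[si, ai]) for ai in range(A)]
                             for si in range(S)])       # sample an MDP
        policy = greedy_policy(P_sample, cost)          # plan in the sample
        prev_ep_len, ep_len = ep_len, 0
        visits_at_ep_start = visits.copy()
    a = policy[s]
    s_next = rng.choice(S, p=true_P[s, a])
    counts[s, a, s_next] += 1                           # posterior update
    visits[s, a] += 1
    s, t, ep_len = s_next, t + 1, ep_len + 1
```

Sampling one model per episode, rather than per step, is what makes the episode schedule central: growing episodes let the sampled policy stabilize as the posterior concentrates, which underlies sub-linear regret arguments.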
Subject Added Entry-Topical Term  
Applied mathematics.
Subject Added Entry-Topical Term  
Engineering.
Index Term-Uncontrolled  
Reinforcement learning
Index Term-Uncontrolled  
Learning in queueing systems
Index Term-Uncontrolled  
Markov decision processes
Index Term-Uncontrolled  
Asymptotic optimality
Added Entry-Corporate Name  
University of Michigan Electrical and Computer Engineering
Host Item Entry  
Dissertations Abstracts International. 85-12B.
Electronic Location and Access  
This material is available after login.
Control Number  
joongbu:657760
Holdings
Reg No. Copies Location Status Loan Info
TQ0033978 T Full-text material Viewable/printable Viewable/printable