On the Importance of Inherent Structural Properties for Learning in Markov Decision Processes.
- Material Type
- Thesis/Dissertation
- Control Number
- 0017162876
- International Standard Book Number
- 9798382741130
- Dewey Decimal Classification Number
- 519
- Main Entry-Personal Name
- Adler, Saghar.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of Michigan, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 158 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
- General Note
- Advisor: Subramanian, Vijay Gautam.
- Dissertation Note
- Thesis (Ph.D.)--University of Michigan, 2024.
- Summary, Etc.
- Recently, reinforcement learning methodologies have been applied to solve sequential decision-making problems in various fields, such as robotics and autonomous control, communication and networking, and resource allocation and scheduling. Despite great practical success, there has been less progress in developing theoretical performance guarantees for such complex systems. This dissertation aims to address the limitations of current theoretical frameworks and extend the applicability of learning-based control methods to such complex, real-life domains. This objective is achieved in two different settings by exploiting the inherent structural properties of the Markov decision processes used to model such systems. In the first setting, for admission control in systems modeled by the Erlang-B blocking model with unknown arrival and service rates, we use model knowledge to compensate for the lack of reward signals. Here, we propose a learning algorithm based on self-tuning adaptive control, and we not only prove that our algorithm is asymptotically optimal but also provide finite-time regret guarantees. The second setting develops a framework to address the challenge of applying reinforcement learning methods to Markov decision processes with countably infinite state spaces and unbounded cost functions. An existing learning algorithm based on Thompson sampling with dynamically-sized episodes is extended to countably infinite state spaces using the ergodicity properties of Markov decision processes. We establish asymptotic optimality of our learning-based control policy by providing a sub-linear (in time-horizon) regret guarantee. Our framework focuses on models that arise in queueing system models of communication networks, computing systems, and processing networks. Hence, to demonstrate the applicability of our method, we also apply it to the problem of controlling two queueing systems with unknown dynamics.
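The abstract mentions a learning algorithm based on Thompson sampling with dynamically-sized episodes. As a rough illustration of that idea, the sketch below runs Thompson sampling on a toy 2-state, 2-action MDP with a Dirichlet posterior over transition rows, resampling a model and recomputing the policy only at episode boundaries, with each episode allowed to run one step longer than the last. This is a minimal finite-state sketch under assumed toy dynamics (`true_P`, `cost` are made up here); it does not capture the dissertation's countably infinite state spaces, unbounded costs, or episode-stopping criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (hypothetical; the dissertation's queueing
# models are not reproduced here).
S, A = 2, 2
true_P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.6, 0.4], [0.3, 0.7]]])   # true_P[s, a] = next-state dist.
cost = np.array([[0.0, 1.0], [1.0, 0.0]])       # per-step cost[s, a]

def greedy_policy(P_hat, iters=200):
    """Relative value iteration for the average-cost criterion."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = cost + P_hat @ V      # (S, A): one-step cost + expected future value
        V = Q.min(axis=1)
        V -= V[0]                 # keep relative values bounded
    return Q.argmin(axis=1)

posterior = np.ones((S, A, S))    # Dirichlet(1, ..., 1) prior on each transition row
s, t, horizon = 0, 0, 5000
episode_len, last_len = 0, 0
policy = greedy_policy(np.full((S, A, S), 1.0 / S))  # start from a uniform guess
total_cost = 0.0

while t < horizon:
    # Start a new episode once its length exceeds the previous episode's
    # (one simple "dynamically-sized episodes" schedule).
    if episode_len > last_len:
        last_len, episode_len = episode_len, 0
        sampled_P = np.array([[rng.dirichlet(posterior[i, a]) for a in range(A)]
                              for i in range(S)])
        policy = greedy_policy(sampled_P)       # act greedily w.r.t. the sample
    a = policy[s]
    s_next = rng.choice(S, p=true_P[s, a])
    posterior[s, a, s_next] += 1                # conjugate Bayesian update
    total_cost += cost[s, a]
    s, t, episode_len = s_next, t + 1, episode_len + 1

print(total_cost / horizon)  # average cost; approaches 0 as the optimal policy is learned
```

The episode schedule matters: resampling only at episode boundaries keeps the policy fixed long enough for its average cost to be estimated, which is what makes sub-linear regret arguments go through.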
- Subject Added Entry-Topical Term
- Applied mathematics.
- Subject Added Entry-Topical Term
- Engineering.
- Index Term-Uncontrolled
- Reinforcement learning
- Index Term-Uncontrolled
- Learning in queueing systems
- Index Term-Uncontrolled
- Markov decision processes
- Index Term-Uncontrolled
- Asymptotic optimality
- Added Entry-Corporate Name
- University of Michigan Electrical and Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 85-12B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:657760