On the Importance of Inherent Structural Properties for Learning in Markov Decision Processes.
- Material Type
- Thesis/Dissertation
- Control Number
- 0017162876
- International Standard Book Number
- 9798382741130
- Dewey Decimal Classification Number
- 519
- Main Entry-Personal Name
- Adler, Saghar.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of Michigan, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 158 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
- General Note
- Advisor: Subramanian, Vijay Gautam.
- Dissertation Note
- Thesis (Ph.D.)--University of Michigan, 2024.
- Summary, Etc.
- Recently, reinforcement learning methodologies have been applied to solve sequential decision-making problems in various fields, such as robotics and autonomous control, communication and networking, and resource allocation and scheduling. Despite great practical success, there has been less progress in developing theoretical performance guarantees for such complex systems. This dissertation aims to address the limitations of current theoretical frameworks and extend the applicability of learning-based control methods to such complex, real-life domains. This objective is achieved in two different settings by exploiting the inherent structural properties of the Markov decision processes used to model such systems. In the first setting, for admission control in systems modeled by the Erlang-B blocking model with unknown arrival and service rates, we use model knowledge to compensate for the lack of reward signals. Here, we propose a learning algorithm based on self-tuning adaptive control, and we not only prove that our algorithm is asymptotically optimal but also provide finite-time regret guarantees. The second setting develops a framework to address the challenge of applying reinforcement learning methods to Markov decision processes with countably infinite state spaces and unbounded cost functions. An existing learning algorithm based on Thompson sampling with dynamically-sized episodes is extended to countably infinite state spaces using the ergodicity properties of Markov decision processes. We establish asymptotic optimality of our learning-based control policy by providing a sub-linear (in time-horizon) regret guarantee. Our framework focuses on models that arise in queueing system models of communication networks, computing systems, and processing networks. Hence, to demonstrate the applicability of our method, we also apply it to the problem of controlling two queueing systems with unknown dynamics.
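The abstract mentions a learning algorithm based on Thompson sampling with dynamically-sized episodes. As a rough illustration of that idea, the sketch below runs Thompson sampling on a toy 2-state, 2-action MDP with a Dirichlet posterior over transition rows, resampling a model and recomputing the policy only at episode boundaries, with each episode allowed to run one step longer than the last. This is a minimal finite-state sketch under assumed toy dynamics (`true_P`, `cost` are made up here); it does not capture the dissertation's countably infinite state spaces, unbounded costs, or episode-stopping criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (hypothetical; the dissertation's queueing
# models are not reproduced here).
S, A = 2, 2
true_P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.6, 0.4], [0.3, 0.7]]])   # true_P[s, a] = next-state dist.
cost = np.array([[0.0, 1.0], [1.0, 0.0]])       # per-step cost[s, a]

def greedy_policy(P_hat, iters=200):
    """Relative value iteration for the average-cost criterion."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = cost + P_hat @ V      # (S, A): one-step cost + expected future value
        V = Q.min(axis=1)
        V -= V[0]                 # keep relative values bounded
    return Q.argmin(axis=1)

posterior = np.ones((S, A, S))    # Dirichlet(1, ..., 1) prior on each transition row
s, t, horizon = 0, 0, 5000
episode_len, last_len = 0, 0
policy = greedy_policy(np.full((S, A, S), 1.0 / S))  # start from a uniform guess
total_cost = 0.0

while t < horizon:
    # Start a new episode once its length exceeds the previous episode's
    # (one simple "dynamically-sized episodes" schedule).
    if episode_len > last_len:
        last_len, episode_len = episode_len, 0
        sampled_P = np.array([[rng.dirichlet(posterior[i, a]) for a in range(A)]
                              for i in range(S)])
        policy = greedy_policy(sampled_P)       # act greedily w.r.t. the sample
    a = policy[s]
    s_next = rng.choice(S, p=true_P[s, a])
    posterior[s, a, s_next] += 1                # conjugate Bayesian update
    total_cost += cost[s, a]
    s, t, episode_len = s_next, t + 1, episode_len + 1

print(total_cost / horizon)  # average cost; approaches 0 as the optimal policy is learned
```

The episode schedule matters: resampling only at episode boundaries keeps the policy fixed long enough for its average cost to be estimated, which is what makes sub-linear regret arguments go through.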
- Subject Added Entry-Topical Term
- Applied mathematics.
- Subject Added Entry-Topical Term
- Engineering.
- Index Term-Uncontrolled
- Reinforcement learning
- Index Term-Uncontrolled
- Learning in queueing systems
- Index Term-Uncontrolled
- Markov decision processes
- Index Term-Uncontrolled
- Asymptotic optimality
- Added Entry-Corporate Name
- University of Michigan Electrical and Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 85-12B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:657760