Topics in Low-Rank Markov Decision Process: Applications in Policy Gradient, Model Estimation and Markov Games [electronic resource]
Detailed Information
- Material Type
- Dissertation (Thesis)
- Control Number
- 0016932470
- International Standard Book Number
- 9798379717681
- Dewey Decimal Classification Number
- 621.3
- Main Entry-Personal Name
- Ni, Chengzhuo.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Princeton University, 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (269 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- General Note
- Advisor: Wang, Mengdi.
- Dissertation Note
- Thesis (Ph.D.)--Princeton University, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- In this thesis, we study topics on Markov Decision Processes (MDPs) with a low-rank structure. We begin with the definition of a low-rank Markov Decision Process and discuss the related applications in the following chapters. In Chapter 2, we consider the off-policy estimation problem of the policy gradient. We propose an estimator based on Fitted Q Iteration which can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. We provide a tight finite-sample upper bound on the estimation error, given that the MDP satisfies the low-rank assumption. Empirically, we evaluate the performance of the estimator on both policy gradient estimation and policy optimization. Under various metrics, our results show that the estimator significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques. In Chapters 3 and 4, we study the estimation problem of low-rank MDP models. A tensor-based formulation is proposed to capture the low-rank information of the model. We develop a tensor-rank-constrained estimator that recovers the model from the collected data, and provide statistical guarantees on the estimation error. The tensor decomposition of the transition model provides useful information for the reduction of the state and action spaces. We further prove that the learned state/action abstractions provide accurate approximations to latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation. In Chapter 5, we study the representation learning problem of Markov Games, which is a natural extension of MDPs to the multi-player setting. We present a model-based and a model-free approach to construct an effective representation from the collected data, which is further used to learn an equilibrium policy. A theoretical guarantee is provided, which shows that the algorithm is able to find a near-optimal policy with a polynomial number of interactions with the environment. To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates function approximation. (A brief sketch of the low-rank MDP definition appears after the detail fields below.)
- Subject Added Entry-Topical Term
- Electrical engineering.
- Subject Added Entry-Topical Term
- Computer engineering.
- Index Term-Uncontrolled
- Markov Decision Processes
- Index Term-Uncontrolled
- Markov games
- Index Term-Uncontrolled
- Tensor-based formulation
- Added Entry-Corporate Name
- Princeton University Electrical and Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 84-12B.
- Host Item Entry
- Dissertation Abstract International
- Electronic Location and Access
- This material can be viewed after logging in.
- Control Number
- joongbu:640400
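The abstract refers to the definition of a low-rank Markov Decision Process without stating it. As a brief sketch of the standard definition from the literature (not quoted from this record), a rank-d MDP assumes the transition kernel factors through d-dimensional feature maps:

    P(s' \mid s, a) = \phi(s, a)^{\top} \mu(s'), \qquad \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^{d}, \quad \mu : \mathcal{S} \to \mathbb{R}^{d},

so that the |S||A| \times |S| transition matrix has rank at most d. The tensor-based formulation described for Chapters 3 and 4 plays an analogous role, constraining the rank of the transition tensor in \mathbb{R}^{|S| \times |A| \times |S|}.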
MARC
■008240220s2023 ulk 00 kor
■001000016932470
■00520240214100502
■006m o d
■007cr#unu||||||||
■020 ▼a9798379717681
■035 ▼a(MiAaPQ)AAI30493090
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a621.3
■1001 ▼aNi, Chengzhuo.
■24510▼aTopics in Low-Rank Markov Decision Process: Applications in Policy Gradient, Model Estimation and Markov Games▼h[electronic resource]
■260 ▼a[S.l.]▼bPrinceton University. ▼c2023
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2023
■300 ▼a1 online resource(269 p.)
■500 ▼aSource: Dissertations Abstracts International, Volume: 84-12, Section: B.
■500 ▼aAdvisor: Wang, Mengdi.
■5021 ▼aThesis (Ph.D.)--Princeton University, 2023.
■506 ▼aThis item must not be sold to any third party vendors.
■520 ▼aIn this thesis, we study topics on Markov Decision Processes (MDPs) with a low-rank structure. We begin with the definition of a low-rank Markov Decision Process and discuss the related applications in the following chapters. In Chapter 2, we consider the off-policy estimation problem of the policy gradient. We propose an estimator based on Fitted Q Iteration which can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. We provide a tight finite-sample upper bound on the estimation error, given that the MDP satisfies the low-rank assumption. Empirically, we evaluate the performance of the estimator on both policy gradient estimation and policy optimization. Under various metrics, our results show that the estimator significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques. In Chapters 3 and 4, we study the estimation problem of low-rank MDP models. A tensor-based formulation is proposed to capture the low-rank information of the model. We develop a tensor-rank-constrained estimator that recovers the model from the collected data, and provide statistical guarantees on the estimation error. The tensor decomposition of the transition model provides useful information for the reduction of the state and action spaces. We further prove that the learned state/action abstractions provide accurate approximations to latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation. In Chapter 5, we study the representation learning problem of Markov Games, which is a natural extension of MDPs to the multi-player setting. We present a model-based and a model-free approach to construct an effective representation from the collected data, which is further used to learn an equilibrium policy. A theoretical guarantee is provided, which shows that the algorithm is able to find a near-optimal policy with a polynomial number of interactions with the environment. To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates function approximation.
■590 ▼aSchool code: 0181.
■650 4▼aElectrical engineering.
■650 4▼aComputer engineering.
■653 ▼aMarkov Decision Processes
■653 ▼aMarkov games
■653 ▼aTensor-based formulation
■690 ▼a0544
■690 ▼a0464
■71020▼aPrinceton University▼bElectrical and Computer Engineering.
■7730 ▼tDissertations Abstracts International▼g84-12B.
■773 ▼tDissertation Abstract International
■790 ▼a0181
■791 ▼aPh.D.
■792 ▼a2023
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T16932470▼nKERIS▼z이 자료의 원문은 한국교육학술정보원에서 제공합니다.
■980 ▼a202402▼f2024