Topics in Low-Rank Markov Decision Process: Applications in Policy Gradient, Model Estimation and Markov Games [electronic resource]
Detailed Information
- Material Type
- Dissertation (Thesis)
- Control Number
- 0016932470
- International Standard Book Number
- 9798379717681
- Dewey Decimal Classification Number
- 621.3
- Main Entry-Personal Name
- Ni, Chengzhuo.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Princeton University, 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (269 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- General Note
- Advisor: Wang, Mengdi.
- Dissertation Note
- Thesis (Ph.D.)--Princeton University, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- In this thesis, we study topics on Markov Decision Processes (MDPs) with a low-rank structure. We begin with the definition of a low-rank Markov Decision Process and discuss the related applications in the following chapters. In Chapter 2, we consider the off-policy estimation problem of the policy gradient. We propose an estimator based on Fitted Q Iteration which can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. We provide a tight finite-sample upper bound on the estimation error, given that the MDP satisfies the low-rank assumption. Empirically, we evaluate the performance of the estimator on both policy gradient estimation and policy optimization. Under various metrics, our results show that the estimator significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques. In Chapters 3 and 4, we study the estimation problem of low-rank MDP models. A tensor-based formulation is proposed to capture the low-rank information of the model. We develop a tensor-rank-constrained estimator that recovers the model from the collected data, and provide statistical guarantees on the estimation error. The tensor decomposition of the transition model provides useful information for the reduction of the state and action spaces. We further prove that the learned state/action abstractions provide accurate approximations to latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation. In Chapter 5, we study the representation learning problem of Markov Games, which is a natural extension of MDPs to the multi-player setting. We present a model-based and a model-free approach to construct an effective representation from the collected data, which is further used to learn an equilibrium policy. A theoretical guarantee is provided, which shows that the algorithm is able to find a near-optimal policy with a polynomial number of interactions with the environment. To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates function approximation. (A brief sketch of the low-rank MDP definition appears after the detail fields below.)
- Subject Added Entry-Topical Term
- Electrical engineering.
- Subject Added Entry-Topical Term
- Computer engineering.
- Index Term-Uncontrolled
- Markov Decision Processes
- Index Term-Uncontrolled
- Markov games
- Index Term-Uncontrolled
- Tensor-based formulation
- Added Entry-Corporate Name
- Princeton University Electrical and Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 84-12B.
- Host Item Entry
- Dissertation Abstract International
- Electronic Location and Access
- This material can be viewed after logging in.
- Control Number
- joongbu:640400
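The abstract refers to the definition of a low-rank Markov Decision Process without stating it. As a brief sketch of the standard definition from the literature (not quoted from this record), a rank-d MDP assumes the transition kernel factors through d-dimensional feature maps:

    P(s' \mid s, a) = \phi(s, a)^{\top} \mu(s'), \qquad \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^{d}, \quad \mu : \mathcal{S} \to \mathbb{R}^{d},

so that the |S||A| \times |S| transition matrix has rank at most d. The tensor-based formulation described for Chapters 3 and 4 plays an analogous role, constraining the rank of the transition tensor in \mathbb{R}^{|S| \times |A| \times |S|}.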
MARC
■008240220s2023 ulk 00 kor
■001000016932470
■00520240214100502
■006m o d
■007cr#unu||||||||
■020 ▼a9798379717681
■035 ▼a(MiAaPQ)AAI30493090
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a621.3
■1001 ▼aNi, Chengzhuo.
■24510▼aTopics in Low-Rank Markov Decision Process: Applications in Policy Gradient, Model Estimation and Markov Games▼h[electronic resource]
■260 ▼a[S.l.]▼bPrinceton University. ▼c2023
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2023
■300 ▼a1 online resource(269 p.)
■500 ▼aSource: Dissertations Abstracts International, Volume: 84-12, Section: B.
■500 ▼aAdvisor: Wang, Mengdi.
■5021 ▼aThesis (Ph.D.)--Princeton University, 2023.
■506 ▼aThis item must not be sold to any third party vendors.
■520 ▼aIn this thesis, we study topics on Markov Decision Processes (MDPs) with a low-rank structure. We begin with the definition of a low-rank Markov Decision Process and discuss the related applications in the following chapters. In Chapter 2, we consider the off-policy estimation problem of the policy gradient. We propose an estimator based on Fitted Q Iteration which can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. We provide a tight finite-sample upper bound on the estimation error, given that the MDP satisfies the low-rank assumption. Empirically, we evaluate the performance of the estimator on both policy gradient estimation and policy optimization. Under various metrics, our results show that the estimator significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques. In Chapters 3 and 4, we study the estimation problem of low-rank MDP models. A tensor-based formulation is proposed to capture the low-rank information of the model. We develop a tensor-rank-constrained estimator that recovers the model from the collected data, and provide statistical guarantees on the estimation error. The tensor decomposition of the transition model provides useful information for the reduction of the state and action spaces. We further prove that the learned state/action abstractions provide accurate approximations to latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation. In Chapter 5, we study the representation learning problem of Markov Games, which is a natural extension of MDPs to the multi-player setting. We present a model-based and a model-free approach to construct an effective representation from the collected data, which is further used to learn an equilibrium policy. A theoretical guarantee is provided, which shows that the algorithm is able to find a near-optimal policy with a polynomial number of interactions with the environment. To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates function approximation.
■590 ▼aSchool code: 0181.
■650 4▼aElectrical engineering.
■650 4▼aComputer engineering.
■653 ▼aMarkov Decision Processes
■653 ▼aMarkov games
■653 ▼aTensor-based formulation
■690 ▼a0544
■690 ▼a0464
■71020▼aPrinceton University▼bElectrical and Computer Engineering.
■7730 ▼tDissertations Abstracts International▼g84-12B.
■773 ▼tDissertation Abstract International
■790 ▼a0181
■791 ▼aPh.D.
■792 ▼a2023
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T16932470▼nKERIS▼z이 자료의 원문은 한국교육학술정보원에서 제공합니다.
■980 ▼a202402▼f2024