Topics in Low-Rank Markov Decision Process: Applications in Policy Gradient, Model Estimation and Markov Games [electronic resource]

Detailed Information

Material Type  
 Dissertation
Control Number  
0016932470
International Standard Book Number  
9798379717681
Dewey Decimal Classification Number  
621.3
Main Entry-Personal Name  
Ni, Chengzhuo.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Princeton University., 2023
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2023
Physical Description  
1 online resource (269 p.)
General Note  
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
General Note  
Advisor: Wang, Mengdi.
Dissertation Note  
Thesis (Ph.D.)--Princeton University, 2023.
Restrictions on Access Note  
This item must not be sold to any third party vendors.
Summary, Etc.  
In this thesis, we study topics on Markov Decision Processes (MDPs) with a low-rank structure. We begin with the definition of a low-rank Markov Decision Process and discuss related applications in the following chapters.

In Chapter 2, we consider the off-policy estimation problem of the policy gradient. We propose an estimator based on Fitted Q-Iteration that works with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. We provide a tight finite-sample upper bound on the estimation error, provided the MDP satisfies the low-rank assumption. Empirically, we evaluate the estimator on both policy gradient estimation and policy optimization. Under various metrics, our results show that the estimator significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance-reduction techniques.

In Chapters 3 and 4, we study the estimation problem of low-rank MDP models. A tensor-based formulation is proposed to capture the low-rank information of the model. We develop a tensor-rank-constrained estimator that recovers the model from the collected data and provide statistical guarantees on the estimation error. The tensor decomposition of the transition model provides useful information for the reduction of the state and action spaces. We further prove that the learned state/action abstractions accurately approximate latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation.

In Chapter 5, we study the representation learning problem of Markov Games, a natural extension of MDPs to the multi-player setting. We present a model-based and a model-free approach to construct an effective representation from the collected data, which is further used to learn an equilibrium policy. A theoretical guarantee shows that the algorithm finds a near-optimal policy with polynomially many interactions with the environment. To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates function approximation.
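The low-rank structure described in the abstract can be sketched numerically. The following is an illustrative toy example only, not the thesis's estimator: it assumes the standard low-rank MDP model P(s' | s, a) = φ(s, a)ᵀ μ(s'), builds a random rank-r transition kernel, and applies a naive truncated-SVD recovery step (a stand-in for the tensor-rank-constrained estimator) to empirical transition frequencies. All sizes, factor choices, and the SVD step are assumptions for demonstration.

```python
# Toy sketch of a low-rank MDP (illustrative only, not the thesis's algorithm):
# the (S*A) x S matricization of the transition tensor has rank at most r
# when P(s' | s, a) = phi(s, a)^T mu(s').
import numpy as np

rng = np.random.default_rng(0)
S, A, r = 20, 4, 3                            # states, actions, latent rank (toy sizes)

# Random nonnegative factors, normalized so every P[sa, :] is a distribution:
# rows of phi lie on the simplex, and each latent row of mu is a distribution.
phi = rng.random((S * A, r))
phi /= phi.sum(axis=1, keepdims=True)
mu = rng.random((r, S))
mu /= mu.sum(axis=1, keepdims=True)

P = phi @ mu                                  # (S*A) x S transition matrix
assert np.allclose(P.sum(axis=1), 1.0)        # valid conditional distributions
assert np.linalg.matrix_rank(P) <= r          # low-rank structure holds

# Naive estimation: n sampled transitions per (s, a) pair, then rank-r truncation.
n = 200
counts = np.vstack([rng.multinomial(n, P[i]) for i in range(S * A)])
P_emp = counts / n                            # raw empirical estimate
U, sing, Vt = np.linalg.svd(P_emp, full_matrices=False)
P_lowrank = U[:, :r] @ np.diag(sing[:r]) @ Vt[:r]

# The rank-truncated estimate is typically closer to the truth in Frobenius norm,
# which is the intuition behind exploiting low rank in model estimation.
err_emp = np.linalg.norm(P_emp - P)
err_low = np.linalg.norm(P_lowrank - P)
print(err_emp, err_low)
```

The truncation step here is only a simple matrix-rank surrogate; the thesis's Chapters 3-4 work with a tensor formulation over the full S x A x S transition tensor and come with statistical guarantees that this sketch does not attempt to reproduce.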
Subject Added Entry-Topical Term  
Electrical engineering.
Subject Added Entry-Topical Term  
Computer engineering.
Index Term-Uncontrolled  
Markov Decision Processes
Index Term-Uncontrolled  
Markov games
Index Term-Uncontrolled  
Tensor-based formulation
Added Entry-Corporate Name  
Princeton University Electrical and Computer Engineering
Host Item Entry  
Dissertations Abstracts International. 84-12B.
Host Item Entry  
Dissertation Abstract International
Electronic Location and Access  
This material can be viewed after logging in.
Control Number  
joongbu:640400

MARC

 008240220s2023        ulk                      00        kor
■001000016932470
■00520240214100502
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798379717681
■035    ▼a(MiAaPQ)AAI30493090
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a621.3
■1001  ▼aNi, Chengzhuo.
■24510▼aTopics in Low-Rank Markov Decision Process: Applications in Policy Gradient, Model Estimation and Markov Games▼h[electronic resource]
■260    ▼a[S.l.]▼bPrinceton University. ▼c2023
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2023
■300    ▼a1 online resource (269 p.)
■500    ▼aSource: Dissertations Abstracts International, Volume: 84-12, Section: B.
■500    ▼aAdvisor: Wang, Mengdi.
■5021  ▼aThesis (Ph.D.)--Princeton University, 2023.
■506    ▼aThis item must not be sold to any third party vendors.
■520    ▼aIn this thesis, we study topics on Markov Decision Processes (MDPs) with a low-rank structure. We begin with the definition of a low-rank Markov Decision Process and discuss related applications in the following chapters. In Chapter 2, we consider the off-policy estimation problem of the policy gradient. We propose an estimator based on Fitted Q-Iteration that works with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. We provide a tight finite-sample upper bound on the estimation error, provided the MDP satisfies the low-rank assumption. Empirically, we evaluate the estimator on both policy gradient estimation and policy optimization. Under various metrics, our results show that the estimator significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance-reduction techniques. In Chapters 3 and 4, we study the estimation problem of low-rank MDP models. A tensor-based formulation is proposed to capture the low-rank information of the model. We develop a tensor-rank-constrained estimator that recovers the model from the collected data and provide statistical guarantees on the estimation error. The tensor decomposition of the transition model provides useful information for the reduction of the state and action spaces. We further prove that the learned state/action abstractions accurately approximate latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation. In Chapter 5, we study the representation learning problem of Markov Games, a natural extension of MDPs to the multi-player setting. We present a model-based and a model-free approach to construct an effective representation from the collected data, which is further used to learn an equilibrium policy. A theoretical guarantee shows that the algorithm finds a near-optimal policy with polynomially many interactions with the environment. To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates function approximation.
■590    ▼aSchool code: 0181.
■650  4▼aElectrical engineering.
■650  4▼aComputer engineering.
■653    ▼aMarkov Decision Processes
■653    ▼aMarkov games
■653    ▼aTensor-based formulation
■690    ▼a0544
■690    ▼a0464
■71020▼aPrinceton University▼bElectrical and Computer Engineering.
■7730  ▼tDissertations Abstracts International▼g84-12B.
■773    ▼tDissertation Abstract International
■790    ▼a0181
■791    ▼aPh.D.
■792    ▼a2023
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T16932470▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).
■980    ▼a202402▼f2024

    Material
    Reg No. | Call No. | Location | Status | Lend Info
    TQ0026320 | T | Online full text | Available for viewing/printing | Available for viewing/printing