Robust Markov Decision Processes With Data-Driven, Distance-Based Ambiguity Sets [electronic resource]
Detailed Information
- Material Type
- Thesis/Dissertation
- Control Number
- 0016933349
- International Standard Book Number
- 9798380329958
- Dewey Decimal Classification Number
- 658
- Main Entry-Personal Name
- Ramani, Sivaramakrishnan.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of Washington, 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (131 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
- General Note
- Advisor: Ghate, Archis.
- Dissertation Note
- Thesis (Ph.D.)--University of Washington, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- We consider finite- and infinite-horizon Markov decision processes (MDPs) with unknown state-transition probabilities. These transition probabilities are assumed to belong to certain ambiguity sets, and the goal is to maximize the worst-case expected total discounted reward over all probabilities from these sets. Specifically, the ambiguity set is a ball: it includes all probability distributions within a certain distance from the empirical distribution constructed using historical, independent observations of state transitions. We therefore call these problems robust MDPs (RMDPs) with data-driven, distance-based ambiguity sets.

The literature on data-driven robust optimization mentions (i) robust value convergence with respect to sample size, (ii) out-of-sample value convergence with respect to sample size, (iii) probabilistic performance guarantees on out-of-sample values, and (iv) a probabilistic convergence rate, as desirable properties. The research objective of this dissertation is to establish an essentially minimal set of conditions under which RMDPs with data-driven, distance-based ambiguity sets exhibit these four properties.

We first achieve this for the (s, a)-rectangular RMDPs ((s, a)-RMDPs) studied in Chapter 2. There, the ambiguity set for the whole MDP equals a Cartesian product of ambiguity sets for individual state-action pairs. We establish robust and out-of-sample value convergence under a generalized Pinsker's inequality, provided the radii of the ambiguity balls approach zero as the sample size diverges to infinity. This inequality links convergence of probability distributions with respect to the distance function to their topological convergence in Euclidean space. We also establish that, for finite sample sizes, the optimal value of the RMDP provides a lower bound on the value of the robust optimal policy in the true MDP with high probability. This probabilistic performance guarantee relies on a certain concentration inequality. Under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. These two inequalities hold for several well-known distance functions, including total variation, Burg, Hellinger, and Wasserstein. We present computational results on a generic MDP and a machine repair example using total variation, Burg, and Wasserstein distances. These results illustrate that the generality of our framework provides broad choices when attempting to trade off the conservativeness of robust optimal policies against their out-of-sample performance by tuning ambiguity ball radii.

In Chapter 3, we extend the results of Chapter 2 to a so-called s-rectangular framework. In this more general context, the ambiguity set for the MDP is a Cartesian product of ambiguity sets for individual states. In that chapter, we introduce a family of distance-based s-rectangular RMDPs (s-RMDPs) indexed by ρ ∈ [1, ∞]. In each state, the ambiguity set of transition probability mass functions (pmfs) across different actions equals a sublevel set of the ℓρ-norm of a vector of distances from reference pmfs. Setting ρ = ∞ in this family recovers (s, a)-RMDPs. For any s-RMDP from this family, there is an (s, a)-RMDP whose robust optimal value is at least as good, and vice versa. This occurs because the s- and (s, a)-RMDPs can employ different ambiguity set radii, casting nuanced doubt on the anecdotal belief that (s, a)-RMDPs are more conservative than s-RMDPs. More strongly, if the distance function is lower semicontinuous and convex, then, for any s-RMDP, there exists an (s, a)-RMDP with an identical robust optimal value. This suggests that appropriate caution should be exercised before interpreting too broadly any anecdotal claims that (s, a)-RMDPs are more conservative than s-rectangular ones. We also study data-driven versions of our s-RMDPs, where the reference pmf equals the empirical pmf constructed from state-transition samples. We prove that the robust optimal values and the out-of-sample values of robust optimal policies both converge to the true optimal value asymptotically with the sample size, provided the distance function satisfies the generalized Pinsker's inequality introduced in Chapter 2. The robust optimal value also provides a probabilistic lower bound on the out-of-sample value of a robust optimal policy when the distance function satisfies the concentration inequality. This finite-sample guarantee admits a surprising conclusion: (s, a)-RMDPs are the least conservative among all s-RMDPs within our family. As in Chapter 2, under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. Though similar asymptotic and finite-sample results were developed for (s, a)-RMDPs in Chapter 2, the proof techniques in this chapter are different and more sophisticated. These more involved proofs are needed because the structure of s-RMDPs introduces new analytical hurdles in our attempt to establish the sufficiency of the generalized Pinsker and concentration inequalities. For example, it is no longer adequate to limit attention to deterministic policies; randomization may be needed for optimality. We also present computational experiments on a machine repair example using the total variation distance and ρ = 1. The results of those experiments validate the claims established in that chapter.

Finally, in Chapter 4, we develop a data-driven, distance-based RMDP framework on separable complete metric spaces. We extend our asymptotic and finite-sample results to that setup. Unlike our first two studies, this more general endeavor relies on measure-theoretic concepts from minimax control.
- Subject Added Entry-Topical Term
- Industrial engineering.
- Index Term-Uncontrolled
- Robust optimization
- Index Term-Uncontrolled
- Dynamic programming
- Index Term-Uncontrolled
- Probabilistic performance guarantee
- Index Term-Uncontrolled
- Value convergence
- Index Term-Uncontrolled
- Markov decision processes
- Added Entry-Corporate Name
- University of Washington Industrial and Systems Engineering
- Host Item Entry
- Dissertations Abstracts International. 85-03B.
- Host Item Entry
- Dissertation Abstract International
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:640073
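As a sketch of the ambiguity sets described in the abstract above, written in notation chosen here for illustration rather than taken from the dissertation (the distance d, radii θ, and empirical pmfs p̂ are generic placeholders), the (s, a)-rectangular sets of Chapter 2 and the ℓρ-based s-rectangular sets of Chapter 3 take the form

\[
\mathcal{P}_{s,a} = \{\, p \in \Delta(S) : d(p, \hat{p}_{s,a}) \le \theta_{s,a} \,\}, \qquad
\mathcal{P} = \prod_{(s,a)} \mathcal{P}_{s,a} \quad \text{((s, a)-rectangular)},
\]
\[
\mathcal{Q}_{s} = \Bigl\{\, (p_a)_{a \in A} : \bigl\|\bigl(d(p_a, \hat{p}_{s,a})\bigr)_{a \in A}\bigr\|_{\rho} \le \theta_{s} \,\Bigr\}, \qquad \rho \in [1, \infty] \quad \text{(s-rectangular)},
\]
with the robust objective
\[
\max_{\pi} \; \min_{P \in \mathcal{P}} \; \mathbb{E}^{\pi}_{P}\Bigl[\textstyle\sum_{t \ge 0} \gamma^{t} r(s_t, a_t)\Bigr].
\]
Setting ρ = ∞ collapses each per-state sublevel set into a product of per-action balls, which is how the family recovers (s, a)-RMDPs.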
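A minimal computational sketch follows; it is not the dissertation's implementation, and the function names, array shapes, and the greedy mass-shifting solution of the inner problem are assumptions made for illustration. It shows one (s, a)-rectangular robust Bellman backup in which every ambiguity set is a total-variation ball around the empirical next-state pmf, one of the distance functions used in the abstract's experiments.

```python
import numpy as np

def worst_case_tv(p_hat, v, theta):
    """Minimize p @ v over {p in the simplex : 0.5 * ||p - p_hat||_1 <= theta}.

    Greedy closed form for a total-variation ball: shift up to `theta` of
    probability mass from the states with the largest continuation values
    to the state with the smallest one. Illustrative sketch only.
    """
    p = p_hat.astype(float)                  # work on a copy of the empirical pmf
    budget = theta
    dest = int(np.argmin(v))                 # receiver: lowest continuation value
    for s in np.argsort(v)[::-1]:            # donors: highest values first
        if budget <= 0.0:
            break
        if s == dest:
            continue
        move = min(p[s], budget)
        p[s] -= move
        p[dest] += move
        budget -= move
    return float(p @ v)

def robust_bellman_backup(v, P_hat, r, theta, gamma=0.95):
    """One robust Bellman backup: maximize over actions, worst case over TV balls.

    P_hat has shape (S, A, S) holding empirical next-state pmfs and r has shape
    (S, A); these names and shapes are assumptions made for this sketch.
    """
    n_states, n_actions = r.shape
    v_new = np.empty(n_states)
    for s in range(n_states):
        q = [r[s, a] + gamma * worst_case_tv(P_hat[s, a], v, theta)
             for a in range(n_actions)]
        v_new[s] = max(q)
    return v_new
```

Iterating this backup to convergence approximates the robust value of an (s, a)-RMDP, and shrinking theta as the number of transition samples grows mirrors the vanishing-radius condition under which the abstract's convergence results are stated.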
MARC
■008240219s2023 ulk 00 kor
■001000016933349
■00520240214101235
■006m o d
■007cr#unu||||||||
■020 ▼a9798380329958
■035 ▼a(MiAaPQ)AAI30527919
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a658
■1001 ▼aRamani, Sivaramakrishnan.
■24510▼aRobust Markov Decision Processes With Data-Driven, Distance-Based Ambiguity Sets▼h[electronic resource]
■260 ▼a[S.l.]▼bUniversity of Washington. ▼c2023
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2023
■300 ▼a1 online resource(131 p.)
■500 ▼aSource: Dissertations Abstracts International, Volume: 85-03, Section: B.
■500 ▼aAdvisor: Ghate, Archis.
■5021 ▼aThesis (Ph.D.)--University of Washington, 2023.
■506 ▼aThis item must not be sold to any third party vendors.
■520 ▼aWe consider finite- and infinite-horizon Markov decision processes (MDPs) with unknown state-transition probabilities. These transition probabilities are assumed to belong to certain ambiguity sets, and the goal is to maximize the worst-case expected total discounted reward over all probabilities from these sets. Specifically, the ambiguity set is a ball: it includes all probability distributions within a certain distance from the empirical distribution constructed using historical, independent observations of state transitions. We therefore call these problems robust MDPs (RMDPs) with data-driven, distance-based ambiguity sets. The literature on data-driven robust optimization mentions (i) robust value convergence with respect to sample size, (ii) out-of-sample value convergence with respect to sample size, (iii) probabilistic performance guarantees on out-of-sample values, and (iv) a probabilistic convergence rate, as desirable properties. The research objective of this dissertation is to establish an essentially minimal set of conditions under which RMDPs with data-driven, distance-based ambiguity sets exhibit these four properties. We first achieve this for the (s, a)-rectangular RMDPs ((s, a)-RMDPs) studied in Chapter 2. There, the ambiguity set for the whole MDP equals a Cartesian product of ambiguity sets for individual state-action pairs. We establish robust and out-of-sample value convergence under a generalized Pinsker's inequality, provided the radii of the ambiguity balls approach zero as the sample size diverges to infinity. This inequality links convergence of probability distributions with respect to the distance function to their topological convergence in Euclidean space. We also establish that, for finite sample sizes, the optimal value of the RMDP provides a lower bound on the value of the robust optimal policy in the true MDP with high probability. This probabilistic performance guarantee relies on a certain concentration inequality. Under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. These two inequalities hold for several well-known distance functions, including total variation, Burg, Hellinger, and Wasserstein. We present computational results on a generic MDP and a machine repair example using total variation, Burg, and Wasserstein distances. These results illustrate that the generality of our framework provides broad choices when attempting to trade off the conservativeness of robust optimal policies against their out-of-sample performance by tuning ambiguity ball radii. In Chapter 3, we extend the results of Chapter 2 to a so-called s-rectangular framework. In this more general context, the ambiguity set for the MDP is a Cartesian product of ambiguity sets for individual states. In that chapter, we introduce a family of distance-based s-rectangular RMDPs (s-RMDPs) indexed by ρ ∈ [1, ∞]. In each state, the ambiguity set of transition probability mass functions (pmfs) across different actions equals a sublevel set of the ℓρ-norm of a vector of distances from reference pmfs. Setting ρ = ∞ in this family recovers (s, a)-RMDPs. For any s-RMDP from this family, there is an (s, a)-RMDP whose robust optimal value is at least as good, and vice versa. This occurs because the s- and (s, a)-RMDPs can employ different ambiguity set radii, casting nuanced doubt on the anecdotal belief that (s, a)-RMDPs are more conservative than s-RMDPs. More strongly, if the distance function is lower semicontinuous and convex, then, for any s-RMDP, there exists an (s, a)-RMDP with an identical robust optimal value. This suggests that appropriate caution should be exercised before interpreting too broadly any anecdotal claims that (s, a)-RMDPs are more conservative than s-rectangular ones. We also study data-driven versions of our s-RMDPs, where the reference pmf equals the empirical pmf constructed from state-transition samples. We prove that the robust optimal values and the out-of-sample values of robust optimal policies both converge to the true optimal value asymptotically with the sample size, provided the distance function satisfies the generalized Pinsker's inequality introduced in Chapter 2. The robust optimal value also provides a probabilistic lower bound on the out-of-sample value of a robust optimal policy when the distance function satisfies the concentration inequality. This finite-sample guarantee admits a surprising conclusion: (s, a)-RMDPs are the least conservative among all s-RMDPs within our family. As in Chapter 2, under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. Though similar asymptotic and finite-sample results were developed for (s, a)-RMDPs in Chapter 2, the proof techniques in this chapter are different and more sophisticated. These more involved proofs are needed because the structure of s-RMDPs introduces new analytical hurdles in our attempt to establish the sufficiency of the generalized Pinsker and concentration inequalities. For example, it is no longer adequate to limit attention to deterministic policies; randomization may be needed for optimality. We also present computational experiments on a machine repair example using the total variation distance and ρ = 1. The results of those experiments validate the claims established in that chapter. Finally, in Chapter 4, we develop a data-driven, distance-based RMDP framework on separable complete metric spaces. We extend our asymptotic and finite-sample results to that setup. Unlike our first two studies, this more general endeavor relies on measure-theoretic concepts from minimax control.
■590 ▼aSchool code: 0250.
■650 4▼aIndustrial engineering.
■653 ▼aRobust optimization
■653 ▼aDynamic programming
■653 ▼aProbabilistic performance guarantee
■653 ▼aValue convergence
■653 ▼aMarkov decision processes
■690 ▼a0546
■690 ▼a0796
■71020▼aUniversity of Washington▼bIndustrial and Systems Engineering.
■7730 ▼tDissertations Abstracts International▼g85-03B.
■773 ▼tDissertation Abstract International
■790 ▼a0250
■791 ▼aPh.D.
■792 ▼a2023
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T16933349▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).
■980 ▼a202402▼f2024