Robust Markov Decision Processes With Data-Driven, Distance-Based Ambiguity Sets [electronic resource]
Detailed Information
- Material Type
- Thesis/Dissertation
- Control Number
- 0016933349
- International Standard Book Number
- 9798380329958
- Dewey Decimal Classification Number
- 658
- Main Entry-Personal Name
- Ramani, Sivaramakrishnan.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of Washington, 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (131 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
- General Note
- Advisor: Ghate, Archis.
- Dissertation Note
- Thesis (Ph.D.)--University of Washington, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- We consider finite- and infinite-horizon Markov decision processes (MDPs) with unknown state-transition probabilities. These transition probabilities are assumed to belong to certain ambiguity sets, and the goal is to maximize the worst-case expected total discounted reward over all probabilities from these sets. Specifically, the ambiguity set is a ball: it includes all probability distributions within a certain distance from the empirical distribution constructed using historical, independent observations of state transitions. We therefore call these problems robust MDPs (RMDPs) with data-driven, distance-based ambiguity sets.

The literature on data-driven robust optimization mentions (i) robust value convergence with respect to sample size, (ii) out-of-sample value convergence with respect to sample size, (iii) probabilistic performance guarantees on out-of-sample values, and (iv) a probabilistic convergence rate, as desirable properties. The research objective of this dissertation is to establish an essentially minimal set of conditions under which RMDPs with data-driven, distance-based ambiguity sets exhibit these four properties.

We first achieve this for the (s, a)-rectangular RMDPs ((s, a)-RMDPs) studied in Chapter 2. There, the ambiguity set for the whole MDP equals a Cartesian product of ambiguity sets for individual state-action pairs. We establish robust and out-of-sample value convergence under a generalized Pinsker's inequality, provided the radii of the ambiguity balls approach zero as the sample size diverges to infinity. This inequality links convergence of probability distributions with respect to the distance function to their topological convergence in Euclidean space. We also establish that, for finite sample sizes, the optimal value of the RMDP provides a lower bound on the value of the robust optimal policy in the true MDP with high probability. This probabilistic performance guarantee relies on a certain concentration inequality. Under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. These two inequalities hold for several well-known distance functions, including total variation, Burg, Hellinger, and Wasserstein. We present computational results on a generic MDP and a machine repair example using total variation, Burg, and Wasserstein distances. These results illustrate that the generality of our framework provides broad choices when attempting to trade off the conservativeness of robust optimal policies against their out-of-sample performance by tuning ambiguity ball radii.

In Chapter 3, we extend the results of Chapter 2 to a so-called s-rectangular framework. In this more general context, the ambiguity set for the MDP is a Cartesian product of ambiguity sets for individual states. In that chapter, we introduce a family of distance-based s-rectangular RMDPs (s-RMDPs) indexed by ρ ∈ [1, ∞]. In each state, the ambiguity set of transition probability mass functions (pmfs) across different actions equals a sublevel set of the ℓρ-norm of a vector of distances from reference pmfs. Setting ρ = ∞ in this family recovers (s, a)-RMDPs. For any s-RMDP from this family, there is an (s, a)-RMDP whose robust optimal value is at least as good, and vice versa. This occurs because the s- and (s, a)-RMDPs can employ different ambiguity set radii, casting nuanced doubt on the anecdotal belief that (s, a)-RMDPs are more conservative than s-RMDPs. More strongly, if the distance function is lower semicontinuous and convex, then, for any s-RMDP, there exists an (s, a)-RMDP with an identical robust optimal value. This suggests that appropriate caution should be exercised before interpreting too broadly any anecdotal claims that (s, a)-RMDPs are more conservative than s-rectangular ones. We also study data-driven versions of our s-RMDPs, where the reference pmf equals the empirical pmf constructed from state-transition samples. We prove that the robust optimal values and the out-of-sample values of robust optimal policies both converge to the true optimal value asymptotically with the sample size, provided the distance function satisfies the generalized Pinsker's inequality introduced in Chapter 2. The robust optimal value also provides a probabilistic lower bound on the out-of-sample value of a robust optimal policy when the distance function satisfies the concentration inequality. This finite-sample guarantee admits a surprising conclusion: (s, a)-RMDPs are the least conservative among all s-RMDPs within our family. As in Chapter 2, under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. Though similar asymptotic and finite-sample results were developed for (s, a)-RMDPs in Chapter 2, the proof techniques in this chapter are different and more sophisticated. These more involved proofs are needed because the structure of s-RMDPs introduces new analytical hurdles in our attempt to establish the sufficiency of the generalized Pinsker and concentration inequalities. For example, it is no longer adequate to limit attention to deterministic policies; randomization may be needed for optimality. We also present computational experiments on a machine repair example using the total variation distance and ρ = 1. The results of those experiments validate the claims established in that chapter.

Finally, in Chapter 4, we develop a data-driven, distance-based RMDP framework on separable complete metric spaces. We extend our asymptotic and finite-sample results to that setup. Unlike our first two studies, this more general endeavor relies on measure-theoretic concepts from minimax control.
- Subject Added Entry-Topical Term
- Industrial engineering.
- Index Term-Uncontrolled
- Robust optimization
- Index Term-Uncontrolled
- Dynamic programming
- Index Term-Uncontrolled
- Probabilistic performance guarantee
- Index Term-Uncontrolled
- Value convergence
- Index Term-Uncontrolled
- Markov decision processes
- Added Entry-Corporate Name
- University of Washington Industrial and Systems Engineering
- Host Item Entry
- Dissertations Abstracts International. 85-03B.
- Host Item Entry
- Dissertation Abstract International
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:640073
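As a sketch of the ambiguity sets described in the abstract above, written in notation chosen here for illustration rather than taken from the dissertation (the distance d, radii θ, and empirical pmfs p̂ are generic placeholders), the (s, a)-rectangular sets of Chapter 2 and the ℓρ-based s-rectangular sets of Chapter 3 take the form

\[
\mathcal{P}_{s,a} = \{\, p \in \Delta(S) : d(p, \hat{p}_{s,a}) \le \theta_{s,a} \,\}, \qquad
\mathcal{P} = \prod_{(s,a)} \mathcal{P}_{s,a} \quad \text{((s, a)-rectangular)},
\]
\[
\mathcal{Q}_{s} = \Bigl\{\, (p_a)_{a \in A} : \bigl\|\bigl(d(p_a, \hat{p}_{s,a})\bigr)_{a \in A}\bigr\|_{\rho} \le \theta_{s} \,\Bigr\}, \qquad \rho \in [1, \infty] \quad \text{(s-rectangular)},
\]
with the robust objective
\[
\max_{\pi} \; \min_{P \in \mathcal{P}} \; \mathbb{E}^{\pi}_{P}\Bigl[\textstyle\sum_{t \ge 0} \gamma^{t} r(s_t, a_t)\Bigr].
\]
Setting ρ = ∞ collapses each per-state sublevel set into a product of per-action balls, which is how the family recovers (s, a)-RMDPs.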
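A minimal computational sketch follows; it is not the dissertation's implementation, and the function names, array shapes, and the greedy mass-shifting solution of the inner problem are assumptions made for illustration. It shows one (s, a)-rectangular robust Bellman backup in which every ambiguity set is a total-variation ball around the empirical next-state pmf, one of the distance functions used in the abstract's experiments.

```python
import numpy as np

def worst_case_tv(p_hat, v, theta):
    """Minimize p @ v over {p in the simplex : 0.5 * ||p - p_hat||_1 <= theta}.

    Greedy closed form for a total-variation ball: shift up to `theta` of
    probability mass from the states with the largest continuation values
    to the state with the smallest one. Illustrative sketch only.
    """
    p = p_hat.astype(float)                  # work on a copy of the empirical pmf
    budget = theta
    dest = int(np.argmin(v))                 # receiver: lowest continuation value
    for s in np.argsort(v)[::-1]:            # donors: highest values first
        if budget <= 0.0:
            break
        if s == dest:
            continue
        move = min(p[s], budget)
        p[s] -= move
        p[dest] += move
        budget -= move
    return float(p @ v)

def robust_bellman_backup(v, P_hat, r, theta, gamma=0.95):
    """One robust Bellman backup: maximize over actions, worst case over TV balls.

    P_hat has shape (S, A, S) holding empirical next-state pmfs and r has shape
    (S, A); these names and shapes are assumptions made for this sketch.
    """
    n_states, n_actions = r.shape
    v_new = np.empty(n_states)
    for s in range(n_states):
        q = [r[s, a] + gamma * worst_case_tv(P_hat[s, a], v, theta)
             for a in range(n_actions)]
        v_new[s] = max(q)
    return v_new
```

Iterating this backup to convergence approximates the robust value of an (s, a)-RMDP, and shrinking theta as the number of transition samples grows mirrors the vanishing-radius condition under which the abstract's convergence results are stated.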
MARC
■008240219s2023 ulk 00 kor
■001000016933349
■00520240214101235
■006m o d
■007cr#unu||||||||
■020 ▼a9798380329958
■035 ▼a(MiAaPQ)AAI30527919
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a658
■1001 ▼aRamani, Sivaramakrishnan.
■24510▼aRobust Markov Decision Processes With Data-Driven, Distance-Based Ambiguity Sets▼h[electronic resource]
■260 ▼a[S.l.]▼bUniversity of Washington. ▼c2023
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2023
■300 ▼a1 online resource(131 p.)
■500 ▼aSource: Dissertations Abstracts International, Volume: 85-03, Section: B.
■500 ▼aAdvisor: Ghate, Archis.
■5021 ▼aThesis (Ph.D.)--University of Washington, 2023.
■506 ▼aThis item must not be sold to any third party vendors.
■520 ▼aWe consider finite- and infinite-horizon Markov decision processes (MDPs) with unknown state-transition probabilities. These transition probabilities are assumed to belong to certain ambiguity sets, and the goal is to maximize the worst-case expected total discounted reward over all probabilities from these sets. Specifically, the ambiguity set is a ball: it includes all probability distributions within a certain distance from the empirical distribution constructed using historical, independent observations of state transitions. We therefore call these problems robust MDPs (RMDPs) with data-driven, distance-based ambiguity sets. The literature on data-driven robust optimization mentions (i) robust value convergence with respect to sample size, (ii) out-of-sample value convergence with respect to sample size, (iii) probabilistic performance guarantees on out-of-sample values, and (iv) a probabilistic convergence rate, as desirable properties. The research objective of this dissertation is to establish an essentially minimal set of conditions under which RMDPs with data-driven, distance-based ambiguity sets exhibit these four properties. We first achieve this for the (s, a)-rectangular RMDPs ((s, a)-RMDPs) studied in Chapter 2. There, the ambiguity set for the whole MDP equals a Cartesian product of ambiguity sets for individual state-action pairs. We establish robust and out-of-sample value convergence under a generalized Pinsker's inequality, provided the radii of the ambiguity balls approach zero as the sample size diverges to infinity. This inequality links convergence of probability distributions with respect to the distance function to their topological convergence in Euclidean space. We also establish that, for finite sample sizes, the optimal value of the RMDP provides a lower bound on the value of the robust optimal policy in the true MDP with high probability. This probabilistic performance guarantee relies on a certain concentration inequality. Under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. These two inequalities hold for several well-known distance functions, including total variation, Burg, Hellinger, and Wasserstein. We present computational results on a generic MDP and a machine repair example using total variation, Burg, and Wasserstein distances. These results illustrate that the generality of our framework provides broad choices when attempting to trade off the conservativeness of robust optimal policies against their out-of-sample performance by tuning ambiguity ball radii. In Chapter 3, we extend the results of Chapter 2 to a so-called s-rectangular framework. In this more general context, the ambiguity set for the MDP is a Cartesian product of ambiguity sets for individual states. In that chapter, we introduce a family of distance-based s-rectangular RMDPs (s-RMDPs) indexed by ρ ∈ [1, ∞]. In each state, the ambiguity set of transition probability mass functions (pmfs) across different actions equals a sublevel set of the ℓρ-norm of a vector of distances from reference pmfs. Setting ρ = ∞ in this family recovers (s, a)-RMDPs. For any s-RMDP from this family, there is an (s, a)-RMDP whose robust optimal value is at least as good, and vice versa. This occurs because the s- and (s, a)-RMDPs can employ different ambiguity set radii, casting nuanced doubt on the anecdotal belief that (s, a)-RMDPs are more conservative than s-RMDPs. More strongly, if the distance function is lower semicontinuous and convex, then, for any s-RMDP, there exists an (s, a)-RMDP with an identical robust optimal value. This suggests that appropriate caution should be exercised before interpreting too broadly any anecdotal claims that (s, a)-RMDPs are more conservative than s-rectangular ones. We also study data-driven versions of our s-RMDPs, where the reference pmf equals the empirical pmf constructed from state-transition samples. We prove that the robust optimal values and the out-of-sample values of robust optimal policies both converge to the true optimal value asymptotically with the sample size, provided the distance function satisfies the generalized Pinsker's inequality introduced in Chapter 2. The robust optimal value also provides a probabilistic lower bound on the out-of-sample value of a robust optimal policy when the distance function satisfies the concentration inequality. This finite-sample guarantee admits a surprising conclusion: (s, a)-RMDPs are the least conservative among all s-RMDPs within our family. As in Chapter 2, under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. Though similar asymptotic and finite-sample results were developed for (s, a)-RMDPs in Chapter 2, the proof techniques in this chapter are different and more sophisticated. These more involved proofs are needed because the structure of s-RMDPs introduces new analytical hurdles in our attempt to establish the sufficiency of the generalized Pinsker and concentration inequalities. For example, it is no longer adequate to limit attention to deterministic policies; randomization may be needed for optimality. We also present computational experiments on a machine repair example using the total variation distance and ρ = 1. The results of those experiments validate the claims established in that chapter. Finally, in Chapter 4, we develop a data-driven, distance-based RMDP framework on separable complete metric spaces. We extend our asymptotic and finite-sample results to that setup. Unlike our first two studies, this more general endeavor relies on measure-theoretic concepts from minimax control.
■590 ▼aSchool code: 0250.
■650 4▼aIndustrial engineering.
■653 ▼aRobust optimization
■653 ▼aDynamic programming
■653 ▼aProbabilistic performance guarantee
■653 ▼aValue convergence
■653 ▼aMarkov decision processes
■690 ▼a0546
■690 ▼a0796
■71020▼aUniversity of Washington▼bIndustrial and Systems Engineering.
■7730 ▼tDissertations Abstracts International▼g85-03B.
■773 ▼tDissertation Abstract International
■790 ▼a0250
■791 ▼aPh.D.
■792 ▼a2023
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T16933349▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).
■980 ▼a202402▼f2024