Advancing Reinforcement Learning: Multi-Agent Optimization, Opportunistic Exploration, and Causal Interpretation.
Contents Info
Material Type  
 Dissertation
Control Number  
0017160317
International Standard Book Number  
9798382604855
Dewey Decimal Classification Number  
896
Main Entry-Personal Name  
Wang, Xiaoxiao.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of California, Davis, 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
204 p.
General Note  
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
General Note  
Advisor: Liu, Xin.
Dissertation Note  
Thesis (Ph.D.)--University of California, Davis, 2024.
Summary, Etc.  
Reinforcement learning (RL), a critical subfield of machine learning, effectively models sequential decision-making for agents operating in various environments. Despite its extensive applications, RL encounters significant challenges in real-world settings, particularly limited data availability, the exploration-exploitation trade-off, and the lack of explainability. In this dissertation, I explore these issues through three distinct lenses. First, I improve data efficiency in situations involving multiple agents or tasks. Second, I propose opportunistic learning algorithms for environments with varying exploration costs. Third, I interpret the agent's learned policy through causal explanations.

The following sections outline the contributions. Initially, I study online global optimization in multi-agent settings. Cellular network configuration is a suitable application area exhibiting these challenges, including the scarcity of diverse historical data, constrained experimental budgets imposed by network operators, and highly complex, unknown network performance functions. To overcome these challenges, I introduce an online-learning-based joint-optimization algorithm combining neural network regression with Gibbs sampling, which considerably outperforms distributed Q-learning in overall performance and ramp-up time. By leveraging similarities among tasks/base stations, I propose a kernel-based multi-task contextual bandit algorithm with the similarity estimated via conditional kernel embedding. These algorithms notably outperform the default cellular network configuration and the respective baseline algorithms.

Next, I focus on opportunistic learning, where the exploration cost in RL varies with environmental conditions. Since the exploration cost directly affects the regret of selecting a sub-optimal action, I design the learning strategy to explore more when the cost is low and exploit when the cost is high. I propose the AdaLinUCB algorithm for opportunistic contextual bandits, which balances the exploration-exploitation trade-off adaptively and significantly outperforms existing contextual bandit algorithms in scenarios with large exploration-cost fluctuations. I further develop two algorithms, OppUCRL2 and OppPSRL, for the finite-horizon episodic Markov decision process, demonstrating the benefits of opportunistic RL. These algorithms balance the exploration-exploitation trade-off dynamically through variation-factor-dependent optimism, leading to superior performance. Theoretical regret-bound analyses support these results.

Lastly, I enhance RL's interpretability by providing causal explanations. My approach quantifies the causal influence of states on actions and their temporal impact, thereby surpassing associative methods for RL policy explanation. I propose a mechanism to quantify the individual-level causal counterfactual path-specific importance score for a structural causal model. This mechanism effectively evaluates causal influence in decision chains, allowing us to better comprehend how a specific decision variable influences an outcome variable.
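The opportunistic strategy summarized above (explore more when the exploration cost is low, exploit when it is high) can be sketched as a LinUCB-style contextual bandit whose exploration bonus is scaled by the current cost. This is an illustrative simplification under assumed interfaces (the class name `OpportunisticLinUCB`, a cost normalized to [0, 1], and a linear scaling rule are all hypothetical), not the dissertation's actual AdaLinUCB algorithm.

```python
import numpy as np

class OpportunisticLinUCB:
    """Illustrative LinUCB variant for opportunistic exploration.

    The exploration bonus shrinks as the current exploration cost rises,
    so the agent explores when exploration is cheap and exploits when it
    is expensive. The linear cost scaling is a hypothetical
    simplification, not the dissertation's AdaLinUCB rule."""

    def __init__(self, d, alpha=1.0):
        self.alpha = alpha      # base exploration coefficient
        self.A = np.eye(d)      # ridge-regularized design matrix
        self.b = np.zeros(d)    # reward-weighted feature sum

    def choose(self, contexts, cost):
        """Pick an arm given per-arm feature vectors and a cost in [0, 1]."""
        alpha_t = self.alpha * (1.0 - cost)   # low cost -> large bonus
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                # ridge estimate of rewards
        scores = [x @ theta + alpha_t * np.sqrt(x @ A_inv @ x)
                  for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Standard LinUCB statistics update for the played arm."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

When the cost is near 1 the agent acts greedily on its reward estimate; when it is near 0 the confidence-width bonus dominates, steering it toward under-explored arms.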
Subject Added Entry-Topical Term  
African literature.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Electrical engineering.
Index Term-Uncontrolled  
Causal explanation
Index Term-Uncontrolled  
Contextual bandit
Index Term-Uncontrolled  
Multi-task learning
Index Term-Uncontrolled  
Opportunistic learning
Index Term-Uncontrolled  
Reinforcement learning
Added Entry-Corporate Name  
University of California, Davis Computer Science
Host Item Entry  
Dissertations Abstracts International. 85-11B.
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:654614

Detail Information

Physical Holdings
Reg No.	Call No.	Location	Status	Loan Info
TQ0030536	T	Full-text material	Available to view/print	Available to view/print

