Advancing Reinforcement Learning: Multi-Agent Optimization, Opportunistic Exploration, and Causal Interpretation.
Contents Info
Material Type  
 Dissertation
Control Number  
0017160317
International Standard Book Number  
9798382604855
Dewey Decimal Classification Number  
896
Main Entry-Personal Name  
Wang, Xiaoxiao.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of California, Davis, 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
204 p.
General Note  
Source: Dissertations Abstracts International, Volume: 85-11, Section: B.
General Note  
Advisor: Liu, Xin.
Dissertation Note  
Thesis (Ph.D.)--University of California, Davis, 2024.
Summary, Etc.  
Reinforcement learning (RL), a critical subfield of machine learning, effectively models sequential decision-making for agents operating in various environments. Despite its extensive applications, RL encounters significant challenges in real-world settings, particularly limited data availability, the exploration-exploitation trade-off, and the lack of explainability. In this dissertation, I explore these issues through three distinct lenses. First, I improve data efficiency in situations involving multiple agents or tasks. Second, I propose opportunistic learning algorithms for environments with varying exploration costs. Third, I interpret the agent's learned policy through causal explanations.

The following sections outline the contributions. Initially, I study online global optimization in multi-agent settings. Cellular network configuration is a suitable application area exhibiting these challenges, including the scarcity of diverse historical data, constrained experimental budgets imposed by network operators, and highly complex, unknown network performance functions. To overcome these challenges, I introduce an online-learning-based joint-optimization algorithm combining neural network regression with Gibbs sampling, which considerably outperforms distributed Q-learning in overall performance and ramp-up time. By leveraging similarities among tasks/base stations, I propose a kernel-based multi-task contextual bandit algorithm with the similarity estimated via conditional kernel embedding. These algorithms notably outperform the default cellular network configuration and the respective baseline algorithms.

Next, I focus on opportunistic learning, where the exploration cost in RL varies with environmental conditions. Since the exploration cost directly affects the regret of selecting a sub-optimal action, I design the learning strategy to explore more when the cost is low and exploit when the cost is high. I propose the AdaLinUCB algorithm for opportunistic contextual bandits, which balances the exploration-exploitation trade-off adaptively and significantly outperforms existing contextual bandit algorithms in scenarios with large exploration-cost fluctuations. I further develop two algorithms, OppUCRL2 and OppPSRL, for the finite-horizon episodic Markov decision process, demonstrating the benefits of opportunistic RL. These algorithms balance the exploration-exploitation trade-off dynamically through variation-factor-dependent optimism, leading to superior performance. Theoretical regret-bound analyses support these results.

Lastly, I enhance RL's interpretability by providing causal explanations. My approach quantifies the causal influence of states on actions and their temporal impact, thereby surpassing associative methods for RL policy explanation. I propose a mechanism to quantify the individual-level causal counterfactual path-specific importance score for a structural causal model. This mechanism effectively evaluates causal influence in decision chains, allowing us to better comprehend how a specific decision variable influences an outcome variable.
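The opportunistic strategy summarized above (explore more when the exploration cost is low, exploit when it is high) can be sketched as a LinUCB-style contextual bandit whose exploration bonus is scaled by the current cost. This is an illustrative simplification under assumed interfaces (the class name `OpportunisticLinUCB`, a cost normalized to [0, 1], and a linear scaling rule are all hypothetical), not the dissertation's actual AdaLinUCB algorithm.

```python
import numpy as np

class OpportunisticLinUCB:
    """Illustrative LinUCB variant for opportunistic exploration.

    The exploration bonus shrinks as the current exploration cost rises,
    so the agent explores when exploration is cheap and exploits when it
    is expensive. The linear cost scaling is a hypothetical
    simplification, not the dissertation's AdaLinUCB rule."""

    def __init__(self, d, alpha=1.0):
        self.alpha = alpha      # base exploration coefficient
        self.A = np.eye(d)      # ridge-regularized design matrix
        self.b = np.zeros(d)    # reward-weighted feature sum

    def choose(self, contexts, cost):
        """Pick an arm given per-arm feature vectors and a cost in [0, 1]."""
        alpha_t = self.alpha * (1.0 - cost)   # low cost -> large bonus
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                # ridge estimate of rewards
        scores = [x @ theta + alpha_t * np.sqrt(x @ A_inv @ x)
                  for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Standard LinUCB statistics update for the played arm."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

When the cost is near 1 the agent acts greedily on its reward estimate; when it is near 0 the confidence-width bonus dominates, steering it toward under-explored arms.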
Subject Added Entry-Topical Term  
African literature.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Electrical engineering.
Index Term-Uncontrolled  
Causal explanation
Index Term-Uncontrolled  
Contextual bandit
Index Term-Uncontrolled  
Multi-task learning
Index Term-Uncontrolled  
Opportunistic learning
Index Term-Uncontrolled  
Reinforcement learning
Added Entry-Corporate Name  
University of California, Davis Computer Science
Host Item Entry  
Dissertations Abstracts International. 85-11B.
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:654614

Detail Information

Physical Holdings
Reg No.	Call No.	Location	Status	Loan Info
TQ0030536	T	Full-text material	Available to view/print	Available to view/print

