Adversarial Robustness for Estimation and Alignment.
- Material Type
- Dissertation
- 0017161278
- Date and Time of Latest Transaction
- 20250211151333
- ISBN
- 9798382830964
- DDC
- 310
- Author
- Chao, Patrick.
- Title/Author
- Adversarial Robustness for Estimation and Alignment.
- Publish Info
- [S.l.] : University of Pennsylvania., 2024
- Publish Info
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Material Info
- 216 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
- General Note
- Advisor: Dobriban, Edgar.
- Dissertation Note
- Thesis (Ph.D.)--University of Pennsylvania, 2024.
- Abstracts/Etc
- Summary: As machine learning models are deployed in a multitude of settings with increasing levels of influence and competency, there is growing interest in ensuring these models are robust and align with human intentions. To this end, we analyze robust models and adversarial inputs in a variety of settings. We explore statistical estimation under the adversarial setting of Wasserstein distribution shifts, where every data point may undergo a bounded perturbation. We analyze several statistical problems, including location estimation, linear regression, and non-parametric density estimation. Furthermore, we evaluate alignment in modern foundation models, and propose automated methods to construct adversarial inputs. We develop black-box automated algorithms to generate adversarial prompts for text-to-image models and jailbreaks for language models. Lastly, we introduce a benchmark, JailbreakBench, for reproducible jailbreak evaluation.
- Subject Added Entry-Topical Term
- Statistics.
- Subject Added Entry-Topical Term
- Information technology.
- Index Term-Uncontrolled
- Adversarial prompts
- Index Term-Uncontrolled
- Adversarial robustness
- Index Term-Uncontrolled
- Distribution shifts
- Index Term-Uncontrolled
- Jailbreaking
- Index Term-Uncontrolled
- Minimax estimation
- Index Term-Uncontrolled
- Red teaming
- Added Entry-Corporate Name
- University of Pennsylvania Statistics and Data Science
- Host Item Entry
- Dissertations Abstracts International. 85-12B.
- Control Number
- joongbu:658472