Understanding the Role of Data in Model Decisions.
- Material Type
- Thesis (Dissertation)
- Control Number
- 0017160266
- International Standard Book Number
- 9798382191348
- Dewey Decimal Classification Number
- 401
- Main Entry-Personal Name
- Gupta, Arushi.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Princeton University, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 154 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-10, Section: A.
- General Note
- Advisor: Arora, Sanjeev.
- Dissertation Note
- Thesis (Ph.D.)--Princeton University, 2024.
- Summary, Etc.
- As neural networks are increasingly employed in high-stakes applications such as criminal justice and medicine, it becomes increasingly important to understand why these models make the decisions they do. For example, it is important to develop tools to analyze whether models will perpetuate, in their future decision making, harmful demographic inequalities found in their training data. However, neural networks typically require large training sets, make "black-box" decisions, and have costly retraining protocols, all of which increase the difficulty of this problem. This work considers three questions. Q1) What is the relationship between the elements of an input and the model's decision? Q2) What is the relationship between individual training points and the model's decision? And finally, Q3) To what extent do there exist (efficient) approximations that would allow practitioners to predict how model performance would change given different training data or a different training protocol? Part I addresses Q1 for masking saliency methods. These methods implicitly assume that grey pixels in an image are "uninformative." We find experimentally that this assumption may not always hold, and we define "soundness," which measures a desirable property of a saliency map. Part II addresses Q2 and Q3 in the context of influence functions, which aim to approximate the effect of removing a training point on the model's decision. We use harmonic analysis to examine a particular type of influence method, namely datamodels, and find that there is a relationship between the coefficients of the datamodel and the Fourier coefficients of the target function. Finally, Part III addresses Q3 in the context of test data. First, we assess whether held-out test data is necessary to approximate the outer loop of meta-learning, or whether recycling training data constitutes a sufficient approximation. We find that held-out test data is important: without it, the learned representations are low rank. Then, inspired by the PGDL competition, we investigate whether GAN-generated data, despite well-known limitations, can be used to approximate generalization performance when no test or validation set is available, and we find that it can.
- Subject Added Entry-Topical Term
- Linguistics.
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Information technology.
- Index Term-Uncontrolled
- Neural networks
- Index Term-Uncontrolled
- Decision making
- Index Term-Uncontrolled
- Training data
- Index Term-Uncontrolled
- Meta learning
- Index Term-Uncontrolled
- Datamodels
- Added Entry-Corporate Name
- Princeton University Computer Science
- Host Item Entry
- Dissertations Abstracts International. 85-10A.
- Electronic Location and Access
- This material can be viewed after logging in.
- Control Number
- joongbu:654798
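
As a reading aid for the abstract above, the following is a minimal sketch of the linear datamodel setup and its connection to Fourier coefficients that the abstract alludes to; the notation ($S$, $g$, $\beta_i$, $\chi_T$) is illustrative and assumed here, not taken from the thesis itself.

```latex
% Hypothetical notation: g(S) is a model output of interest (e.g., the loss on a fixed
% test point) after training on the subset S of the n training points.
% A linear datamodel fits g by a weighted sum of inclusion indicators:
\[
  g(S) \;\approx\; \beta_0 + \sum_{i=1}^{n} \beta_i \,\mathbf{1}[i \in S].
\]
% Encoding S as x \in \{-1,+1\}^n with x_i = +1 iff i \in S, harmonic analysis on the
% Boolean cube expands the same function in the Fourier basis:
\[
  g(x) \;=\; \sum_{T \subseteq [n]} \hat{g}(T)\, \chi_T(x),
  \qquad \chi_T(x) = \prod_{i \in T} x_i .
\]
% Since \mathbf{1}[i \in S] = (1 + x_i)/2, a fitted weight \beta_i corresponds (up to
% rescaling) to the degree-1 Fourier coefficient \hat{g}(\{i\}); this is the kind of
% relationship between datamodel coefficients and Fourier coefficients the abstract describes.
```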