Theory and Algorithms for Data-Centric Machine Learning [electronic resource]
Content Information
Material Type  
 Thesis/Dissertation
Control Number  
0016934894
International Standard Book Number  
9798380482981
Dewey Decimal Classification Number  
600
Main Entry-Personal Name  
Izzo, Zachary Luigi Edward.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Stanford University, 2023
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2023
Physical Description  
1 online resource (210 p.)
General Note  
Source: Dissertations Abstracts International, Volume: 85-04, Section: B.
General Note  
Advisor: Zou, James; Ying, Lexing.
Dissertation Note  
Thesis (Ph.D.)--Stanford University, 2023.
Restrictions on Access Note  
This item must not be sold to any third party vendors.
Summary, Etc.  
Machine learning (ML) and AI have achieved remarkable, super-human performance in a wide variety of domains: computer vision, natural language processing, and protein folding, to name but a few. Until recently, most advancements have taken a model-centric approach, focusing primarily on improved neural network architectures (ConvNets, ResNets, transformers, etc.) and optimization procedures for training these models (batch norm, dropout, neural architecture search, etc.). Relatively less attention has been paid to the data used to train these models, in spite of the well-known fact that ML is critically dependent on high-quality data, captured succinctly by the phrase "garbage in, garbage out." As the returns on ever larger and more complicated models diminish (e.g., MT-NLG from Nvidia and Microsoft, with 530B parameters), researchers have begun to realize the importance of taking a data-centric approach and developing principled methods for studying the fuel for these models: the data itself. Beyond improved task performance, a data-centric perspective also allows us to take socially critical considerations, such as data privacy, into account.

In this thesis, we take a critical look at several points in the ML data pipeline: before, during, and after model training. Before model training, we explore the problem of data selection: which data should be used to train the model, and on what type of data should we expect our model to work? Moving into model training, we turn our attention to two issues that can result from the interaction of our ML systems with the environment in which they are deployed. The first is data privacy: how can we prevent our models from leaking sensitive information about their training data? The second concerns the dynamic nature of some modeled populations. Especially when our model is used to make socially impactful decisions (e.g., automated loan approval or recommender systems), the model itself may impact the distribution of the data, leading to degraded performance. Lastly, even when best practices are followed before and during model training, we may still want to post-process a model to remove the effects of certain data after training. How can this be achieved in a computationally efficient manner?

This thesis covers novel solutions for each of the preceding problems, with an emphasis on provable guarantees for each of the proposed algorithms. By applying mathematical rigor to challenging real-world problems, we can develop algorithms which are both effective and trustworthy.
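For readers unfamiliar with the terms, the following standard formulations from the machine-learning literature make two of the abstract's questions concrete; they are illustrative only and are not taken from the thesis, which may use different definitions. "Not leaking sensitive information about the training data" is commonly formalized as differential privacy: a randomized training algorithm $M$ is $(\varepsilon, \delta)$-differentially private if, for any two datasets $D$ and $D'$ differing in a single record and any measurable set of outputs $S$,

\[ \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta. \]

"The model itself may impact the distribution of the data" is the setting usually called performative prediction, where the deployed parameters $\theta$ induce a data distribution $\mathcal{D}(\theta)$ and the relevant objective is the performative risk

\[ \mathrm{PR}(\theta) \;=\; \mathbb{E}_{z \sim \mathcal{D}(\theta)}\big[\ell(z;\theta)\big], \]

which can degrade even for a model that minimized the ordinary risk on the distribution it was originally trained on.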
Added Entry-Corporate Name  
Stanford University.
Host Item Entry  
Dissertations Abstracts International. 85-04B.
Host Item Entry  
Dissertations Abstracts International.
Electronic Location and Access  
This material can be viewed after logging in.
Control Number  
joongbu:641416

Detailed Information

Materials
Registration No.   Call No.   Location           Status                Loan Info
TQ0027330          T          Online full text   Viewable/printable    Viewable/printable