Theory and Algorithms for Data-Centric Machine Learning [electronic resource]
Content Information
Material Type  
 Thesis/Dissertation
Control Number  
0016934894
International Standard Book Number  
9798380482981
Dewey Decimal Classification Number  
600
Main Entry-Personal Name  
Izzo, Zachary Luigi Edward.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Stanford University, 2023
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2023
Physical Description  
1 online resource (210 p.)
General Note  
Source: Dissertations Abstracts International, Volume: 85-04, Section: B.
General Note  
Advisor: Zou, James; Ying, Lexing.
Dissertation Note  
Thesis (Ph.D.)--Stanford University, 2023.
Restrictions on Access Note  
This item must not be sold to any third party vendors.
Summary, Etc.  
Machine learning (ML) and AI have achieved remarkable, super-human performance in a wide variety of domains: computer vision, natural language processing, and protein folding, to name but a few. Until recently, most advancements have taken a model-centric approach, focusing primarily on improved neural network architectures (ConvNets, ResNets, transformers, etc.) and optimization procedures for training these models (batch norm, dropout, neural architecture search, etc.). Relatively less attention has been paid to the data used to train these models, in spite of the well-known fact that ML is critically dependent on high-quality data, captured succinctly by the phrase "garbage in, garbage out." As the returns on ever larger and more complicated models diminish (e.g., MT-NLG from Nvidia and Microsoft, with 530B parameters), researchers have begun to realize the importance of taking a data-centric approach and developing principled methods for studying the fuel for these models: the data itself. Beyond improved task performance, a data-centric perspective also allows us to take socially critical considerations, such as data privacy, into account.

In this thesis, we take a critical look at several points in the ML data pipeline: before, during, and after model training. Before model training, we explore the problem of data selection: which data should be used to train the model, and on what type of data should we expect our model to work? Moving into model training, we turn our attention to two issues that can result from the interaction of our ML systems with the environment in which they are deployed. The first is data privacy: how can we prevent our models from leaking sensitive information about their training data? The second concerns the dynamic nature of some modeled populations. Especially when our model is used to make socially impactful decisions (e.g., automated loan approval or recommender systems), the model itself may impact the distribution of the data, leading to degraded performance. Lastly, even when best practices are followed before and during model training, we may still want to post-process a model to remove the effects of certain data after training. How can this be achieved in a computationally efficient manner?

This thesis covers novel solutions for each of the preceding problems, with an emphasis on provable guarantees for each of the proposed algorithms. By applying mathematical rigor to challenging real-world problems, we can develop algorithms which are both effective and trustworthy.
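For readers unfamiliar with the terms, the following standard formulations from the machine-learning literature make two of the abstract's questions concrete; they are illustrative only and are not taken from the thesis, which may use different definitions. "Not leaking sensitive information about the training data" is commonly formalized as differential privacy: a randomized training algorithm $M$ is $(\varepsilon, \delta)$-differentially private if, for any two datasets $D$ and $D'$ differing in a single record and any measurable set of outputs $S$,

\[ \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta. \]

"The model itself may impact the distribution of the data" is the setting usually called performative prediction, where the deployed parameters $\theta$ induce a data distribution $\mathcal{D}(\theta)$ and the relevant objective is the performative risk

\[ \mathrm{PR}(\theta) \;=\; \mathbb{E}_{z \sim \mathcal{D}(\theta)}\big[\ell(z;\theta)\big], \]

which can degrade even for a model that minimized the ordinary risk on the distribution it was originally trained on.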
Added Entry-Corporate Name  
Stanford University.
Host Item Entry  
Dissertations Abstracts International. 85-04B.
Host Item Entry  
Dissertations Abstracts International.
Electronic Location and Access  
This material can be viewed after logging in.
Control Number  
joongbu:641416

Detailed Information

Materials
Registration No.   Call No.   Location           Status                Loan Info
TQ0027330          T          Online full text   Viewable/printable    Viewable/printable