Adapting Pre-trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing.
Contents Info
Material Type  
Thesis (Dissertation)
Control Number  
0017162352
International Standard Book Number  
9798383223857
Dewey Decimal Classification Number  
401
Main Entry-Personal Name  
Downey, C. M.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of Washington, 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
132 p.
General Note  
Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
General Note  
Advisor: Levow, Gina-Anne; Steinert-Threlkeld, Shane.
Dissertation Note  
Thesis (Ph.D.)--University of Washington, 2024.
Summary, Etc.  
Advances in Natural Language Processing (NLP) over the past decade have largely been driven by the scale of data and computation used to train large neural network-based models. However, these techniques are inapplicable to the vast majority of the world's languages, which lack the vast digitized text datasets available for English and a few other very high-resource languages. In this dissertation, we present three case studies for extending NLP applications to under-resourced languages. These case studies include conducting unsupervised morphological segmentation for extremely low-resource languages via multilingual training and transfer, optimizing the vocabulary of a pre-trained cross-lingual model for specific target language(s), and specializing a pre-trained model for a low-resource language family (Uralic). Based on these case studies, we argue for three broad, guiding principles in extending NLP applications to under-resourced languages. First: where possible, robustly pre-trained models and representations should be leveraged. Second: components of pre-trained models that are not optimized for new languages should be substituted or substantially adapted. Third: targeted multilingual training provides a middle ground between the lack of adequate data to train models for individual under-resourced languages on one hand, and the diminishing returns of "massively multilingual" training on the other.
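The second principle, substituting or adapting model components (such as the vocabulary and its embeddings) that are not optimized for a new language, can be illustrated with a minimal sketch. The model name, tokenizer-training call, vocabulary size, and embedding-transfer heuristic below are illustrative assumptions, not the procedure used in the dissertation.

```python
# Hedged sketch: adapting the vocabulary/embeddings of a pre-trained
# multilingual model to a target language. Model choice, vocab size, and the
# embedding-transfer heuristic are assumptions for illustration only.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
old_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Train a compact, target-language-specific tokenizer (placeholder corpus).
target_corpus = ["..."]  # iterable of target-language sentences
new_tok = old_tok.train_new_from_iterator(target_corpus, vocab_size=8000)

# Build a new embedding matrix: copy vectors for tokens shared with the old
# vocabulary; initialize the rest from the old embeddings' statistics.
old_emb = model.get_input_embeddings().weight.data
new_emb = torch.empty(len(new_tok), old_emb.size(1))
new_emb.normal_(mean=old_emb.mean().item(), std=old_emb.std().item())
old_vocab = old_tok.get_vocab()
for token, new_id in new_tok.get_vocab().items():
    if token in old_vocab:
        new_emb[new_id] = old_emb[old_vocab[token]]

# Swap the adapted vocabulary into the model before continued pre-training.
model.resize_token_embeddings(len(new_tok))
model.get_input_embeddings().weight.data.copy_(new_emb)
```

Continued pre-training on the target language(s), or on a small, targeted set of related languages per the third principle, would then follow this vocabulary swap.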
Subject Added Entry-Topical Term  
Linguistics.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Language.
Index Term-Uncontrolled  
Natural Language Processing
Index Term-Uncontrolled  
Uralic
Index Term-Uncontrolled  
Multilinguality
Index Term-Uncontrolled  
Vocabulary
Added Entry-Corporate Name  
University of Washington Linguistics
Host Item Entry  
Dissertations Abstracts International. 86-01B.
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:655700

Detail Info

Material
Registration No. | Call No. | Location | Status | Loan Info
TQ0031722 | T | Online full-text | Available to view/print | Available to view/print
