Adapting Pre-trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing.
Detailed Information
- 자료유형
- 학위논문
- Control Number
- 0017162352
- International Standard Book Number
- 9798383223857
- Dewey Decimal Classification Number
- 401
- Main Entry-Personal Name
- Downey, C. M.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of Washington, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 132 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
- General Note
- Advisor: Levow, Gina-Anne; Steinert-Threlkeld, Shane.
- Dissertation Note
- Thesis (Ph.D.)--University of Washington, 2024.
- Summary, Etc.
- Advances in Natural Language Processing (NLP) over the past decade have largely been driven by the scale of data and computation used to train large neural network-based models. However, these techniques are inapplicable to the vast majority of the world's languages, which lack the vast digitized text datasets available for English and a few other very high-resource languages. In this dissertation, we present three case studies for extending NLP applications to under-resourced languages. These case studies include conducting unsupervised morphological segmentation for extremely low-resource languages via multilingual training and transfer, optimizing the vocabulary of a pre-trained cross-lingual model for specific target language(s), and specializing a pre-trained model for a low-resource language family (Uralic). Based on these case studies, we argue for three broad, guiding principles in extending NLP applications to under-resourced languages. First: where possible, robustly pre-trained models and representations should be leveraged. Second: components of pre-trained models that are not optimized for new languages should be substituted or substantially adapted. Third: targeted multilingual training provides a middle ground between the lack of adequate data to train models for individual under-resourced languages on one hand, and the diminishing returns of "massively multilingual" training on the other. (A rough illustrative sketch of the vocabulary-adaptation idea follows this record.)
- Subject Added Entry-Topical Term
- Linguistics.
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Language.
- Index Term-Uncontrolled
- Natural Language Processing
- Index Term-Uncontrolled
- Uralic
- Index Term-Uncontrolled
- Multilinguality
- Index Term-Uncontrolled
- Vocabulary
- Added Entry-Corporate Name
- University of Washington. Linguistics
- Host Item Entry
- Dissertations Abstracts International. 86-01B.
- Electronic Location and Access
- This material can be viewed after logging in.
- Control Number
- joongbu:655700
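
The abstract above mentions optimizing the vocabulary of a pre-trained cross-lingual model for specific target languages. The sketch below is a rough, hypothetical illustration of that general idea, not the dissertation's actual method or code: it assumes the Hugging Face transformers library, uses xlm-roberta-base as a stand-in base model, and takes a placeholder corpus and vocabulary size. It trains a compact target-language tokenizer from the existing one, then rebuilds the embedding matrix, copying vectors for tokens shared with the old vocabulary and mean-initializing the rest.

```python
# Hypothetical sketch of vocabulary adaptation for a pre-trained
# multilingual model (assumed setup; not the dissertation's code).
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
old_tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Placeholder: in practice, an iterator over a real target-language corpus.
target_corpus = ["Example target-language sentence one.",
                 "Example target-language sentence two."]

# Train a smaller tokenizer on the target language, reusing the original
# tokenizer's algorithm and special tokens. vocab_size=8000 is an assumption.
new_tokenizer = old_tokenizer.train_new_from_iterator(target_corpus,
                                                      vocab_size=8000)

# Build the new embedding matrix: copy vectors for tokens that also exist
# in the old vocabulary; initialize the rest to the mean old embedding.
old_emb = model.get_input_embeddings().weight.data
new_emb = old_emb.mean(dim=0, keepdim=True).repeat(len(new_tokenizer), 1)
old_vocab = old_tokenizer.get_vocab()
for token, new_id in new_tokenizer.get_vocab().items():
    if token in old_vocab:
        new_emb[new_id] = old_emb[old_vocab[token]]

# Swap in the new vocabulary size and embeddings; re-tie the LM head so the
# output layer shares the re-initialized input embeddings.
model.resize_token_embeddings(len(new_tokenizer))
model.get_input_embeddings().weight.data.copy_(new_emb)
model.tie_weights()
```

Continued pre-training on target-language text would then adapt the remaining weights, in the spirit of the abstract's second principle: components of a pre-trained model that are not optimized for the new languages should be substituted or substantially adapted.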
MARC
■008250224s2024 us ||||||||||||||c||eng d
■001000017162352
■00520250211152002
■006m o d
■007cr#unu||||||||
■020 ▼a9798383223857
■035 ▼a(MiAaPQ)AAI31330087
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a401
■1001 ▼aDowney, C. M.
■24510▼aAdapting Pre-trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing.
■260 ▼a[S.l.]▼bUniversity of Washington. ▼c2024
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300 ▼a132 p.
■500 ▼aSource: Dissertations Abstracts International, Volume: 86-01, Section: B.
■500 ▼aAdvisor: Levow, Gina-Anne;Steinert-Threlkeld, Shane.
■5021 ▼aThesis (Ph.D.)--University of Washington, 2024.
■520 ▼aAdvances in Natural Language Processing (NLP) over the past decade have largely been driven by the scale of data and computation used to train large neural network-based models. However, these techniques are inapplicable to the vast majority of the world's languages, which lack the vast digitized text datasets available for English and a few other very high-resource languages. In this dissertation, we present three case studies for extending NLP applications to under-resourced languages. These case studies include conducting unsupervised morphological segmentation for extremely low-resource languages via multilingual training and transfer, optimizing the vocabulary of a pre-trained cross-lingual model for specific target language(s), and specializing a pre-trained model for a low-resource language family (Uralic). Based on these case studies, we argue for three broad, guiding principles in extending NLP applications to under-resourced languages. First: where possible, robustly pre-trained models and representations should be leveraged. Second: components of pre-trained models that are not optimized for new languages should be substituted or substantially adapted. Third: targeted multilingual training provides a middle ground between the lack of adequate data to train models for individual under-resourced languages on one hand, and the diminishing returns of "massively multilingual" training on the other.
■590 ▼aSchool code: 0250.
■650 4▼aLinguistics.
■650 4▼aComputer science.
■650 4▼aLanguage.
■653 ▼aNatural Language Processing
■653 ▼aUralic
■653 ▼aMultilinguality
■653 ▼aVocabulary
■690 ▼a0290
■690 ▼a0984
■690 ▼a0679
■71020▼aUniversity of Washington▼bLinguistics.
■7730 ▼tDissertations Abstracts International▼g86-01B.
■790 ▼a0250
■791 ▼aPh.D.
■792 ▼a2024
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162352▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).