Adapting Pre-trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing.

Detailed Information

Material Type  
 Thesis (Dissertation)
Control Number  
0017162352
International Standard Book Number  
9798383223857
Dewey Decimal Classification Number  
401
Main Entry-Personal Name  
Downey, C. M.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of Washington., 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
132 p.
General Note  
Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
General Note  
Advisor: Levow, Gina-Anne; Steinert-Threlkeld, Shane.
Dissertation Note  
Thesis (Ph.D.)--University of Washington, 2024.
Summary, Etc.  
Advances in Natural Language Processing (NLP) over the past decade have largely been driven by the scale of data and computation used to train large neural network-based models. However, these techniques are inapplicable to the vast majority of the world's languages, which lack the vast digitized text datasets available for English and a few other very high-resource languages. In this dissertation, we present three case studies for extending NLP applications to under-resourced languages. These case studies include conducting unsupervised morphological segmentation for extremely low-resource languages via multilingual training and transfer, optimizing the vocabulary of a pre-trained cross-lingual model for specific target language(s), and specializing a pre-trained model for a low-resource language family (Uralic). Based on these case studies, we argue for three broad, guiding principles in extending NLP applications to under-resourced languages. First: where possible, robustly pre-trained models and representations should be leveraged. Second: components of pre-trained models that are not optimized for new languages should be substituted or substantially adapted. Third: targeted multilingual training provides a middle ground between the lack of adequate data to train models for individual under-resourced languages on one hand, and the diminishing returns of "massively multilingual" training on the other.
Subject Added Entry-Topical Term  
Linguistics.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Language.
Index Term-Uncontrolled  
Natural Language Processing
Index Term-Uncontrolled  
Uralic
Index Term-Uncontrolled  
Multilinguality
Index Term-Uncontrolled  
Vocabulary
Added Entry-Corporate Name  
University of Washington Linguistics
Host Item Entry  
Dissertations Abstracts International. 86-01B.
Electronic Location and Access  
This material can be viewed after logging in.
Control Number  
joongbu:655700
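
Note: the abstract above describes, as its second case study, optimizing the vocabulary of a pre-trained cross-lingual model for specific target languages. The sketch below is a rough illustration of that general idea only, not the dissertation's actual method or code: it trains a new, smaller tokenizer on target-language text with the Hugging Face transformers library, then rebuilds the model's embedding matrix, copying vectors for tokens shared with the original vocabulary. The model name, corpus path, and vocabulary size are assumptions made for this example.

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    # Illustrative choices; not taken from the dissertation.
    model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
    old_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")

    # Train a compact tokenizer on target-language text (hypothetical corpus path).
    with open("target_lang_corpus.txt", encoding="utf-8") as f:
        new_tok = old_tok.train_new_from_iterator(
            (line.strip() for line in f), vocab_size=8000
        )

    # Rebuild the embedding matrix: random init for new tokens, copied vectors
    # for tokens the original vocabulary already covers.
    old_emb = model.get_input_embeddings().weight.data
    new_emb = torch.normal(mean=0.0, std=0.02, size=(len(new_tok), old_emb.size(1)))
    for token, new_id in new_tok.get_vocab().items():
        old_id = old_tok.convert_tokens_to_ids(token)
        if old_id != old_tok.unk_token_id:
            new_emb[new_id] = old_emb[old_id]

    model.resize_token_embeddings(len(new_tok))
    model.get_input_embeddings().weight.data.copy_(new_emb)

After this swap, the adapted model is typically fine-tuned on target-language data so the newly initialized embeddings can be learned.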

MARC

 008250224s2024        us  ||||||||||||||c||eng  d
■001000017162352
■00520250211152002
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798383223857
■035    ▼a(MiAaPQ)AAI31330087
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a401
■1001  ▼aDowney, C. M.
■24510▼aAdapting Pre-trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing.
■260    ▼a[S.l.]▼bUniversity of Washington. ▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a132 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 86-01, Section: B.
■500    ▼aAdvisor: Levow, Gina-Anne; Steinert-Threlkeld, Shane.
■5021  ▼aThesis (Ph.D.)--University of Washington, 2024.
■520    ▼aAdvances in Natural Language Processing (NLP) over the past decade have largely been driven by the scale of data and computation used to train large neural network-based models. However, these techniques are inapplicable to the vast majority of the world's languages, which lack the vast digitized text datasets available for English and a few other very high-resource languages. In this dissertation, we present three case studies for extending NLP applications to under-resourced languages. These case studies include conducting unsupervised morphological segmentation for extremely low-resource languages via multilingual training and transfer, optimizing the vocabulary of a pre-trained cross-lingual model for specific target language(s), and specializing a pre-trained model for a low-resource language family (Uralic). Based on these case studies, we argue for three broad, guiding principles in extending NLP applications to under-resourced languages. First: where possible, robustly pre-trained models and representations should be leveraged. Second: components of pre-trained models that are not optimized for new languages should be substituted or substantially adapted. Third: targeted multilingual training provides a middle ground between the lack of adequate data to train models for individual under-resourced languages on one hand, and the diminishing returns of "massively multilingual" training on the other.
■590    ▼aSchool code: 0250.
■650  4▼aLinguistics.
■650  4▼aComputer science.
■650  4▼aLanguage.
■653    ▼aNatural Language Processing
■653    ▼aUralic
■653    ▼aMultilinguality
■653    ▼aVocabulary
■690    ▼a0290
■690    ▼a0984
■690    ▼a0679
■71020▼aUniversity of Washington▼bLinguistics.
■7730  ▼tDissertations Abstracts International▼g86-01B.
■790    ▼a0250
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162352▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service (KERIS).


    Holdings
    Reg No.: TQ0031722
    Call No.: T
    Location: Full-text material
    Status: Viewable/printable
    Lend Info: Viewable/printable

