Adapting Pre-trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing.

Detailed Information

Material Type  
Dissertation
 
Control Number  
0017162352
Date and Time of Latest Transaction  
20250211152002
ISBN  
9798383223857
DDC  
401
Author  
Downey, C. M.
Title/Author  
Adapting Pre-trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing.
Publish Info  
[S.l.] : University of Washington, 2024
Publish Info  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Material Info  
132 p.
General Note  
Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
General Note  
Advisor: Levow, Gina-Anne; Steinert-Threlkeld, Shane.
Dissertation Note  
Thesis (Ph.D.)--University of Washington, 2024.
Abstracts/Etc  
Advances in Natural Language Processing (NLP) over the past decade have largely been driven by the scale of data and computation used to train large neural network-based models. However, these techniques are inapplicable to the vast majority of the world's languages, which lack the vast digitized text datasets available for English and a few other very high-resource languages. In this dissertation, we present three case studies for extending NLP applications to under-resourced languages. These case studies include conducting unsupervised morphological segmentation for extremely low-resource languages via multilingual training and transfer, optimizing the vocabulary of a pre-trained cross-lingual model for specific target language(s), and specializing a pre-trained model for a low-resource language family (Uralic). Based on these case studies, we argue for three broad, guiding principles in extending NLP applications to under-resourced languages. First: where possible, robustly pre-trained models and representations should be leveraged. Second: components of pre-trained models that are not optimized for new languages should be substituted or substantially adapted. Third: targeted multilingual training provides a middle ground between the lack of adequate data to train models for individual under-resourced languages on one hand, and the diminishing returns of "massively multilingual" training on the other.
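
For orientation, the abstract's second principle (substituting components not optimized for new languages, such as the vocabulary) can be sketched concretely. The following is a minimal, hypothetical illustration using the HuggingFace Transformers API, not the dissertation's actual code; the model name, corpus, and vocabulary size are placeholders.

    # Hypothetical sketch: swap a pre-trained cross-lingual model's vocabulary
    # for one trained on target-language text, then re-initialize the lexical
    # layer for continued pre-training. Placeholders throughout.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    base = "xlm-roberta-base"  # any pre-trained multilingual encoder
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForMaskedLM.from_pretrained(base)

    # Train a compact tokenizer on target-language text; `target_corpus`
    # stands in for an iterable of raw sentences in the target language.
    target_corpus = ["..."]  # placeholder
    new_tok = tokenizer.train_new_from_iterator(target_corpus, vocab_size=16000)

    # Resize the (tied) embedding matrix to the new vocabulary size and
    # re-initialize it: the old embedding rows no longer align with the
    # new token IDs, so the lexical layer must be re-learned.
    model.resize_token_embeddings(len(new_tok))
    with torch.no_grad():
        model.get_input_embeddings().weight.normal_(mean=0.0, std=0.02)

Under this setup, the transformer body keeps its pre-trained weights while the fresh embeddings are learned through continued masked-language-model training on target-language data, reflecting the abstract's first two principles.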
Subject Added Entry-Topical Term  
Linguistics.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Language.
Index Term-Uncontrolled  
Natural Language Processing
Index Term-Uncontrolled  
Uralic
Index Term-Uncontrolled  
Multilinguality
Index Term-Uncontrolled  
Vocabulary
Added Entry-Corporate Name  
University of Washington Linguistics
Host Item Entry  
Dissertations Abstracts International. 86-01B.
Electronic Location and Access  
This material can be viewed after logging in.
Control Number  
joongbu:655700

MARC

■008250224s2024        us  ||||||||||||||c||eng  d
■001000017162352
■00520250211152002
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798383223857
■035    ▼a(MiAaPQ)AAI31330087
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a401
■1001  ▼aDowney, C. M.
■24510▼aAdapting Pre-trained Models and Leveraging Targeted Multilinguality for Under-Resourced and Endangered Language Processing.
■260    ▼a[S.l.]▼bUniversity of Washington.▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a132 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 86-01, Section: B.
■500    ▼aAdvisor: Levow, Gina-Anne; Steinert-Threlkeld, Shane.
■5021  ▼aThesis (Ph.D.)--University of Washington, 2024.
■520    ▼aAdvances in Natural Language Processing (NLP) over the past decade have largely been driven by the scale of data and computation used to train large neural network-based models. However, these techniques are inapplicable to the vast majority of the world's languages, which lack the vast digitized text datasets available for English and a few other very high-resource languages. In this dissertation, we present three case studies for extending NLP applications to under-resourced languages. These case studies include conducting unsupervised morphological segmentation for extremely low-resource languages via multilingual training and transfer, optimizing the vocabulary of a pre-trained cross-lingual model for specific target language(s), and specializing a pre-trained model for a low-resource language family (Uralic). Based on these case studies, we argue for three broad, guiding principles in extending NLP applications to under-resourced languages. First: where possible, robustly pre-trained models and representations should be leveraged. Second: components of pre-trained models that are not optimized for new languages should be substituted or substantially adapted. Third: targeted multilingual training provides a middle ground between the lack of adequate data to train models for individual under-resourced languages on one hand, and the diminishing returns of "massively multilingual" training on the other.
■590    ▼aSchool code: 0250.
■650  4▼aLinguistics.
■650  4▼aComputer  science.
■650  4▼aLanguage.
■653    ▼aNatural Language Processing
■653    ▼aUralic
■653    ▼aMultilinguality
■653    ▼aVocabulary
■690    ▼a0290
■690    ▼a0984
■690    ▼a0679
■71020▼aUniversity of Washington▼bLinguistics.
■7730  ▼tDissertations Abstracts International▼g86-01B.
■790    ▼a0250
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162352▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).


    Detail Info.

    Material
    Reg No.     Call No.  Location          Status                    Lend Info
    TQ0031722   T         Online full text  Available to view/print   Available to view/print
