Reconsider Machine Learning Method for Variable Selection and Validation With High Dimensional Data.

Details

Material Type  
 Dissertation
Control Number  
0017162676
International Standard Book Number  
9798384093374
Dewey Decimal Classification Number  
574
Main Entry-Personal Name  
Liu, Lu.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Duke University, 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
89 p.
General Note  
Source: Dissertations Abstracts International, Volume: 86-03, Section: A.
General Note  
Advisor: Jung, Sin-Ho.
Dissertation Note  
Thesis (Ph.D.)--Duke University, 2024.
Summary, Etc.  
The big data trend influences how people think and inspires potential research directions. Recent feats of machine learning have seized collective attention because of its strong performance in big data analysis, including text analysis and image processing. Machine learning is also a popular topic in clinical medicine for analyzing electronic health records and medical image data, for which traditional statistical models are not adequate. However, machine learning is not a panacea, and defects such as loss of interpretability and over-selection may restrict its application. We must also recognize that for many clinical prediction analyses, the simpler approach, a generalized linear model, is sufficient. In this dissertation, we propose to use standard regression methods, without any penalization, combined with a stepwise variable selection procedure to overcome the over-selection issue of popular machine learning methods. For model validation, we propose a permutation approach to estimate the performance of various validation methods. Finally, we propose a repeated sieving approach, extending the standard regression methods with stepwise variable selection, to handle high-dimensional modeling.
Subject Added Entry-Topical Term  
Biostatistics.
Subject Added Entry-Topical Term  
Statistics.
Subject Added Entry-Topical Term  
Bioinformatics.
Subject Added Entry-Topical Term  
Information science.
Index Term-Uncontrolled  
Logistic regression
Index Term-Uncontrolled  
Machine learning
Index Term-Uncontrolled  
Permutation approach
Index Term-Uncontrolled  
Variable selection
Index Term-Uncontrolled  
Validation methods
Added Entry-Corporate Name  
Duke University Biostatistics and Bioinformatics Doctor of Philosophy
Host Item Entry  
Dissertations Abstracts International. 86-03A.
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:658171

MARC

 008250224s2024        us  ||||||||||||||c||eng  d
■001000017162676
■00520250211152040
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798384093374
■035    ▼a(MiAaPQ)AAI31336592
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a574
■1001  ▼aLiu,  Lu.
■24510▼aReconsider  Machine  Learning  Method  for  Variable  Selection  and  Validation  With  High  Dimensional  Data.
■260    ▼a[S.l.]▼bDuke  University.  ▼c2024
■260  1▼aAnn  Arbor▼bProQuest  Dissertations  &  Theses▼c2024
■300    ▼a89  p.
■500    ▼aSource:  Dissertations  Abstracts  International,  Volume:  86-03,  Section:  A.
■500    ▼aAdvisor:  Jung,  Sin-Ho.
■5021  ▼aThesis  (Ph.D.)--Duke  University,  2024.
■520    ▼aThe  big  data  tendency  influences  how  people  think  and  inspires  potential  research  directions.  Recent  feats  of  machine  learning  have  seized  collective  attention  because  of  its  profound  performance  in  conducting  big  data  analysis  including  text  analysis  and  image  processing.  Machine  learning  is  also  a  popular  topic  in  clinical  medicine  to  implement  analysis  on  electronic  health  records  and  medical  image  data,  which  traditional  statistics  model  is  not  adequate  for.  However,  we  realize  that  machine  learning  is  not  panacea  and  its  defects  such  as  loss  of  interpretability  and  excess  selection  may  restrict  its  application.  And  we  must  also  recognize  that  for  many  clinical  prediction  analyses,  the  simpler  approach-generalized  linear  model  is  enough  for  what  we  need.  In  this  dissertation,  we  propose  to  use  standard  regression  methods,  without  any  penalizing  approach,  combined  with  a  stepwise  variable  selection  procedure  to  overcome  the  over-selection  issue  of  popular  machine  learning  methods.  For  model  validation,  we  propose  a  permutation  approach  to  estimate  the  performance  of  various  validation  methods.  Finally,  we  propose  a  repeated  sieving  approach,  extending  the  standard  regression  methods  with  stepwise  variable  selection,  to  handle  high  dimensional  modeling.
■590    ▼aSchool  code:  0066.
■650  4▼aBiostatistics.
■650  4▼aStatistics.
■650  4▼aBioinformatics.
■650  4▼aInformation  science.
■653    ▼aLogistic  regression
■653    ▼aMachine  learning
■653    ▼aPermutation  approach
■653    ▼aVariable  selection
■653    ▼aValidation  methods
■690    ▼a0308
■690    ▼a0723
■690    ▼a0715
■690    ▼a0463
■71020▼aDuke  University▼bBiostatistics  and  Bioinformatics  Doctor  of  Philosophy.
■7730  ▼tDissertations  Abstracts  International▼g86-03A.
■790    ▼a0066
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162676▼nKERIS▼zThe  full  text  of  this  material  is  provided  by  the  Korea  Education  and  Research  Information  Service.

    Holdings
    Registration No. Call No. Collection Status Loan Info.
    TQ0034489 T   Full-text material Viewable/Printable Viewable/Printable