Reconsider Machine Learning Method for Variable Selection and Validation With High Dimensional Data.
- Material Type
- Dissertation
- Control Number
- 0017162676
- International Standard Book Number
- 9798384093374
- Dewey Decimal Classification Number
- 574
- Main Entry-Personal Name
- Liu, Lu.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Duke University, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 89 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 86-03, Section: A.
- General Note
- Advisor: Jung, Sin-Ho.
- Dissertation Note
- Thesis (Ph.D.)--Duke University, 2024.
- Summary, Etc.
- The trend toward big data influences how people think and inspires potential research directions. Recent feats of machine learning have seized collective attention because of their strong performance in big data analysis, including text analysis and image processing. Machine learning is also a popular topic in clinical medicine for analyzing electronic health records and medical image data, for which traditional statistical models are not adequate. However, machine learning is not a panacea, and its defects, such as loss of interpretability and over-selection of variables, may restrict its application. We must also recognize that for many clinical prediction analyses, a simpler approach, the generalized linear model, is sufficient. In this dissertation, we propose to use standard regression methods, without any penalization, combined with a stepwise variable selection procedure to overcome the over-selection issue of popular machine learning methods. For model validation, we propose a permutation approach to estimate the performance of various validation methods. Finally, we propose a repeated sieving approach, extending the standard regression methods with stepwise variable selection, to handle high dimensional modeling.
- Subject Added Entry-Topical Term
- Biostatistics.
- Subject Added Entry-Topical Term
- Statistics.
- Subject Added Entry-Topical Term
- Bioinformatics.
- Subject Added Entry-Topical Term
- Information science.
- Index Term-Uncontrolled
- Logistic regression
- Index Term-Uncontrolled
- Machine learning
- Index Term-Uncontrolled
- Permutation approach
- Index Term-Uncontrolled
- Variable selection
- Index Term-Uncontrolled
- Validation methods
- Added Entry-Corporate Name
- Duke University Biostatistics and Bioinformatics Doctor of Philosophy
- Host Item Entry
- Dissertations Abstracts International. 86-03A.
- Electronic Location and Access
- Available after logging in.
- Control Number
- joongbu:658171