Joongbu University Library


Contents Info

Statistical and Computational Approaches for Data Integration and Constrained Variable Selection in Large Datasets [electronic resource]

Material Type: Thesis/Dissertation

Control Number: 0016935586

International Standard Book Number: 9798380371766

Dewey Decimal Classification Number: 574

Main Entry-Personal Name: Tran, Lam.

Publication, Distribution, etc. (Imprint): [S.l.] : University of Michigan, 2023

Publication, Distribution, etc. (Imprint): Ann Arbor : ProQuest Dissertations & Theses, 2023

Physical Description: 1 online resource(129 p.)

General Note: Source: Dissertations Abstracts International, Volume: 85-03, Section: B.

General Note: Advisor: Jiang, Hui.

Dissertation Note: Thesis (Ph.D.)--University of Michigan, 2023.

Restrictions on Access Note: This item must not be sold to any third party vendors.

Restrictions on Access Note: This item must not be added to any third party search indexes.

Summary, Etc.: With the number of covariates, sample size, and heterogeneity in datasets continuously increasing, the incorporation of prior domain knowledge or the addition of structural constraints in a model represents an attractive means to perform informed variable selection on large numbers of potential predictors. The growing complexity of individual datasets has been accompanied by their increasing availability, as researchers nowadays can access ever-expanding biobanks and other large clinical datasets. Integration of external datasets can increase the generalizability of locally gathered data, but these datasets can be affected by context-specific confounders, necessitating weighted integration methods to differentiate datasets of variable quality.

In Chapter 2, we present a method to perform weighted data integration based on minimizing the local data leave-one-out cross-validation (LOOCV) error, under the assumption that the local data is generated from the set of unknown true parameters. We demonstrate how the optimization of the LOOCV error for various models can be written as functions of external dataset weights. Furthermore, we develop an accompanying reduced space approach that reduces the weighted integration of any number of external datasets to a two-parameter optimization. The utility of the weighted data integration method in comparison to existing methods is shown through extensive simulation work mimicking heterogeneous clinical data, as well as in two real-world examples. The first examines kidney transplant patients from the Scientific Registry of Transplant Recipients and the second looks at the genomic data of bladder cancer patients from The Cancer Genome Atlas. Ongoing work on calculating standard error estimates and developing significance testing under a false discovery rate framework is also presented.

In Chapter 3, we devise a fast solution to the equality-constrained lasso problem with a two-stage algorithm: first obtaining candidate covariate subsets of increasing size from unconstrained lasso problems and then leveraging an efficient alternating direction method of multipliers (ADMM) algorithm. Our "candidate subset approach" produces the same solution path as solving the constrained lasso over the entire predictor space, and in simulation studies, our approach is over an order of magnitude faster than existing methods. The ability to solve the equality-constrained lasso with multiple constraints and with a large number of potential predictors is demonstrated in a microbiome regression analysis and a myeloma survival analysis, neither of which could be solved by naively fitting the constrained lasso on all predictors.

In Chapter 4, we aim to extend the candidate subset approach for constrained variable selection to accommodate different penalty functions and inequality constraints. Despite its desirable selection properties, it is well known that the lasso is biased for large regression coefficients; to address this shortcoming, we consider our approach with two non-convex penalty functions, SCAD and MCP. Furthermore, we also consider the approach with inequality constraints and dual equality/inequality constraints, which greatly increases the number of potential applications. We demonstrate that the properties of the candidate subset approach, in terms of its speed and producing the same solution over the whole predictor space, additionally hold for these two extensions.

Subject Added Entry-Topical Term: Biostatistics.

Subject Added Entry-Topical Term: Bioinformatics.

Subject Added Entry-Topical Term: Information technology.

Index Term-Uncontrolled: Data integration

Index Term-Uncontrolled: Variable selection

Index Term-Uncontrolled: Computational approaches

Index Term-Uncontrolled: External datasets

Index Term-Uncontrolled: Genomic data

Added Entry-Corporate Name: University of Michigan Biostatistics

Host Item Entry: Dissertations Abstracts International. 85-03B.

Host Item Entry: Dissertation Abstract International

Electronic Location and Access: This item can be viewed after logging in.

Control Number: joongbu:643111


Holdings
Item Number	Call Number	Location	Status	Loan Info
TQ0029020	T	Full-text resource	Viewable/Printable	Viewable/Printable

