Weakly-Supervised Evaluation of Medical AI Systems.
- Material Type
- Dissertation
- Control Number
- 0017163322
- International Standard Book Number
- 9798384023517
- Dewey Decimal Classification Number
- 004
- Main Entry-Personal Name
- Pugh, Sydney Faye.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of Pennsylvania, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 134 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 86-02, Section: B.
- General Note
- Advisor: Lee, Insup;Weimer, James.
- Dissertation Note
- Thesis (Ph.D.)--University of Pennsylvania, 2024.
- Summary, Etc.
- Medical artificial intelligence (AI) systems must undergo comprehensive clinical evaluations to ensure their safety and efficacy prior to being deployed into clinical practice. Clinical trials are the gold standard for evaluating medical AI systems, but they are a substantial commitment of data collection resources. Consequently, there is a growing need for cheap and fast evaluation methods that can approximate the outcomes of clinical trials. Engineers can use these methods to conduct early-stage, low-cost assessments of novel medical AI systems, which enables informed and more efficient utilization of data collection resources.
This dissertation presents weakly-supervised evaluation methods for medical AI systems. These methods are lightweight, leveraging existing (previously collected) unlabeled trial data and programmatic weak supervision (PWS), a data labeling paradigm based on combining noisy and cheap-to-obtain labeling heuristics defined by domain experts (e.g., clinicians). We propose two distinct weakly-supervised performance evaluation methods. The first method estimates the sensitivity/specificity of a system in the form of confidence bounds by leveraging samples labeled with high confidence via PWS. We apply our method to several clinical alarm suppression systems and demonstrate that our method yields confidence bounds that fully contain the true sensitivity/specificity. The second method estimates the robustness of a system by observing its trend in performance on a sequence of adversarially ordered datasets. These datasets are constructed from an adversarial ordering of the input data based on a Clopper-Pearson confidence interval for PWS label confidences. We demonstrate the utility of this method by evaluating synthetic alarm suppression systems designed to have varying levels of accuracy across five clinical alarm datasets.
An inherent challenge of these methods is the need for effective labeling heuristics, which clinicians often find difficult to design for high-dimensional medical data (e.g., images and time series). To address this, we propose an automated clinician-in-the-loop method to generate weak labels for facilitating PWS, leveraging distance functions. We demonstrate that using these generated weak labels in PWS results in labels that generally outperform those obtained using clinician-supplied labeling heuristics.
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Medicine.
- Index Term-Uncontrolled
- Data programming
- Index Term-Uncontrolled
- Evaluation
- Index Term-Uncontrolled
- Machine learning
- Index Term-Uncontrolled
- Weak supervision
- Added Entry-Corporate Name
- University of Pennsylvania Computer and Information Science
- Host Item Entry
- Dissertations Abstracts International. 86-02B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:656221
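
Note on the summary: the abstract refers to a Clopper-Pearson confidence interval without defining it. The sketch below shows the standard textbook form of that interval for a binomial proportion (k successes in n trials), solved by bisection with the Python standard library only. It is not taken from the dissertation; the function names (`binom_cdf`, `clopper_pearson`) and the example counts are hypothetical and chosen for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target, iters=100):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion.

    lower solves P(X >= k | p) = alpha / 2; upper solves P(X <= k | p) = alpha / 2.
    """
    if not 0 <= k <= n or n <= 0:
        raise ValueError("need n > 0 and 0 <= k <= n")
    if k == 0:
        lower = 0.0
    else:
        # P(X >= k | p) = 1 - P(X <= k - 1 | p), increasing in p
        lower = _solve_increasing(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    if k == n:
        upper = 1.0
    else:
        # 1 - P(X <= k | p) is increasing in p; it equals 1 - alpha / 2 at the bound
        upper = _solve_increasing(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical example: 8 of 10 labels agree with a reference.
low, high = clopper_pearson(8, 10)
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```

The interval is conservative by construction: its coverage is at least 1 - alpha for every true proportion, which is why it is often preferred when sample counts are small.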