Augmenting Medical Image Classifiers With Synthetic Data Across Populations.
- Material Type
- Dissertation
- Control Number
- 0017161793
- International Standard Book Number
- 9798382776552
- Dewey Decimal Classification Number
- 574
- Main Entry-Personal Name
- Sagers, Luke William.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Harvard University, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 125 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
- General Note
- Advisor: Manrai, Arjun K.
- Dissertation Note
- Thesis (Ph.D.)--Harvard University, 2024.
- Summary, Etc.
- Rapid improvements in the capabilities of generative artificial intelligence (AI) models to produce and interpret images have created new possibilities to address persistent challenges in medical machine learning including data scarcity, annotation costs, and model biases. This dissertation primarily explores the potential opportunities and limitations of using synthetic images created by large generative AI models to improve the performance and generalizability of medical image classifiers across populations. We also evaluate the ability of consumer-facing vision-language models to classify dermatology and chest X-ray images under different prompts. We designed an image-generation pipeline by fine-tuning diffusion-based models to create synthetic skin disease images using both generative fill and text-to-image methods. With this pipeline, we generated 500,000 synthetic dermatology images (which we publicly released for future research) representing 12 diseases across diverse skin tones. We then systematically evaluated the performance of AI disease classifiers when including or excluding synthetic images in model training. We show that in data-limited settings (in which there are few real images of a disease or skin tone), synthetic data can improve classifier performance, but that these gains saturate once sufficient quantities of real images are available. We find that the biggest driver of model improvements is the quantity of real images. We also observed a correlation between the physician-assessed photorealism of synthetic images and gains in model performance. Collectively, these findings suggest that synthetic data presents a complementary tool to training disease classifiers and can be useful as an advanced augmentation method or a way to share features of a data distribution without sharing the data itself.
However, efforts must still be focused on collecting more high-quality, diverse, real data to train the next generation of fair, robust, and generalizable AI systems. We also evaluated the capabilities of consumer-facing and general-purpose vision-language AI models in interpreting chest X-rays and dermatology images. We found that these systems, which have not been specifically trained for medical image diagnoses, can perform at or near-human level on selected metrics, and that model performance and behavior can be influenced using the text prompt and task formulation. Our analysis suggests that evaluations of large language and vision-language models should carefully consider the prompt context and other inputs. Overall, this dissertation provides a systematic analysis of the opportunities, pitfalls, and open challenges regarding the use of synthetic data and generative AI for improving medical imaging across all populations.
- Subject Added Entry-Topical Term
- Bioinformatics.
- Subject Added Entry-Topical Term
- Dermatology.
- Subject Added Entry-Topical Term
- Medicine.
- Subject Added Entry-Topical Term
- Medical imaging.
- Index Term-Uncontrolled
- Diffusion models
- Index Term-Uncontrolled
- Generative AI
- Index Term-Uncontrolled
- Image classification
- Index Term-Uncontrolled
- Synthetic data
- Index Term-Uncontrolled
- Vision-language models
- Index Term-Uncontrolled
- Skin disease images
- Added Entry-Corporate Name
- Harvard University Medical Sciences
- Host Item Entry
- Dissertations Abstracts International. 85-12B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:657471