Joongbu University Library


Detailed Information

Augmenting Medical Image Classifiers With Synthetic Data Across Populations.

Material type: Dissertation

Control Number: 0017161793

International Standard Book Number: 9798382776552

Dewey Decimal Classification Number: 574

Main Entry-Personal Name: Sagers, Luke William.

Publication, Distribution, etc. (Imprint): [S.l.] : Harvard University, 2024

Publication, Distribution, etc. (Imprint): Ann Arbor : ProQuest Dissertations & Theses, 2024

Physical Description: 125 p.

General Note: Source: Dissertations Abstracts International, Volume: 85-12, Section: B.

General Note: Advisor: Manrai, Arjun K.

Dissertation Note: Thesis (Ph.D.)--Harvard University, 2024.

Summary, Etc.: Rapid improvements in the capabilities of generative artificial intelligence (AI) models to produce and interpret images have created new possibilities to address persistent challenges in medical machine learning including data scarcity, annotation costs, and model biases. This dissertation primarily explores the potential opportunities and limitations of using synthetic images created by large generative AI models to improve the performance and generalizability of medical image classifiers across populations. We also evaluate the ability of consumer-facing vision-language models to classify dermatology and chest X-ray images under different prompts.

We designed an image-generation pipeline by fine-tuning diffusion-based models to create synthetic skin disease images using both generative fill and text-to-image methods. With this pipeline, we generated 500,000 synthetic dermatology images (which we publicly released for future research) representing 12 diseases across diverse skin tones. We then systematically evaluated the performance of AI disease classifiers when including or excluding synthetic images in model training. We show that in data-limited settings (in which there are few real images of a disease or skin tone), synthetic data can improve classifier performance, but that these gains saturate once sufficient quantities of real images are available. We find that the biggest driver of model improvements is the quantity of real images. We also observed a correlation between the physician-assessed photorealism of synthetic images and gains in model performance. Collectively, these findings suggest that synthetic data presents a complementary tool for training disease classifiers and can be useful as an advanced augmentation method or a way to share features of a data distribution without sharing the data itself. However, efforts must still be focused on collecting more high-quality, diverse, real data to train the next generation of fair, robust, and generalizable AI systems.

We also evaluated the capabilities of consumer-facing and general-purpose vision-language AI models in interpreting chest X-rays and dermatology images. We found that these systems, which have not been specifically trained for medical image diagnoses, can perform at or near human level on selected metrics, and that model performance and behavior can be influenced using the text prompt and task formulation. Our analysis suggests that evaluations of large language and vision-language models should carefully consider the prompt context and other inputs. Overall, this dissertation provides a systematic analysis of the opportunities, pitfalls, and open challenges regarding the use of synthetic data and generative AI for improving medical imaging across all populations.

Subject Added Entry-Topical Term: Bioinformatics.

Subject Added Entry-Topical Term: Dermatology.

Subject Added Entry-Topical Term: Medicine.

Subject Added Entry-Topical Term: Medical imaging.

Index Term-Uncontrolled: Diffusion models

Index Term-Uncontrolled: Generative AI

Index Term-Uncontrolled: Image classification

Index Term-Uncontrolled: Synthetic data

Index Term-Uncontrolled: Vision-language models

Index Term-Uncontrolled: Skin disease images

Added Entry-Corporate Name: Harvard University Medical Sciences

Host Item Entry: Dissertations Abstracts International. 85-12B.

Electronic Location and Access: This material can be viewed after logging in.

Control Number: joongbu:657471

008250224s2024        us  ||||||||||||||c||eng  d
■001000017161793
■00520250211151446
■006m          o    d
■007cr#unu||||||||
■020    ▼a9798382776552
■035    ▼a(MiAaPQ)AAI31296447
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a574
■1001  ▼aSagers, Luke William.▼0(orcid)0000-0002-5024-7314
■24510▼aAugmenting Medical Image Classifiers With Synthetic Data Across Populations.
■260    ▼a[S.l.]▼bHarvard University. ▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a125 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 85-12, Section: B.
■500    ▼aAdvisor: Manrai, Arjun K.
■5021  ▼aThesis (Ph.D.)--Harvard University, 2024.
■520    ▼aRapid improvements in the capabilities of generative artificial intelligence (AI) models to produce and interpret images have created new possibilities to address persistent challenges in medical machine learning including data scarcity, annotation costs, and model biases. This dissertation primarily explores the potential opportunities and limitations of using synthetic images created by large generative AI models to improve the performance and generalizability of medical image classifiers across populations. We also evaluate the ability of consumer-facing vision-language models to classify dermatology and chest X-ray images under different prompts. We designed an image-generation pipeline by fine-tuning diffusion-based models to create synthetic skin disease images using both generative fill and text-to-image methods. With this pipeline, we generated 500,000 synthetic dermatology images (which we publicly released for future research) representing 12 diseases across diverse skin tones. We then systematically evaluated the performance of AI disease classifiers when including or excluding synthetic images in model training. We show that in data-limited settings (in which there are few real images of a disease or skin-tone), synthetic data can improve classifier performance, but that these gains saturate once sufficient quantities of real images are available. We find that the biggest driver of model improvements is the quantity of real images. We also observed a correlation between the physician-assessed photorealism of synthetic images and gains in model performance. Collectively, these findings suggest that synthetic data presents a complementary tool to training disease classifiers and can be useful as an advanced augmentation method or a way to share features of a data distribution without sharing the data itself. However, efforts must still be focused on collecting more high quality, diverse, real data to train the next generation of fair, robust, and generalizable AI systems. We also evaluated the capabilities of consumer-facing and general-purpose vision-language AI models in interpreting chest X-rays and dermatology images. We found that these systems, which have not been specifically trained for medical image diagnoses, can perform at or near-human level on selected metrics, and that model performance and behavior can be influenced using the text prompt and task formulation. Our analysis suggests that evaluations of large language and vision-language models should carefully consider the prompt context and other inputs. Overall, this dissertation provides a systematic analysis of the opportunities, pitfalls, and open challenges regarding the use of synthetic data and generative AI for improving medical imaging across all populations.
■590    ▼aSchool code: 0084.
■650  4▼aBioinformatics.
■650  4▼aDermatology.
■650  4▼aMedicine.
■650  4▼aMedical imaging.
■653    ▼aDiffusion models
■653    ▼aGenerative AI
■653    ▼aImage classification
■653    ▼aSynthetic data
■653    ▼aVision-language models
■653    ▼aSkin disease images
■690    ▼a0715
■690    ▼a0574
■690    ▼a0564
■690    ▼a0800
■690    ▼a0757
■71020▼aHarvard University▼bMedical Sciences.
■7730  ▼tDissertations Abstracts International▼g85-12B.
■790    ▼a0084
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17161793▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service (KERIS).


Holdings
Reg No.	Call No.	Location	Status	Loan info
TQ0033689	T	Full-text material	Viewable/Printable	Viewable/Printable
