Augmenting Medical Image Classifiers With Synthetic Data Across Populations.

Details

Material Type  
 Dissertation
Control Number  
0017161793
International Standard Book Number  
9798382776552
Dewey Decimal Classification Number  
574
Main Entry-Personal Name  
Sagers, Luke William.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Harvard University, 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
125 p.
General Note  
Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
General Note  
Advisor: Manrai, Arjun K.
Dissertation Note  
Thesis (Ph.D.)--Harvard University, 2024.
Summary, Etc.  
Rapid improvements in the capabilities of generative artificial intelligence (AI) models to produce and interpret images have created new possibilities to address persistent challenges in medical machine learning, including data scarcity, annotation costs, and model biases. This dissertation primarily explores the potential opportunities and limitations of using synthetic images created by large generative AI models to improve the performance and generalizability of medical image classifiers across populations. We also evaluate the ability of consumer-facing vision-language models to classify dermatology and chest X-ray images under different prompts.
We designed an image-generation pipeline by fine-tuning diffusion-based models to create synthetic skin disease images using both generative fill and text-to-image methods. With this pipeline, we generated 500,000 synthetic dermatology images (which we publicly released for future research) representing 12 diseases across diverse skin tones. We then systematically evaluated the performance of AI disease classifiers when including or excluding synthetic images in model training. We show that in data-limited settings (in which there are few real images of a disease or skin tone), synthetic data can improve classifier performance, but that these gains saturate once sufficient quantities of real images are available. We find that the biggest driver of model improvements is the quantity of real images. We also observed a correlation between the physician-assessed photorealism of synthetic images and gains in model performance. Collectively, these findings suggest that synthetic data presents a complementary tool for training disease classifiers and can be useful as an advanced augmentation method or a way to share features of a data distribution without sharing the data itself. However, efforts must still be focused on collecting more high-quality, diverse, real data to train the next generation of fair, robust, and generalizable AI systems.
We also evaluated the capabilities of consumer-facing and general-purpose vision-language AI models in interpreting chest X-rays and dermatology images. We found that these systems, which have not been specifically trained for medical image diagnoses, can perform at or near human level on selected metrics, and that model performance and behavior can be influenced using the text prompt and task formulation. Our analysis suggests that evaluations of large language and vision-language models should carefully consider the prompt context and other inputs. Overall, this dissertation provides a systematic analysis of the opportunities, pitfalls, and open challenges regarding the use of synthetic data and generative AI for improving medical imaging across all populations.
Subject Added Entry-Topical Term  
Bioinformatics.
Subject Added Entry-Topical Term  
Dermatology.
Subject Added Entry-Topical Term  
Medicine.
Subject Added Entry-Topical Term  
Medical imaging.
Index Term-Uncontrolled  
Diffusion models
Index Term-Uncontrolled  
Generative AI
Index Term-Uncontrolled  
Image classification
Index Term-Uncontrolled  
Synthetic data
Index Term-Uncontrolled  
Vision-language models
Index Term-Uncontrolled  
Skin disease images
Added Entry-Corporate Name  
Harvard University. Medical Sciences.
Host Item Entry  
Dissertations Abstracts International. 85-12B.
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:657471

MARC

 008250224s2024        us  ||||||||||||||c||eng  d
■001000017161793
■00520250211151446
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798382776552
■035    ▼a(MiAaPQ)AAI31296447
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a574
■1001  ▼aSagers, Luke William.▼0(orcid)0000-0002-5024-7314
■24510▼aAugmenting Medical Image Classifiers With Synthetic Data Across Populations.
■260    ▼a[S.l.]▼bHarvard University.▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a125 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 85-12, Section: B.
■500    ▼aAdvisor: Manrai, Arjun K.
■5021  ▼aThesis (Ph.D.)--Harvard University, 2024.
■520    ▼aRapid improvements in the capabilities of generative artificial intelligence (AI) models to produce and interpret images have created new possibilities to address persistent challenges in medical machine learning including data scarcity, annotation costs, and model biases. This dissertation primarily explores the potential opportunities and limitations of using synthetic images created by large generative AI models to improve the performance and generalizability of medical image classifiers across populations. We also evaluate the ability of consumer-facing vision-language models to classify dermatology and chest X-ray images under different prompts. We designed an image-generation pipeline by fine-tuning diffusion-based models to create synthetic skin disease images using both generative fill and text-to-image methods. With this pipeline, we generated 500,000 synthetic dermatology images (which we publicly released for future research) representing 12 diseases across diverse skin tones. We then systematically evaluated the performance of AI disease classifiers when including or excluding synthetic images in model training. We show that in data-limited settings (in which there are few real images of a disease or skin-tone), synthetic data can improve classifier performance, but that these gains saturate once sufficient quantities of real images are available. We find that the biggest driver of model improvements is the quantity of real images. We also observed a correlation between the physician-assessed photorealism of synthetic images and gains in model performance. Collectively, these findings suggest that synthetic data presents a complementary tool to training disease classifiers and can be useful as an advanced augmentation method or a way to share features of a data distribution without sharing the data itself. However, efforts must still be focused on collecting more high quality, diverse, real data to train the next generation of fair, robust, and generalizable AI systems. We also evaluated the capabilities of consumer-facing and general-purpose vision-language AI models in interpreting chest X-rays and dermatology images. We found that these systems, which have not been specifically trained for medical image diagnoses, can perform at or near-human level on selected metrics, and that model performance and behavior can be influenced using the text prompt and task formulation. Our analysis suggests that evaluations of large language and vision-language models should carefully consider the prompt context and other inputs. Overall, this dissertation provides a systematic analysis of the opportunities, pitfalls, and open challenges regarding the use of synthetic data and generative AI for improving medical imaging across all populations.
■590    ▼aSchool  code:  0084.
■650  4▼aBioinformatics.
■650  4▼aDermatology.
■650  4▼aMedicine.
■650  4▼aMedical imaging.
■653    ▼aDiffusion models
■653    ▼aGenerative AI
■653    ▼aImage classification
■653    ▼aSynthetic data
■653    ▼aVision-language models
■653    ▼aSkin disease images
■690    ▼a0715
■690    ▼a0574
■690    ▼a0564
■690    ▼a0800
■690    ▼a0757
■71020▼aHarvard University▼bMedical Sciences.
■7730  ▼tDissertations Abstracts International▼g85-12B.
■790    ▼a0084
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17161793▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service (KERIS).

Holdings
Reg. No. | Call No. | Location | Status | Loan Information
TQ0033689 | T | Full-text material | Viewable/Printable | Viewable/Printable