On Transparent Optimizations for Communication in Highly Parallel Systems.

Detailed Information

Material Type  
 Dissertation (thesis)
Control Number  
0017160308
International Standard Book Number  
9798381977202
Dewey Decimal Classification Number  
621.3
Main Entry-Personal Name  
Wilkins, Michael.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Northwestern University, 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
216 p.
General Note  
Source: Dissertations Abstracts International, Volume: 85-10, Section: A.
General Note  
Advisor: Dinda, Peter A.; Hardavellas, Nikos.
Dissertation Note  
Thesis (Ph.D.)--Northwestern University, 2024.
Summary, Etc.  
To leverage the omnipresent hardware parallelism in modern systems, applications must efficiently communicate across parallel tasks, e.g., to share data or control execution flow. The longstanding mechanisms for shared memory and distributed memory, i.e., coherence and message passing, remain the dominant choices for implementing communication. I argue that these stalwart constructs can be transparently optimized, improving performance without exposing developers to the growing complexity of modern hardware that employs both shared and distributed memory. Then, I explore the ultimate ambition: a unified transparent communication abstraction across all memory types.

In shared memory multiprocessors, communication is performed implicitly. Cache coherence maintains the abstraction of a single shared memory among hardware threads, so the application does not have to explicitly move data between them. However, coherence protocols incur an increasing overhead in modern hardware due to their conservative, reactive policies. I designed a new coherence protocol called WARDen to exploit the novel WARD property, which identifies large regions of memory that do not require fine-grained coherence. By transparently disabling the coherence protocol where it is unneeded, WARDen maintains the abstraction of shared memory and improves application performance by an average of 1.46x.

In distributed memory machines, communication between memory domains is performed explicitly by the application. To specify the necessary communication, collective operations are the predominant primitive because they allow programmers to elegantly express large-scale communication patterns in a single function call. The Message Passing Interface (MPI) is the de facto standard for collectives in high-performance distributed memory systems like supercomputers. MPI libraries typically contain 3-4 implementations (i.e., algorithms) for each collective pattern.

Despite their utility, collectives suffer performance degradation due to poor algorithm selection in the underlying MPI library. I created a series of autotuners named FACT and ACCLAiM that use machine learning (ML) to tractably find the optimal collective algorithms for large-scale applications. The autotuners are sometimes limited when all the available algorithms fail to properly leverage the underlying hardware. To address this issue, I developed a set of more flexible algorithms that better map to complex, modern networks and increase the potency of autotuning. Combining these efforts on Frontier (the world's fastest supercomputer at the time of writing), I achieve speedups of over 4x compared to the proprietary vendor MPI library.

Lastly, I explored my vision for a higher-level programming model that abstracts away communication altogether. I ported the popular NAS Parallel Benchmark Suite to an FMPL (Functional, Memory-managed, Parallel Language). I found that FMPLs have the potential to drastically improve transparency because the program does not need to be aware of communication at all. However, FMPLs are currently limited to shared memory machines. I built a prototype that extends an FMPL to distributed memory, charting the course to FMPLs in high-performance computing.

Across these research thrusts, I developed novel optimizations for communication in high-performance applications. Together, they show how existing communication abstractions, i.e., shared memory and message passing, can be transparently optimized, maintaining or even improving the level of abstraction exposed to the developer.
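
To make the collective abstraction concrete, here is a minimal illustrative sketch (an editorial example, not code from the dissertation): a single MPI_Allreduce call expresses a global reduction across every rank, while the algorithm that services the call, e.g., recursive doubling versus ring, is chosen internally by the MPI library. That hidden choice is the selection that autotuners such as FACT and ACCLAiM steer. The file name and printed message are illustrative; MPI_Allreduce and its signature are standard MPI.

/* allreduce_demo.c -- one collective call expresses a global reduction.
 * Build: mpicc allreduce_demo.c -o allreduce_demo
 * Run:   mpirun -np 4 ./allreduce_demo
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes one value; the library-chosen algorithm
     * delivers the global sum to every rank in a single call. */
    int local = rank + 1;
    int global_sum = 0;
    MPI_Allreduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %d\n", size, global_sum);

    MPI_Finalize();
    return 0;
}

In MPICH-derived libraries the internal choice can be overridden externally, e.g., via control variables such as MPIR_CVAR_ALLREDUCE_INTRA_ALGORITHM, so algorithm tuning requires no change to application code like the above.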
Subject Added Entry-Topical Term  
Computer engineering.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Communication.
Index Term-Uncontrolled  
Autotuning
Index Term-Uncontrolled  
Cache coherence
Index Term-Uncontrolled  
High-performance computing
Index Term-Uncontrolled  
Message Passing Interface
Index Term-Uncontrolled  
Programming models
Added Entry-Corporate Name  
Northwestern University Computer Engineering
Host Item Entry  
Dissertations Abstracts International. 85-10A.
Electronic Location and Access  
This material can be viewed after logging in.
Control Number  
joongbu:654617

MARC

 008250224s2024        us  ||||||||||||||c||eng  d
■001000017160308
■00520250211150954
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798381977202
■035    ▼a(MiAaPQ)AAI30993477
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a621.3
■1001  ▼aWilkins, Michael.▼0(orcid)0000-0003-0806-1599
■24510▼aOn Transparent Optimizations for Communication in Highly Parallel Systems.
■260    ▼a[S.l.]▼bNorthwestern University.▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a216 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 85-10, Section: A.
■500    ▼aAdvisor: Dinda, Peter A.; Hardavellas, Nikos.
■5021  ▼aThesis (Ph.D.)--Northwestern University, 2024.
■520    ▼aTo leverage the omnipresent hardware parallelism in modern systems, applications must efficiently communicate across parallel tasks, e.g., to share data or control execution flow. The longstanding mechanisms for shared memory and distributed memory, i.e., coherence and message passing, remain the dominant choices to implement communication. I argue that these stalwart constructs can be transparently optimized, improving performance without exposing developers to the growing complexity of modern hardware that employ both shared and distributed memory. Then, I explore the ultimate ambition: a unified transparent communication abstraction across all memory types. In shared memory multiprocessors, communication is performed implicitly. Cache coherence maintains the abstraction of a single shared memory among hardware threads, so the application does not have to explicitly move data between them. However, coherence protocols incur an increasing overhead in modern hardware due to their conservative, reactive policies. I designed a new coherence protocol called WARDen to exploit the novel WARD property, which indicates large regions of memory that do not require fine-grained coherence. By transparently disabling the coherence protocol when it is unneeded, WARDen maintains the abstraction of shared memory and improves application performance by an average of 1.46x. In distributed memory machines, communication between memory domains is performed explicitly by the application. To specify the necessary communication, collective operations are the predominant primitive because they allow programmers to elegantly specify large-scale communication patterns in a single function call. The Message Passing Interface (MPI) is the de-facto standard for collectives in high-performance distributed memory systems like supercomputers. MPI libraries typically contain 3-4 implementations (i.e., algorithms) for each collective pattern. Despite their utility, collectives suffer performance degradation due to poor algorithm selection in the underlying MPI library. I created a series of autotuners named FACT and ACCLAiM that use machine learning (ML) to tractably find the optimal collective algorithms for large-scale applications. The autotuners are sometimes limited when all the available algorithms fail to properly leverage the underlying hardware. To address this issue, I developed a set of more flexible algorithms that can better map to complex, modern networks and increase the potency of autotuning. Combining these efforts on Frontier (the world's fastest supercomputer at time of writing), I achieve speedups of over 4x compared to the proprietary vendor MPI library. Lastly, I explored my vision for a higher-level programming model that abstracts away communication altogether. I ported the popular NAS Parallel Benchmark Suite to an FMPL (Functional, Memory-managed, Parallel Language). I found that FMPLs have the potential to drastically improve transparency because the program does not need to be aware of communication at all. However, FMPLs are currently limited to shared memory machines. I built a prototype that extends an FMPL to distributed memory, charting the course to FMPLs in high-performance computing. Across these research thrusts, I developed novel optimizations for communication in high performance applications. Together, they show how existing communication abstractions, i.e., shared memory and message passing, can be transparently optimized, maintaining or even improving the level of abstraction exposed to the developer.
■590    ▼aSchool code: 0163.
■650  4▼aComputer engineering.
■650  4▼aComputer science.
■650  4▼aCommunication.
■653    ▼aAutotuning
■653    ▼aCache coherence
■653    ▼aHigh-performance computing
■653    ▼aMessage Passing Interface
■653    ▼aProgramming models
■690    ▼a0464
■690    ▼a0984
■690    ▼a0459
■71020▼aNorthwestern University▼bComputer Engineering.
■7730  ▼tDissertations Abstracts International▼g85-10A.
■790    ▼a0163
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17160308▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).

    Holdings Information

    Registration No.: TQ0030539
    Call No.: T
    Location: Full-text material (online)
    Availability: Viewable / Printable
