On Transparent Optimizations for Communication in Highly Parallel Systems.
Material Type
 Dissertation
Control Number  
0017160308
International Standard Book Number  
9798381977202
Dewey Decimal Classification Number  
621.3
Main Entry-Personal Name  
Wilkins, Michael.
Publication, Distribution, etc. (Imprint)
[S.l.] : Northwestern University, 2024
Publication, Distribution, etc. (Imprint)
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
216 p.
General Note  
Source: Dissertations Abstracts International, Volume: 85-10, Section: A.
General Note  
Advisor: Dinda, Peter A.; Hardavellas, Nikos.
Dissertation Note  
Thesis (Ph.D.)--Northwestern University, 2024.
Summary, Etc.  
To leverage the omnipresent hardware parallelism in modern systems, applications must efficiently communicate across parallel tasks, e.g., to share data or control execution flow. The longstanding mechanisms for shared memory and distributed memory, i.e., coherence and message passing, remain the dominant choices to implement communication. I argue that these stalwart constructs can be transparently optimized, improving performance without exposing developers to the growing complexity of modern hardware that employs both shared and distributed memory. Then, I explore the ultimate ambition: a unified transparent communication abstraction across all memory types.

In shared memory multiprocessors, communication is performed implicitly. Cache coherence maintains the abstraction of a single shared memory among hardware threads, so the application does not have to explicitly move data between them. However, coherence protocols incur an increasing overhead in modern hardware due to their conservative, reactive policies. I designed a new coherence protocol called WARDen to exploit the novel WARD property, which identifies large regions of memory that do not require fine-grained coherence. By transparently disabling the coherence protocol when it is unneeded, WARDen maintains the abstraction of shared memory and improves application performance by an average of 1.46x.

In distributed memory machines, communication between memory domains is performed explicitly by the application. To specify the necessary communication, collective operations are the predominant primitive because they allow programmers to elegantly specify large-scale communication patterns in a single function call. The Message Passing Interface (MPI) is the de facto standard for collectives in high-performance distributed memory systems like supercomputers. MPI libraries typically contain 3-4 implementations (i.e., algorithms) for each collective pattern.

Despite their utility, collectives suffer performance degradation due to poor algorithm selection in the underlying MPI library. I created a series of autotuners named FACT and ACCLAiM that use machine learning (ML) to tractably find the optimal collective algorithms for large-scale applications. The autotuners are sometimes limited when all the available algorithms fail to properly leverage the underlying hardware. To address this issue, I developed a set of more flexible algorithms that can better map to complex, modern networks and increase the potency of autotuning. Combining these efforts on Frontier (the world's fastest supercomputer at the time of writing), I achieve speedups of over 4x compared to the proprietary vendor MPI library.

Lastly, I explored my vision for a higher-level programming model that abstracts away communication altogether. I ported the popular NAS Parallel Benchmark Suite to an FMPL (Functional, Memory-managed, Parallel Language). I found that FMPLs have the potential to drastically improve transparency because the programmer does not need to be aware of communication at all. However, FMPLs are currently limited to shared memory machines. I built a prototype that extends an FMPL to distributed memory, charting the course to FMPLs in high-performance computing.

Across these research thrusts, I developed novel optimizations for communication in high-performance applications. Together, they show how existing communication abstractions, i.e., shared memory and message passing, can be transparently optimized, maintaining or even improving the level of abstraction exposed to the developer.
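
For readers unfamiliar with collectives, the following minimal, generic MPI program (an illustration, not code from the dissertation) shows how a single call expresses an entire communication pattern: every rank contributes a value and every rank receives the global sum, with the MPI library choosing internally among its available algorithms.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank;   /* each rank's contribution */
        double global = 0.0;

        /* One call describes the whole large-scale pattern; the library
           picks an implementation (algorithm) behind this interface. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks = %g\n", global);

        MPI_Finalize();
        return 0;
    }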
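
To make concrete what an "algorithm" for a collective means, here is a sketch of recursive doubling, one classic allreduce algorithm of the kind an MPI library (or an autotuner such as those described above) selects among. It assumes a power-of-two number of ranks and is an illustration only, not the dissertation's code.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double val = (double)rank;  /* each rank's contribution */

        /* In log2(size) rounds, exchange partial sums with the partner
           whose rank differs in exactly one bit. Afterwards every rank
           holds the global sum. */
        for (int mask = 1; mask < size; mask <<= 1) {
            int partner = rank ^ mask;
            double recv;
            MPI_Sendrecv(&val, 1, MPI_DOUBLE, partner, 0,
                         &recv, 1, MPI_DOUBLE, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            val += recv;
        }

        printf("rank %d: sum = %g\n", rank, val);
        MPI_Finalize();
        return 0;
    }

Which such algorithm performs best depends on message size, rank count, and network topology, which is why selection (the autotuning problem above) matters.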
Subject Added Entry-Topical Term  
Computer engineering.
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Communication.
Index Term-Uncontrolled  
Autotuning
Index Term-Uncontrolled  
Cache coherence
Index Term-Uncontrolled  
High-performance computing
Index Term-Uncontrolled  
Message Passing Interface
Index Term-Uncontrolled  
Programming models
Added Entry-Corporate Name  
Northwestern University Computer Engineering
Host Item Entry  
Dissertations Abstracts International. 85-10A.
Electronic Location and Access  
This resource is available after logging in.
Control Number  
joongbu:654617

Holdings Information

Holdings
Registration No.   Call No.   Location    Loan Availability       Loan Information
TQ0030539          T          Full text   Viewable / Printable    Viewable / Printable

