On Transparent Optimizations for Communication in Highly Parallel Systems.
- Material Type
- Dissertation/Thesis
- Control Number
- 0017160308
- International Standard Book Number
- 9798381977202
- Dewey Decimal Classification Number
- 621.3
- Main Entry-Personal Name
- Wilkins, Michael.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Northwestern University, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 216 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-10, Section: A.
- General Note
- Advisor: Dinda, Peter A.;Hardavellas, Nikos.
- Dissertation Note
- Thesis (Ph.D.)--Northwestern University, 2024.
- Summary, Etc.
- To leverage the omnipresent hardware parallelism in modern systems, applications must efficiently communicate across parallel tasks, e.g., to share data or control execution flow. The longstanding mechanisms for shared memory and distributed memory, i.e., coherence and message passing, remain the dominant choices for implementing communication. I argue that these stalwart constructs can be transparently optimized, improving performance without exposing developers to the growing complexity of modern hardware that employs both shared and distributed memory. I then explore the ultimate ambition: a unified, transparent communication abstraction across all memory types.

In shared memory multiprocessors, communication is performed implicitly. Cache coherence maintains the abstraction of a single shared memory among hardware threads, so the application does not have to explicitly move data between them. However, coherence protocols incur an increasing overhead on modern hardware due to their conservative, reactive policies. I designed a new coherence protocol called WARDen to exploit the novel WARD property, which identifies large regions of memory that do not require fine-grained coherence. By transparently disabling the coherence protocol where it is unneeded, WARDen maintains the abstraction of shared memory and improves application performance by an average of 1.46x.

In distributed memory machines, communication between memory domains is performed explicitly by the application. Collective operations are the predominant primitive for specifying this communication because they allow programmers to elegantly express large-scale communication patterns in a single function call (a minimal sketch appears below, after this summary). The Message Passing Interface (MPI) is the de facto standard for collectives in high-performance distributed memory systems such as supercomputers. MPI libraries typically contain three to four implementations (i.e., algorithms) for each collective pattern.

Despite their utility, collectives suffer performance degradation due to poor algorithm selection in the underlying MPI library (see the second sketch below). I created a series of autotuners, FACT and ACCLAiM, that use machine learning (ML) to tractably find the optimal collective algorithms for large-scale applications. The autotuners are sometimes limited when all of the available algorithms fail to properly leverage the underlying hardware. To address this issue, I developed a set of more flexible algorithms that map better to complex, modern networks and increase the potency of autotuning. Combining these efforts on Frontier (the world's fastest supercomputer at the time of writing), I achieve speedups of over 4x compared to the proprietary vendor MPI library.

Lastly, I explored my vision for a higher-level programming model that abstracts away communication altogether. I ported the popular NAS Parallel Benchmark Suite to an FMPL (Functional, Memory-managed, Parallel Language). I found that FMPLs have the potential to drastically improve transparency because the program does not need to be aware of communication at all. However, FMPLs are currently limited to shared memory machines. I built a prototype that extends an FMPL to distributed memory, charting the course to FMPLs in high-performance computing.

Across these research thrusts, I developed novel optimizations for communication in high-performance applications. Together, they show how existing communication abstractions, i.e., shared memory and message passing, can be transparently optimized, maintaining or even improving the level of abstraction exposed to the developer.
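The summary's claim that a collective expresses a large-scale communication pattern in a single function call can be made concrete. Below is a minimal sketch, not drawn from the dissertation itself: a standard C MPI program in which MPI_Allreduce sums one value contributed by every rank and delivers the result to all of them, work that would otherwise require many explicit point-to-point sends and receives.

```c
/* Minimal sketch of an MPI collective: every rank contributes a value,
 * and MPI_Allreduce returns the global sum to all ranks in one call.
 * Compile with: mpicc allreduce_demo.c -o allreduce_demo */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes its own rank number. */
    int local = rank;
    int global_sum = 0;

    /* One call expresses the whole communication pattern; the MPI
     * library picks the underlying algorithm (e.g., ring, tree,
     * recursive doubling) on the application's behalf. */
    MPI_Allreduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d of %d: global sum = %d\n", rank, size, global_sum);

    MPI_Finalize();
    return 0;
}
```

Run with, e.g., `mpirun -np 4 ./allreduce_demo`; every rank prints the same sum. Which internal algorithm services the call is exactly the selection problem the dissertation's autotuners target.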
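To show what "algorithm selection" means inside an MPI library, here is a hypothetical sketch; the algorithm names, thresholds, and the function select_allreduce are illustrative inventions, not the dissertation's method or any real library's API. Production MPI libraries encode comparable static decision rules, which autotuners such as FACT and ACCLAiM aim to replace with machine-learned, platform-specific choices.

```c
/* Hypothetical sketch of collective algorithm selection. The names
 * and thresholds below are illustrative only: real libraries use
 * similar static, size-based rules, which ML-driven autotuning can
 * replace with measured, platform-specific decisions. */
enum allreduce_alg { ALG_RECURSIVE_DOUBLING, ALG_TREE, ALG_RING };

static enum allreduce_alg select_allreduce(int msg_bytes, int num_ranks) {
    if (msg_bytes <= 4096)   /* small messages: latency-bound */
        return ALG_RECURSIVE_DOUBLING;
    if (num_ranks <= 16)     /* small communicators */
        return ALG_TREE;
    return ALG_RING;         /* large messages: bandwidth-bound */
}
```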
- Subject Added Entry-Topical Term
- Computer engineering.
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Communication.
- Index Term-Uncontrolled
- Autotuning
- Index Term-Uncontrolled
- Cache coherence
- Index Term-Uncontrolled
- High-performance computing
- Index Term-Uncontrolled
- Message Passing Interface
- Index Term-Uncontrolled
- Programming models
- Added Entry-Corporate Name
- Northwestern University Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 85-10A.
- Electronic Location and Access
- This material is available after login.
- Control Number
- joongbu:654617