On Transparent Optimizations for Communication in Highly Parallel Systems.
- Material Type
- Dissertation/Thesis
- Control Number
- 0017160308
- International Standard Book Number
- 9798381977202
- Dewey Decimal Classification Number
- 621.3
- Main Entry-Personal Name
- Wilkins, Michael.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Northwestern University, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 216 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 85-10, Section: A.
- General Note
- Advisor: Dinda, Peter A.;Hardavellas, Nikos.
- Dissertation Note
- Thesis (Ph.D.)--Northwestern University, 2024.
- Summary, Etc.
- To leverage the omnipresent hardware parallelism in modern systems, applications must efficiently communicate across parallel tasks, e.g., to share data or control execution flow. The longstanding mechanisms for shared memory and distributed memory, i.e., coherence and message passing, remain the dominant choices for implementing communication. I argue that these stalwart constructs can be transparently optimized, improving performance without exposing developers to the growing complexity of modern hardware that employs both shared and distributed memory. I then explore the ultimate ambition: a unified, transparent communication abstraction across all memory types.

In shared memory multiprocessors, communication is performed implicitly. Cache coherence maintains the abstraction of a single shared memory among hardware threads, so the application does not have to explicitly move data between them. However, coherence protocols incur an increasing overhead on modern hardware due to their conservative, reactive policies. I designed a new coherence protocol called WARDen to exploit the novel WARD property, which identifies large regions of memory that do not require fine-grained coherence. By transparently disabling the coherence protocol where it is unneeded, WARDen maintains the abstraction of shared memory and improves application performance by an average of 1.46x.

In distributed memory machines, communication between memory domains is performed explicitly by the application. Collective operations are the predominant primitive for specifying this communication because they allow programmers to elegantly express large-scale communication patterns in a single function call (a minimal sketch appears below, after this summary). The Message Passing Interface (MPI) is the de facto standard for collectives in high-performance distributed memory systems such as supercomputers. MPI libraries typically contain three to four implementations (i.e., algorithms) for each collective pattern.

Despite their utility, collectives suffer performance degradation due to poor algorithm selection in the underlying MPI library (see the second sketch below). I created a series of autotuners, FACT and ACCLAiM, that use machine learning (ML) to tractably find the optimal collective algorithms for large-scale applications. The autotuners are sometimes limited when all of the available algorithms fail to properly leverage the underlying hardware. To address this issue, I developed a set of more flexible algorithms that map better to complex, modern networks and increase the potency of autotuning. Combining these efforts on Frontier (the world's fastest supercomputer at the time of writing), I achieve speedups of over 4x compared to the proprietary vendor MPI library.

Lastly, I explored my vision for a higher-level programming model that abstracts away communication altogether. I ported the popular NAS Parallel Benchmark Suite to an FMPL (Functional, Memory-managed, Parallel Language). I found that FMPLs have the potential to drastically improve transparency because the program does not need to be aware of communication at all. However, FMPLs are currently limited to shared memory machines. I built a prototype that extends an FMPL to distributed memory, charting the course to FMPLs in high-performance computing.

Across these research thrusts, I developed novel optimizations for communication in high-performance applications. Together, they show how existing communication abstractions, i.e., shared memory and message passing, can be transparently optimized, maintaining or even improving the level of abstraction exposed to the developer.
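The summary's claim that a collective expresses a large-scale communication pattern in a single function call can be made concrete. Below is a minimal sketch, not drawn from the dissertation itself: a standard C MPI program in which MPI_Allreduce sums one value contributed by every rank and delivers the result to all of them, work that would otherwise require many explicit point-to-point sends and receives.

```c
/* Minimal sketch of an MPI collective: every rank contributes a value,
 * and MPI_Allreduce returns the global sum to all ranks in one call.
 * Compile with: mpicc allreduce_demo.c -o allreduce_demo */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes its own rank number. */
    int local = rank;
    int global_sum = 0;

    /* One call expresses the whole communication pattern; the MPI
     * library picks the underlying algorithm (e.g., ring, tree,
     * recursive doubling) on the application's behalf. */
    MPI_Allreduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d of %d: global sum = %d\n", rank, size, global_sum);

    MPI_Finalize();
    return 0;
}
```

Run with, e.g., `mpirun -np 4 ./allreduce_demo`; every rank prints the same sum. Which internal algorithm services the call is exactly the selection problem the dissertation's autotuners target.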
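To show what "algorithm selection" means inside an MPI library, here is a hypothetical sketch; the algorithm names, thresholds, and the function select_allreduce are illustrative inventions, not the dissertation's method or any real library's API. Production MPI libraries encode comparable static decision rules, which autotuners such as FACT and ACCLAiM aim to replace with machine-learned, platform-specific choices.

```c
/* Hypothetical sketch of collective algorithm selection. The names
 * and thresholds below are illustrative only: real libraries use
 * similar static, size-based rules, which ML-driven autotuning can
 * replace with measured, platform-specific decisions. */
enum allreduce_alg { ALG_RECURSIVE_DOUBLING, ALG_TREE, ALG_RING };

static enum allreduce_alg select_allreduce(int msg_bytes, int num_ranks) {
    if (msg_bytes <= 4096)   /* small messages: latency-bound */
        return ALG_RECURSIVE_DOUBLING;
    if (num_ranks <= 16)     /* small communicators */
        return ALG_TREE;
    return ALG_RING;         /* large messages: bandwidth-bound */
}
```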
- Subject Added Entry-Topical Term
- Computer engineering.
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Communication.
- Index Term-Uncontrolled
- Autotuning
- Index Term-Uncontrolled
- Cache coherence
- Index Term-Uncontrolled
- High-performance computing
- Index Term-Uncontrolled
- Message Passing Interface
- Index Term-Uncontrolled
- Programming models
- Added Entry-Corporate Name
- Northwestern University Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 85-10A.
- Electronic Location and Access
- This material is available after login.
- Control Number
- joongbu:654617