Hardware-Aware Algorithms for Efficient Machine Learning [electronic resource]
- Material Type
- Thesis/Dissertation
- Control Number
- 0016934514
- International Standard Book Number
- 9798380485500
- Dewey Decimal Classification Number
- 006.35
- Main Entry-Personal Name
- Quang, Tri Dao Phuc.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Stanford University, 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (216 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 85-04, Section: B.
- General Note
- Advisor: Re, Chris; Ermon, Stefano.
- Dissertation Note
- Thesis (Ph.D.)--Stanford University, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- Machine learning (ML) training will continue to grow to consume more cycles, ML inference will proliferate on more kinds of devices, and ML capabilities will be used in more domains. Some goals central to this future are to make ML models efficient, so they remain practical to train and deploy, and to unlock new application domains with new capabilities. We describe some recent developments in hardware-aware algorithms to improve the efficiency-quality tradeoff of ML models and equip them with long context.
  In Chapter 2, we focus on structured sparsity, a natural approach to mitigating the extensive compute and memory cost of large ML models. We describe a line of work on learnable fast transforms that, thanks to their expressiveness and efficiency, yields some of the first sparse training methods to speed up large models in wall-clock time (2x) without compromising their quality.
  In Chapter 3, we focus on efficient Transformer training and inference for long sequences. We describe FlashAttention, a fast and memory-efficient algorithm that computes attention exactly, with no approximation. By careful accounting of reads and writes between different levels of the memory hierarchy, FlashAttention is 2-4x faster and uses 10-20x less memory than the best existing attention implementations, allowing us to train higher-quality Transformers with 8x longer context. FlashAttention is now widely used in some of the largest research labs and companies.
  In Chapter 4, we examine state-space models, a promising architecture designed for long-range memory. As we seek to understand why early state-space models did not perform well on language modeling tasks, we propose a simple multiplicative interaction that expands their expressiveness. We also design hardware-friendly algorithms to train them. As a result, we are able to train state-space models to multi-billion-parameter scale, demonstrating a new kind of model competitive with the dominant Transformers in language modeling.
  We conclude with some exciting directions in ML and systems, such as software-hardware co-design, structured sparsity for scientific AI, and long context for new AI workflows and modalities.
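The FlashAttention paragraph above describes the key idea only in words: account for data movement between levels of the memory hierarchy by processing attention block by block, so the full N x N score matrix never has to be materialized. The sketch below is an illustration only, not the dissertation's GPU kernel; it shows the tiling plus online-softmax bookkeeping in plain NumPy, and the function name, block size, and shapes are hypothetical.

```python
# Illustrative sketch only (not the dissertation's kernel): exact attention
# computed one key/value block at a time, so the full N x N score matrix
# is never stored. Block size and names are hypothetical.
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Exact softmax(Q K^T / sqrt(d)) V, accumulated block by block."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)                 # running (unnormalized) output
    row_max = np.full(N, -np.inf)          # running max of logits per query row
    row_sum = np.zeros(N)                  # running softmax denominator per row
    for start in range(0, N, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale             # logits for this block only
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)   # rescale what was accumulated so far
        P = np.exp(S - new_max[:, None])
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vb
        row_max = new_max
    return out / row_sum[:, None]

# Sanity check against a naive reference that materializes the full score matrix.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref)
```

The running row-wise max and sum let each key/value block be read once and discarded, which is the memory saving the abstract refers to; the actual kernel performs this bookkeeping on-chip to minimize traffic to slow memory.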
- Subject Added Entry-Topical Term
- Text categorization.
- Subject Added Entry-Topical Term
- Protons.
- Subject Added Entry-Topical Term
- Benchmarks.
- Subject Added Entry-Topical Term
- Atomic physics.
- Subject Added Entry-Topical Term
- Physics.
- Added Entry-Corporate Name
- Stanford University.
- Host Item Entry
- Dissertations Abstracts International. 85-04B.
- Host Item Entry
- Dissertations Abstracts International
- Electronic Location and Access
- This resource is available after logging in.
- Control Number
- joongbu:643559