Data-Driven Statistical Sharding for Industry-Scale Neural Recommendation [electronic resource]
Detailed Information
- Material Type
- Dissertation
- Control Number
- 0016931978
- International Standard Book Number
- 9798379652869
- Dewey Decimal Classification Number
- 300
- Main Entry-Personal Name
- Sethi, Geet.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Stanford University., 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (111 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: A.
- General Note
- Advisor: Trippel, Caroline; Wu, Carole-Jean; Kozyrakis, Christos.
- Dissertation Note
- Thesis (Ph.D.)--Stanford University, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- Deep learning based recommendation models (DLRMs) form the backbone of many internet-scale services such as web search, social media, and video streaming. Primarily composed of massive embedding tables, potentially terabytes in size, these models require immense system resources to train and necessitate solving the sharding problem: the task of partitioning and placing the embedding table parameters throughout the target system memory topology such that training throughput is maximized. This dissertation: (1) characterizes and derives statistics from DLRM training data which can be used to accurately and granularly predict the memory demands of individual embedding table rows; (2) presents RecShard, a mixed-integer linear programming approach which uses these statistics to solve the sharding problem for capacity-constrained single-node systems, where parameters must be placed across high-performance GPU HBM and much slower CPU DRAM, reducing accesses to the latter by orders of magnitude; and (3) presents FlexShard, a precise row-level sharding algorithm which focuses on sharding emerging sequence-based DLRMs across multi-node GPU training clusters, leveraging these statistics to significantly reduce inter-node communication demand, the bottleneck of scale-out DLRM training. The size of industry-scale DLRMs requires sharding to be performed; however, the skewed power-law nature of DLRM training data causes imprecise partitioning and placement decisions to result in imbalanced load across the system memory topology. The contributions of this dissertation provide a foundation upon which one can reason about access patterns to fine-grained regions of DLRM memory, as well as two novel sharding techniques built upon this foundation. These techniques demonstrate significant improvements over the prior state-of-the-art on real-world production data and system deployments.
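The core idea the abstract describes — using per-row access statistics from skewed training data to place hot embedding rows in fast GPU HBM and cold rows in slower CPU DRAM — can be illustrated with a minimal greedy sketch. This is only an illustration of the statistical-placement idea, not the dissertation's actual RecShard MILP formulation or FlexShard algorithm; the function name, trace, and capacity are hypothetical.

```python
from collections import Counter

def place_rows(access_trace, hbm_row_capacity):
    """Greedy sketch: rank embedding rows by observed access frequency,
    keep the hottest rows in GPU HBM up to its capacity, and spill the
    remaining (cold) rows to CPU DRAM."""
    freq = Counter(access_trace)                       # per-row access counts
    ranked = [row for row, _ in freq.most_common()]    # hottest first
    hbm = set(ranked[:hbm_row_capacity])
    dram = set(ranked[hbm_row_capacity:])
    return hbm, dram

# A skewed, power-law-like trace: a few rows dominate the accesses.
trace = [0] * 50 + [1] * 20 + [2] * 10 + [3, 4, 5, 6, 7]
hbm, dram = place_rows(trace, hbm_row_capacity=3)
# hbm -> {0, 1, 2}; dram -> {3, 4, 5, 6, 7}
```

Because of the power-law skew, placing only a small fraction of rows in HBM captures the vast majority of accesses, which is why statistics-driven placement can cut DRAM accesses by orders of magnitude.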
- Subject Added Entry-Topical Term
- Internships.
- Subject Added Entry-Topical Term
- Ablation.
- Subject Added Entry-Topical Term
- Verbal communication.
- Subject Added Entry-Topical Term
- Communication.
- Added Entry-Corporate Name
- Stanford University.
- Host Item Entry
- Dissertations Abstracts International. 84-12A.
- Host Item Entry
- Dissertations Abstracts International
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:640527
MARC
008240220s2023 ulk 00 kor■001000016931978
■00520240214100356
■006m o d
■007cr#unu||||||||
■020 ▼a9798379652869
■035 ▼a(MiAaPQ)AAI30462697
■035 ▼a(MiAaPQ)STANFORDzs617qp8476
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a300
■1001 ▼aSethi, Geet.
■24510▼aData-Driven Statistical Sharding for Industry-Scale Neural Recommendation▼h[electronic resource]
■260 ▼a[S.l.]▼bStanford University. ▼c2023
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2023
■300 ▼a1 online resource(111 p.)
■500 ▼aSource: Dissertations Abstracts International, Volume: 84-12, Section: A.
■500 ▼aAdvisor: Trippel, Caroline;Wu, Carole-Jean;Kozyrakis, Christos.
■5021 ▼aThesis (Ph.D.)--Stanford University, 2023.
■506 ▼aThis item must not be sold to any third party vendors.
■520 ▼aDeep learning based recommendation models (DLRMs) form the backbone of many internet-scale services such as web search, social media, and video streaming. Primarily composed of massive embedding tables, potentially terabytes in size, these models require immense system resources to train and necessitate solving the sharding problem: the task of partitioning and placing the embedding table parameters throughout the target system memory topology such that training throughput is maximized. This dissertation: (1) characterizes and derives statistics from DLRM training data which can be used to accurately and granularly predict the memory demands of individual embedding table rows; (2) presents RecShard, a mixed-integer linear programming approach which uses these statistics to solve the sharding problem for capacity-constrained single-node systems, where parameters must be placed across high-performance GPU HBM and much slower CPU DRAM, reducing accesses to the latter by orders of magnitude; and (3) presents FlexShard, a precise row-level sharding algorithm which focuses on sharding emerging sequence-based DLRMs across multi-node GPU training clusters, leveraging these statistics to significantly reduce inter-node communication demand, the bottleneck of scale-out DLRM training. The size of industry-scale DLRMs requires sharding to be performed; however, the skewed power-law nature of DLRM training data causes imprecise partitioning and placement decisions to result in imbalanced load across the system memory topology. The contributions of this dissertation provide a foundation upon which one can reason about access patterns to fine-grained regions of DLRM memory, as well as two novel sharding techniques built upon this foundation. These techniques demonstrate significant improvements over the prior state-of-the-art on real-world production data and system deployments.
■590 ▼aSchool code: 0212.
■650 4▼aInternships.
■650 4▼aAblation.
■650 4▼aVerbal communication.
■650 4▼aCommunication.
■690 ▼a0459
■71020▼aStanford University.
■7730 ▼tDissertations Abstracts International▼g84-12A.
■773 ▼tDissertations Abstracts International
■790 ▼a0212
■791 ▼aPh.D.
■792 ▼a2023
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T16931978▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).
■980 ▼a202402▼f2024