Data-Driven Statistical Sharding for Industry-Scale Neural Recommendation [electronic resource]

Detailed Information

Material Type  
 Thesis (dissertation)
Control Number  
0016931978
International Standard Book Number  
9798379652869
Dewey Decimal Classification Number  
300
Main Entry-Personal Name  
Sethi, Geet.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Stanford University., 2023
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2023
Physical Description  
1 online resource (111 p.)
General Note  
Source: Dissertations Abstracts International, Volume: 84-12, Section: A.
General Note  
Advisor: Trippel, Caroline; Wu, Carole-Jean; Kozyrakis, Christos.
Dissertation Note  
Thesis (Ph.D.)--Stanford University, 2023.
Restrictions on Access Note  
This item must not be sold to any third party vendors.
Summary, Etc.  
Deep learning based recommendation models (DLRMs) form the backbone of many internet-scale services such as web search, social media, and video streaming. Primarily composed of massive embedding tables, potentially terabytes in size, these models require immense system resources to train and demand a solution to the sharding problem: the task of partitioning and placing the embedding table parameters throughout the target system memory topology such that training throughput is maximized. This dissertation: (1) characterizes and derives statistics from DLRM training data which can be used to accurately and granularly predict the memory demands of individual embedding table rows; (2) presents RecShard, a mixed-integer linear program based approach which uses these statistics to solve the sharding problem for capacity-constrained single-node systems, where parameters must be placed across high-performance GPU HBM and much slower CPU DRAM, reducing accesses to the latter by orders of magnitude; and (3) presents FlexShard, a precise row-level sharding algorithm which focuses on sharding emerging sequence-based DLRMs across multi-node GPU training clusters, leveraging these statistics to significantly reduce inter-node communication demand, the bottleneck of scale-out DLRM training. The size of industry-scale DLRMs requires sharding to be performed; however, the skewed power-law nature of DLRM training data causes imprecise partitioning and placement decisions to result in imbalanced load across the system memory topology. The contributions of this dissertation provide a foundation upon which one can reason about the access patterns to fine-grained regions of DLRM memory, as well as two novel sharding techniques built upon this foundation. These techniques demonstrate significant improvements over the prior state-of-the-art on real-world production data and system deployments.
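The skew the abstract describes can be illustrated with a toy sketch. All numbers here are hypothetical, and a greedy fill by access frequency stands in for RecShard's actual mixed-integer linear program; the point is only to show why skew-aware placement of hot embedding rows in fast memory cuts accesses to slow memory:

```python
import random

random.seed(0)

# Toy model: one embedding table whose per-row access frequencies follow a
# power law, as the dissertation observes for DLRM training data.
num_rows = 10_000
freqs = [1.0 / (rank + 1) for rank in range(num_rows)]  # Zipf-like skew
total = sum(freqs)
probs = [f / total for f in freqs]  # per-row access probabilities

# Hypothetical capacity: 10% of rows fit in fast GPU HBM; the rest go to
# (much slower) CPU DRAM.
hbm_capacity = 1_000

# Skew-aware placement: fill HBM with the most frequently accessed rows.
# (Greedy stand-in for the MILP formulation, not the RecShard algorithm.)
hot_rows = sorted(range(num_rows), key=lambda r: probs[r], reverse=True)
hbm_skew_aware = set(hot_rows[:hbm_capacity])
dram_fraction_skew_aware = sum(
    p for r, p in enumerate(probs) if r not in hbm_skew_aware
)

# Naive placement: a random 10% of rows land in HBM.
hbm_naive = set(random.sample(range(num_rows), hbm_capacity))
dram_fraction_naive = sum(
    p for r, p in enumerate(probs) if r not in hbm_naive
)

print(f"DRAM access fraction, skew-aware: {dram_fraction_skew_aware:.3f}")
print(f"DRAM access fraction, naive:      {dram_fraction_naive:.3f}")
```

Under this Zipf-like distribution, pinning the hottest 10% of rows captures the large majority of accesses in HBM, while random placement leaves roughly 90% of accesses falling through to DRAM; the same intuition motivates the row-granular statistics the dissertation derives.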
Subject Added Entry-Topical Term  
Internships.
Subject Added Entry-Topical Term  
Ablation.
Subject Added Entry-Topical Term  
Verbal communication.
Subject Added Entry-Topical Term  
Communication.
Added Entry-Corporate Name  
Stanford University.
Host Item Entry  
Dissertations Abstracts International. 84-12A.
Host Item Entry  
Dissertation Abstract International
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:640527

MARC

 008240220s2023        ulk                      00        kor
■001000016931978
■00520240214100356
■006m          o    d                
■007cr#unu||||||||
■020  ▼a9798379652869
■035  ▼a(MiAaPQ)AAI30462697
■035  ▼a(MiAaPQ)STANFORDzs617qp8476
■040  ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a300
■1001 ▼aSethi, Geet.
■24510▼aData-Driven Statistical Sharding for Industry-Scale Neural Recommendation▼h[electronic resource]
■260  ▼a[S.l.]▼bStanford University. ▼c2023
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2023
■300  ▼a1 online resource (111 p.)
■500  ▼aSource: Dissertations Abstracts International, Volume: 84-12, Section: A.
■500  ▼aAdvisor: Trippel, Caroline; Wu, Carole-Jean; Kozyrakis, Christos.
■5021 ▼aThesis (Ph.D.)--Stanford University, 2023.
■506  ▼aThis item must not be sold to any third party vendors.
■520  ▼aDeep learning based recommendation models (DLRMs) form the backbone of many internet-scale services such as web search, social media, and video streaming. Primarily composed of massive embedding tables, potentially terabytes in size, these models require immense system resources to train and the solving of the sharding problem. The sharding problem is the task of partitioning and placing the embedding table parameters throughout the target system memory topology such that training throughput is maximized. This dissertation: (1) Characterizes and derives statistics from DLRM training data which can be used to accurately and granularly predict the memory demands of individual embedding table rows; (2) Presents RecShard, a mixed-integer linear program based approach which uses these statistics to solve the sharding problem for capacity constrained single-node systems, where parameters must be placed across high-performance GPU HBM and much slower CPU DRAM; reducing accesses to the latter by orders of magnitude; and (3) Presents FlexShard, a precise row-level sharding algorithm which focuses on sharding emerging sequence-based DLRMs across multi-node GPU training clusters; leveraging these statistics to significantly reduce inter-node communication demand, the bottleneck of scale-out DLRM training. The size of industry-scale DLRMs requires sharding to be performed; however the skewed power-law nature of DLRM training data causes imprecise partitioning and placement decisions to result in imbalanced load across the system memory topology. The contributions of this dissertation provide a foundation upon which one can reason about the access patterns to fine-grained regions of DLRM memory; as well as two novel sharding techniques built upon this foundation. These techniques demonstrate significant improvements over the prior state-of-the-art on real-world production data and system deployments.
■590  ▼aSchool code: 0212.
■650 4▼aInternships.
■650 4▼aAblation.
■650 4▼aVerbal communication.
■650 4▼aCommunication.
■690  ▼a0459
■71020▼aStanford University.
■7730 ▼tDissertations Abstracts International▼g84-12A.
■773  ▼tDissertation Abstract International
■790  ▼a0212
■791  ▼aPh.D.
■792  ▼a2023
■793  ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T16931978▼nKERIS▼z이 자료의 원문은 한국교육학술정보원에서 제공합니다.
■980  ▼a202402▼f2024


    Holdings
    Reg No.: TQ0026447
    Call No.: T
    Location: Online full-text material
    Status: Available for viewing / printing
