Toward Understanding the Dynamics of Over-Parameterized Neural Networks.

Detailed Information

Material Type  
Thesis/Dissertation
Control Number  
0017161696
International Standard Book Number  
9798383207666
Dewey Decimal Classification Number  
004
Main Entry-Personal Name  
Zhu, Libin.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of California, San Diego., 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
212 p.
General Note  
Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
General Note  
Advisor: Belkin, Mikhail.
Dissertation Note  
Thesis (Ph.D.)--University of California, San Diego, 2024.
Summary, Etc.  
The practical applications of neural networks are vast and varied, yet a comprehensive understanding of their underlying principles remains incomplete. This dissertation advances the theoretical understanding of neural networks, with a particular focus on over-parameterized models. It investigates their optimization and generalization dynamics and sheds light on various deep-learning phenomena observed in practice. This research deepens the understanding of the complex behaviors of these models and establishes theoretical insights that closely align with their empirical behaviors across diverse computational tasks.

In the first part of the thesis, we analyze the fundamental properties of over-parameterized neural networks and demonstrate that these properties can lead to the success of their optimization. We show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo a transition to linearity: the networks converge to their first-order Taylor expansion in the parameters as their "width" approaches infinity. The width of these general networks is characterized by the minimum indegree of their neurons, excluding the input and first layers. We further demonstrate that the transition to linearity plays an important role in the success of the optimization of over-parameterized neural networks.

In the second part of the thesis, we investigate the modern training regime of over-parameterized neural networks, focusing in particular on the large learning rate regime. While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. We show that recently proposed neural quadratic models can exhibit the "catapult phase" [65] that arises when training such models with large learning rates. We then show empirically that the generalization behavior of neural quadratic models parallels that of neural networks, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for the analysis of neural networks.

Moreover, we extend the analysis of catapult dynamics to stochastic gradient descent (SGD). We first explain the common occurrence of spikes in the training loss when neural networks are trained with SGD, providing evidence that these spikes are caused by catapults. Second, we explain how catapults lead to better generalization by demonstrating that they increase feature learning, as measured by alignment with the Average Gradient Outer Product (AGOP) of the true predictor. Furthermore, we demonstrate that a smaller batch size in SGD induces a larger number of catapults, thereby improving AGOP alignment and test performance.

Overall, by integrating theoretical insights with empirical validations, this dissertation provides a new understanding of the complex dynamics governing neural network training and generalization.
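For reference, the two central quantities named in the abstract can be written out as a brief sketch. The symbols f, w, w_0, x, x_i, and f^* below are illustrative notation chosen here, not notation taken from the dissertation, and the AGOP expression assumes the standard definition of the average gradient outer product.

% Transition to linearity (sketch): as the width grows, the network output stays close
% to its first-order Taylor expansion in the parameters w around the initialization w_0.
\[
  f(w; x) \;\approx\; f(w_0; x) + \nabla_w f(w_0; x)^{\top} (w - w_0)
\]
% Average Gradient Outer Product (AGOP) of a predictor f^* on inputs x_1, ..., x_n;
% alignment with this matrix is the feature-learning measure referred to in the abstract.
\[
  \mathrm{AGOP}(f^*) \;=\; \frac{1}{n} \sum_{i=1}^{n} \nabla_x f^*(x_i)\, \nabla_x f^*(x_i)^{\top}
\]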
Subject Added Entry-Topical Term  
Computer science.
Subject Added Entry-Topical Term  
Computer engineering.
Index Term-Uncontrolled  
Catapult dynamics
Index Term-Uncontrolled  
Feature learning
Index Term-Uncontrolled  
Neural networks
Index Term-Uncontrolled  
Quadratic models
Index Term-Uncontrolled  
Transition to linearity
Added Entry-Corporate Name  
University of California, San Diego Computer Science and Engineering
Host Item Entry  
Dissertations Abstracts International. 86-01B.
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:654051

MARC

■008250224s2024        us  ||||||||||||||c||eng  d
■001000017161696
■00520250211151432
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798383207666
■035    ▼a(MiAaPQ)AAI31295384
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a004
■1001  ▼aZhu, Libin.
■24510▼aToward Understanding the Dynamics of Over-Parameterized Neural Networks.
■260    ▼a[S.l.]▼bUniversity of California, San Diego.▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a212 p.
■500    ▼aSource: Dissertations Abstracts International, Volume: 86-01, Section: B.
■500    ▼aAdvisor: Belkin, Mikhail.
■5021  ▼aThesis (Ph.D.)--University of California, San Diego, 2024.
■520    ▼aThe practical applications of neural networks are vast and varied, yet a comprehensive understanding of their underlying principles remains incomplete. This dissertation advances the theoretical understanding of neural networks, with a particular focus on over-parameterized models. It investigates their optimization and generalization dynamics and sheds light on various deep-learning phenomena observed in practice. This research deepens the understanding of the complex behaviors of these models and establishes theoretical insights that closely align with their empirical behaviors across diverse computational tasks. In the first part of the thesis, we analyze the fundamental properties of over-parameterized neural networks and demonstrate that these properties can lead to the success of their optimization. We show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo a transition to linearity. The transition to linearity is characterized by the networks converging to their first-order Taylor expansion in the parameters as their "width" approaches infinity. The width of these general networks is characterized by the minimum indegree of their neurons, except for the input and first layers. We further demonstrate that the property of transition to linearity plays an important role in the success of the optimization of over-parameterized neural networks. In the second part of the thesis, we investigate the modern training regime of over-parameterized neural networks, particularly focusing on the large learning rate regime. While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. We show that recently proposed neural quadratic models can exhibit the "catapult phase" [65] that arises when training such models with large learning rates. We then empirically show that the behavior of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for the analysis of neural networks. Moreover, we extend the analysis of catapult dynamics to stochastic gradient descent (SGD). We first present an explanation regarding the common occurrence of spikes in the training loss when neural networks are trained with SGD. We provide evidence that the spikes in the training loss of SGD are caused by catapults. Second, we posit an explanation for how catapults lead to better generalization by demonstrating that catapults increase feature learning by increasing alignment with the Average Gradient Outer Product (AGOP) of the true predictor. Furthermore, we demonstrate that a smaller batch size in SGD induces a larger number of catapults, thereby improving AGOP alignment and test performance. Overall, by integrating theoretical insights with empirical validations, this dissertation provides a new understanding of the complex dynamics governing neural network training and generalization.
■590    ▼aSchool code: 0033.
■650  4▼aComputer science.
■650  4▼aComputer engineering.
■653    ▼aCatapult dynamics
■653    ▼aFeature learning
■653    ▼aNeural networks
■653    ▼aQuadratic models
■653    ▼aTransition to linearity
■690    ▼a0984
■690    ▼a0464
■690    ▼a0800
■71020▼aUniversity of California, San Diego▼bComputer Science and Engineering.
■7730  ▼tDissertations Abstracts International▼g86-01B.
■790    ▼a0033
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17161696▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).

Holdings
Reg. No.  
TQ0034023
Location  
Full-text (online) material
Status  
Available for viewing and printing