Toward Understanding the Dynamics of Over-Parameterized Neural Networks.
Details
- Material Type
- Thesis (Dissertation)
- Control Number
- 0017161696
- International Standard Book Number
- 9798383207666
- Dewey Decimal Classification Number
- 004
- Main Entry-Personal Name
- Zhu, Libin.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of California, San Diego, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 212 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 86-01, Section: B.
- General Note
- Advisor: Belkin, Mikhail.
- Dissertation Note
- Thesis (Ph.D.)--University of California, San Diego, 2024.
- Summary, Etc.
- The practical applications of neural networks are vast and varied, yet a comprehensive understanding of their underlying principles remains incomplete. This dissertation advances the theoretical understanding of neural networks, with a particular focus on over-parameterized models. It investigates their optimization and generalization dynamics and sheds light on various deep-learning phenomena observed in practice. This research deepens the understanding of the complex behaviors of these models and establishes theoretical insights that closely align with their empirical behaviors across diverse computational tasks.

  In the first part of the thesis, we analyze the fundamental properties of over-parameterized neural networks and demonstrate that these properties can lead to the success of their optimization. We show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo a transition to linearity: as their "width" approaches infinity, the networks converge to their first-order Taylor expansion in their parameters. The width of these general networks is characterized by the minimum in-degree of their neurons, excluding the input and first layers. We further demonstrate that the transition to linearity plays an important role in the success of the optimization of over-parameterized neural networks.

  In the second part of the thesis, we investigate the modern training regime of over-parameterized neural networks, focusing in particular on the large learning rate regime. While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. We show that the recently proposed Neural Quadratic Models can exhibit the "catapult phase" [65] that arises when training such models with large learning rates. We then show empirically that the generalization behavior of neural quadratic models parallels that of neural networks, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for the analysis of neural networks.

  Moreover, we extend the analysis of catapult dynamics to stochastic gradient descent (SGD). We first explain the common occurrence of spikes in the training loss when neural networks are trained with SGD, providing evidence that these spikes are caused by catapults. Second, we explain how catapults lead to better generalization: catapults increase feature learning by increasing alignment with the Average Gradient Outer Product (AGOP) of the true predictor. Furthermore, we demonstrate that a smaller batch size in SGD induces a larger number of catapults, thereby improving AGOP alignment and test performance.

  Overall, by integrating theoretical insights with empirical validations, this dissertation provides a new understanding of the complex dynamics governing neural network training and generalization. (Minimal illustrative sketches of the transition to linearity, the AGOP, and the catapult phase appear after the record details and the MARC record below.)
- Subject Added Entry-Topical Term
- Computer science.
- Subject Added Entry-Topical Term
- Computer engineering.
- Index Term-Uncontrolled
- Catapult dynamics
- Index Term-Uncontrolled
- Feature learning
- Index Term-Uncontrolled
- Neural networks
- Index Term-Uncontrolled
- Quadratic models
- Index Term-Uncontrolled
- Transition to linearity
- Added Entry-Corporate Name
- University of California, San Diego, Computer Science and Engineering
- Host Item Entry
- Dissertations Abstracts International. 86-01B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:654051
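The abstract above characterizes the transition to linearity via a first-order Taylor expansion and describes feature learning in terms of alignment with the Average Gradient Outer Product (AGOP). As a minimal sketch of the quantities being referred to, assuming f(x; w) denotes the network output for input x and parameters w, w_0 the parameters at initialization, and x_1, ..., x_n the training inputs (notation chosen here for illustration, not taken from the record):

    % Transition to linearity: as the width grows, the network stays close to its
    % first-order Taylor expansion in the parameters w around the initialization w_0,
    % within a ball of fixed radius around w_0.
    f(x; w) \approx f_{\mathrm{lin}}(x; w)
        := f(x; w_0) + \nabla_w f(x; w_0)^{\top} (w - w_0),
    \qquad \|w - w_0\| = O(1), \quad \text{width} \to \infty.

    % Average Gradient Outer Product (AGOP) of a predictor f over the inputs;
    % the abstract reports that catapults increase alignment between the trained
    % network's AGOP and that of the true predictor.
    \mathrm{AGOP}(f) := \frac{1}{n} \sum_{i=1}^{n} \nabla_x f(x_i)\, \nabla_x f(x_i)^{\top}.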
MARC
■008250224s2024 us ||||||||||||||c||eng d
■001000017161696
■00520250211151432
■006m o d
■007cr#unu||||||||
■020 ▼a9798383207666
■035 ▼a(MiAaPQ)AAI31295384
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a004
■1001 ▼aZhu, Libin.
■24510▼aToward Understanding the Dynamics of Over-Parameterized Neural Networks.
■260 ▼a[S.l.]▼bUniversity of California, San Diego. ▼c2024
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300 ▼a212 p.
■500 ▼aSource: Dissertations Abstracts International, Volume: 86-01, Section: B.
■500 ▼aAdvisor: Belkin, Mikhail.
■5021 ▼aThesis (Ph.D.)--University of California, San Diego, 2024.
■520 ▼aThe practical applications of neural networks are vast and varied, yet a comprehensive understanding of their underlying principles remains incomplete. This dissertation advances the theoretical understanding of neural networks, with a particular focus on over-parameterized models. It investigates their optimization and generalization dynamics and sheds light on various deep-learning phenomena observed in practice. This research deepens the understanding of the complex behaviors of these models and establishes theoretical insights that closely align with their empirical behaviors across diverse computational tasks. In the first part of the thesis, we analyze the fundamental properties of over-parameterized neural networks and demonstrate that these properties can lead to the success of their optimization. We show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo a transition to linearity: as their "width" approaches infinity, the networks converge to their first-order Taylor expansion in their parameters. The width of these general networks is characterized by the minimum in-degree of their neurons, excluding the input and first layers. We further demonstrate that the transition to linearity plays an important role in the success of the optimization of over-parameterized neural networks. In the second part of the thesis, we investigate the modern training regime of over-parameterized neural networks, focusing in particular on the large learning rate regime. While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. We show that the recently proposed Neural Quadratic Models can exhibit the "catapult phase" [65] that arises when training such models with large learning rates. We then show empirically that the generalization behavior of neural quadratic models parallels that of neural networks, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for the analysis of neural networks. Moreover, we extend the analysis of catapult dynamics to stochastic gradient descent (SGD). We first explain the common occurrence of spikes in the training loss when neural networks are trained with SGD, providing evidence that these spikes are caused by catapults. Second, we explain how catapults lead to better generalization: catapults increase feature learning by increasing alignment with the Average Gradient Outer Product (AGOP) of the true predictor. Furthermore, we demonstrate that a smaller batch size in SGD induces a larger number of catapults, thereby improving AGOP alignment and test performance. Overall, by integrating theoretical insights with empirical validations, this dissertation provides a new understanding of the complex dynamics governing neural network training and generalization.
■590 ▼aSchool code: 0033.
■650 4▼aComputer science.
■650 4▼aComputer engineering.
■653 ▼aCatapult dynamics
■653 ▼aFeature learning
■653 ▼aNeural networks
■653 ▼aQuadratic models
■653 ▼aTransition to linearity
■690 ▼a0984
■690 ▼a0464
■690 ▼a0800
■71020▼aUniversity of California, San Diego▼bComputer Science and Engineering.
■7730 ▼tDissertations Abstracts International▼g86-01B.
■790 ▼a0033
■791 ▼aPh.D.
■792 ▼a2024
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17161696▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service (KERIS).
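The catapult phase that field 520 above attributes to large learning rates [65] can be illustrated with a small numerical sketch. The toy model, constants, and variable names below are assumptions chosen for illustration and are not code from the dissertation: gradient descent on a two-layer linear network with a single training example, using a learning rate between 2/lambda and roughly 4/lambda, first drives the loss up, shrinks the tangent-kernel value, and then converges.

    import numpy as np

    # Illustrative catapult dynamics: a two-layer linear network f = u.v / sqrt(m)
    # trained on one example (x = 1, y = 0) with squared loss.  With a learning
    # rate above 2/lambda the loss first spikes, the tangent-kernel value shrinks,
    # and training then converges.  (Toy setup assumed for illustration.)
    rng = np.random.default_rng(0)
    m = 1000                                  # width
    u = rng.normal(size=m)                    # first-layer weights
    v = rng.normal(size=m)                    # second-layer weights
    y = 0.0                                   # target for the single example

    def output(u, v):
        return u @ v / np.sqrt(m)

    def tangent_kernel(u, v):
        # kernel value for the single example x = 1
        return (u @ u + v @ v) / m

    lr = 3.0 / tangent_kernel(u, v)           # catapult regime: 2/lam < lr < ~4/lam

    for step in range(60):
        f = output(u, v)
        loss = 0.5 * (f - y) ** 2
        if step % 5 == 0:
            print(f"step {step:2d}  loss {loss:12.4f}  kernel {tangent_kernel(u, v):.3f}")
        g = f - y                             # residual
        # simultaneous gradient-descent update of both layers
        u, v = u - lr * g * v / np.sqrt(m), v - lr * g * u / np.sqrt(m)

In this (assumed) setup the printed loss typically rises for the first several steps and then falls, while the kernel value settles below 2/lr; that rise-and-recovery is the qualitative behavior the abstract calls a catapult.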