Towards Fast Convergence and High Quality for Training Deep Neural Networks [electronic resource]
- Material Type
- Thesis/Dissertation
- Control Number
- 0016931606
- International Standard Book Number
- 9798379710620
- Dewey Decimal Classification Number
- 621.3
- Main Entry-Personal Name
- Hao, Zhiyong.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Cornell University, 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (159 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- General Note
- Advisor: Chiang, Hsiao-Dong.
- Dissertation Note
- Thesis (Ph.D.)--Cornell University, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- The recent leap in deep learning relies heavily on efficient optimization methods, emerging technologies that enable big data, and the evolving capacity of deep neural network architectures. Training a deep learning model is time-consuming, which prevents many real-world applications from finding the most accurate model in a reasonable time. In this thesis, we address multiple factors that influence deep neural network training efficiency and model accuracy. On the optimization side, the stochastic gradient descent (SGD) method, which partitions training data into batches, enables efficient training of neural networks, but local solvers sometimes converge to sub-optimal solutions and depend on initialization and hyperparameters; this is a common issue in non-convex optimization problems, including the training of deep neural networks. From the data perspective, sample size also influences training quality and efficiency: the goal is to train a machine learning model with as few samples as possible without loss of performance. As for model architecture, an efficient design extracts fewer redundant features and therefore provides higher prediction accuracy with the same parameter capacity.
  First, a systematic method for finding multiple high-quality local optimal deep neural networks, based on the TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) method, is introduced. Our goal is to systematically search for multiple locally optimal parameter sets for large models, such as deep neural networks, trained on large datasets. To achieve this, a dynamic search path (DSP) method is proposed to provide improved search guidance within TRUST-TECH. By integrating the DSP method with the TRUST-TECH method (DSP-TT), multiple optimal training solutions of higher quality than randomly initialized ones can be obtained. To take advantage of these solutions, a DSP-TT ensemble method is further developed. Experiments on various test cases show that the proposed DSP-TT method achieves considerable improvement over other ensemble methods developed for deep architectures, and the DSP-TT ensemble also shows diversity advantages over them.
  Second, we propose the Conjugate Gradient with Quadratic line-search (CGQ) method to address the sensitivity of SGD to hyperparameters. On the one hand, a quadratic line search determines the step size according to the current loss landscape; on the other hand, the momentum factor is dynamically updated by computing a conjugate gradient parameter (such as Polak-Ribière). Theoretical results that ensure the convergence of our method in strongly convex settings are developed, and experiments on image classification datasets show that our method converges faster than other local solvers and generalizes better (higher test-set accuracy). A major advantage of this method is that tedious hand-tuning of hyperparameters such as the learning rate and momentum is avoided.
  Third, a novel data curriculum learning (CL) method is proposed to address both convergence speed and generalization capability. On the one hand, a novel metric is developed that prioritizes the samples contributing most to convergence speed; on the other hand, a regularization method is designed to make CL methods generalize well on test data. It serves as an auxiliary objective that regularizes the model outputs with "Recaps" at the beginning of each training phase, which contain the knowledge learned from the previous training phase. Moreover, multiple prediction heads are placed at key layers of the neural network during training; these heads speed up convergence for layers far from the output by shortening the back-propagation path in deep architectures. Numerical experiments on multiple image classification test systems show that the CL framework equipped with the three proposed schemes outperforms other CL methods by a significant margin.
  Finally, we investigate ways to improve the efficiency of the parameter space. On the one hand, we treat neural network parameters independently, so that pruning and quantization can be applied; at this level, we propose an unstructured hierarchical training method that groups parameters by their values. On the other hand, each neural network consists of neurons, so parameter efficiency can also be addressed by enhancing neuron diversity; at the neuron level, we explore hierarchical training, dropout, and direct regularization approaches.
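  A minimal sketch of the idea behind a conjugate-gradient update with a quadratic line search, as described in the summary above: a quadratic fit along the search direction selects the step size, and a Polak-Ribière coefficient takes the place of a hand-tuned momentum factor. This assumes a simple full-batch setting; the function names, the three-point fit, and the fallback value are illustrative assumptions, not the dissertation's implementation.
  ```python
  # Illustrative sketch only: quadratic line search + Polak-Ribiere momentum.
  import numpy as np

  def quadratic_step_size(loss_fn, w, d, trial=1e-2):
      """Fit a quadratic to the loss along direction d and return its minimizer."""
      f0 = loss_fn(w)                      # loss at alpha = 0
      f1 = loss_fn(w + trial * d)          # loss at alpha = trial
      f2 = loss_fn(w + 2 * trial * d)      # loss at alpha = 2 * trial
      # Quadratic through the three samples: f(alpha) ~ a*alpha^2 + b*alpha + f0
      a = (f2 - 2 * f1 + f0) / (2 * trial ** 2)
      b = (f1 - f0) / trial - a * trial
      if a <= 0:                           # no convex fit here; fall back to the trial step
          return trial
      return max(-b / (2 * a), 0.0)

  def cg_train_step(loss_fn, grad_fn, w, prev_grad=None, prev_dir=None):
      g = grad_fn(w)
      if prev_dir is None:
          d = -g                           # first step: plain steepest descent
      else:
          # Polak-Ribiere coefficient replaces a hand-tuned momentum factor
          beta = max(g @ (g - prev_grad) / (prev_grad @ prev_grad + 1e-12), 0.0)
          d = -g + beta * prev_dir
      alpha = quadratic_step_size(loss_fn, w, d)
      return w + alpha * d, g, d
  ```
  In a mini-batch setting the line search would operate on batch losses and reuse forward passes; the sketch only shows how the step size and the momentum-like coefficient are derived from the current loss landscape rather than hand-tuned.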
- Subject Added Entry-Topical Term
- Electrical engineering.
- Subject Added Entry-Topical Term
- Computer engineering.
- Index Term-Uncontrolled
- Stochastic gradient descent
- Index Term-Uncontrolled
- Neural network
- Index Term-Uncontrolled
- Dynamic search path
- Index Term-Uncontrolled
- Deep architectures
- Index Term-Uncontrolled
- Parameter space
- Added Entry-Corporate Name
- Cornell University Electrical and Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 84-12B.
- Host Item Entry
- Dissertations Abstracts International
- Electronic Location and Access
- This resource is available after logging in.
- Control Number
- joongbu:642637