Towards Fast Convergence and High Quality for Training Deep Neural Networks [electronic resource]
Material Type  
 Dissertation
Control Number  
0016931606
International Standard Book Number  
9798379710620
Dewey Decimal Classification Number  
621.3
Main Entry-Personal Name  
Hao, Zhiyong.
Publication, Distribution, etc. (Imprint)  
[S.l.] : Cornell University, 2023
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2023
Physical Description  
1 online resource (159 p.)
General Note  
Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
General Note  
Advisor: Chiang, Hsiao-Dong.
Dissertation Note  
Thesis (Ph.D.)--Cornell University, 2023.
Restrictions on Access Note  
This item must not be sold to any third party vendors.
Summary, Etc.  
The recent leap in deep learning relies heavily on efficient optimization methods, emerging technologies that enable big data, and the evolving capacity of deep neural network architectures. Training a deep learning model is time consuming, which prevents many real-world applications from finding the most accurate model in a reasonable time. In this thesis, we address multiple factors that influence deep neural network training efficiency and model accuracy. On the optimization side, the stochastic gradient descent (SGD) method, which partitions training data into batches, enables efficient training of neural networks, but local solvers sometimes converge to sub-optimal solutions and are sensitive to initialization and hyperparameters. This is a common issue in non-convex optimization problems, including training deep neural networks. From the data perspective, sample size also influences training quality and efficiency; the goal is to train a machine learning model with as few samples as possible without loss of performance. As for the model architecture, an efficient architecture extracts fewer redundant features and therefore provides higher prediction accuracy with the same parameter capacity.

First, a systematic method for finding multiple high-quality local optimal deep neural networks, based on the TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) method, is introduced. Our goal is to systematically search for multiple local optimal parameters for large models, such as deep neural networks, trained on large datasets. To achieve this, a dynamic search path (DSP) method is proposed to provide improved search guidance for TRUST-TECH. By integrating the DSP method with the TRUST-TECH method (DSP-TT), multiple optimal training solutions of higher quality than randomly initialized ones can be obtained. To take advantage of these optimal solutions, a DSP-TT ensemble method is further developed. Experiments on various test cases show that the proposed DSP-TT method achieves considerable improvement over other ensemble methods developed for deep architectures. The DSP-TT ensemble method also shows diversity advantages over other ensemble methods.

Second, we propose the Conjugate Gradient with Quadratic line-search (CGQ) method to address the sensitivity of SGD to hyperparameters. On the one hand, a quadratic line search determines the step size according to the current loss landscape; on the other hand, the momentum factor is dynamically updated by computing the conjugate gradient parameter (such as Polak-Ribiere). Theoretical results ensuring the convergence of our method in strongly convex settings are developed, and experiments on image classification datasets show that our method yields faster convergence than other local solvers and better generalization capability (test set accuracy). One major advantage of this method is that tedious hand tuning of hyperparameters such as the learning rate and momentum is avoided.

Third, a novel data curriculum learning (CL) method is proposed to address both convergence speed and generalization capability. On the one hand, a novel metric is developed that prioritizes the samples that contribute most to convergence speed. On the other hand, a regularization method is designed to make CL methods generalize well on test data: it serves as an auxiliary objective that regularizes the model outputs with "Recaps" at the beginning of each training phase, which contain the knowledge learned from the previous training phase. Moreover, multiple prediction heads are placed in key layers of a neural network during training. These heads speed up convergence for layers far from the output by shortening the back-propagation path in deep architectures. Numerical experiments on multiple image classification test systems show that the CL framework equipped with the three schemes presented in this chapter outperforms other CL methods by a significant margin.

Finally, we investigate ways to improve the efficiency of the parameter space. On the one hand, we treat neural network parameters independently so that pruning and quantization can be applied; at this level, we propose an unstructured hierarchical training method that groups parameters by their values. On the other hand, each neural network consists of neurons, so parameter efficiency can also be addressed by enhancing neuron diversity; at the neuron level, we explore a hierarchical training approach, a dropout approach, and a direct regularization approach.
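The CGQ scheme summarized above pairs a quadratic line search (for the step size) with a dynamically computed conjugate-gradient momentum factor. The following minimal Python sketch illustrates that general idea only; it is not the thesis's implementation. The function names (quadratic_line_search, cgq_sketch), the three-point parabola fit, the trial step size, the PR+ momentum rule, and the toy least-squares problem are all illustrative assumptions.

import numpy as np

def quadratic_line_search(loss, w, d, trial=1.0):
    # Fit a parabola f(a) ~ c2*a^2 + c1*a + c0 to the loss along direction d,
    # sampled at a = 0, trial/2 and trial, then step to the parabola's minimum.
    f0 = loss(w)
    f1 = loss(w + 0.5 * trial * d)
    f2 = loss(w + trial * d)
    c2 = 2.0 * (f2 - 2.0 * f1 + f0) / trial ** 2
    c1 = (4.0 * f1 - 3.0 * f0 - f2) / trial
    if c2 <= 0.0:                 # no convex fit along d: keep the trial step
        return trial
    return float(np.clip(-c1 / (2.0 * c2), 0.0, trial))

def cgq_sketch(loss, grad, w, steps=50):
    # Nonlinear conjugate gradient: the Polak-Ribiere (PR+) parameter plays the
    # role of a dynamically updated momentum factor, and the quadratic line
    # search replaces a hand-tuned learning rate.
    g = grad(w)
    d = -g
    for _ in range(steps):
        alpha = quadratic_line_search(loss, w, d)
        w = w + alpha * d
        g_new = grad(w)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g + 1e-12))  # PR+ rule
        d = -g_new + beta * d
        g = g_new
    return w

# Toy strongly convex problem (least squares), purely for demonstration.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
loss = lambda w: 0.5 * np.sum((A @ w - b) ** 2)
grad = lambda w: A.T @ (A @ w - b)
print(cgq_sketch(loss, grad, np.zeros(2)))   # approaches the solution of A w = b

On a quadratic loss, the three-point parabola fit recovers the exact minimizer along each search direction, which is what allows a line search of this kind to stand in for a hand-tuned learning rate.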
Subject Added Entry-Topical Term  
Electrical engineering.
Subject Added Entry-Topical Term  
Computer engineering.
Index Term-Uncontrolled  
Stochastic gradient descent
Index Term-Uncontrolled  
Neural network
Index Term-Uncontrolled  
Dynamic search path
Index Term-Uncontrolled  
Deep architectures
Index Term-Uncontrolled  
Parameter space
Added Entry-Corporate Name  
Cornell University Electrical and Computer Engineering
Host Item Entry  
Dissertations Abstracts International. 84-12B.
Host Item Entry  
Dissertations Abstracts International
Electronic Location and Access  
This material is available after logging in.
Control Number  
joongbu:642637

Holdings Information

Holdings
Registration No.  Call No.  Location  Loan Availability  Loan Information
TQ0028553  T  Full-text resource  Viewable/Printable  Viewable/Printable