Towards Fast Convergence and High Quality for Training Deep Neural Networks [electronic resource]
- Material Type
- Thesis/Dissertation
- Control Number
- 0016931606
- International Standard Book Number
- 9798379710620
- Dewey Decimal Classification Number
- 621.3
- Main Entry-Personal Name
- Hao, Zhiyong.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : Cornell University, 2023
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (159 p.)
- General Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- General Note
- Advisor: Chiang, Hsiao-Dong.
- Dissertation Note
- Thesis (Ph.D.)--Cornell University, 2023.
- Restrictions on Access Note
- This item must not be sold to any third party vendors.
- Summary, Etc.
- The recent leap in deep learning relies heavily on efficient optimization methods, emerging technologies that enable big data, and the evolving capacity of deep neural network architectures. Training a deep learning model is time-consuming, which prevents many real-world applications from finding the most accurate model in a reasonable time. In this thesis, we address multiple factors that influence deep neural network training efficiency and model accuracy. On the optimization side, the stochastic gradient descent (SGD) method, which partitions training data into batches, enables efficient training of neural networks, but local solvers sometimes converge to sub-optimal solutions and depend on initialization and hyperparameters; this is a common issue in non-convex optimization problems, including the training of deep neural networks. From the data perspective, sample size also influences training quality and efficiency: the goal is to train a machine learning model with as few samples as possible without loss of performance. As for model architecture, an efficient design extracts fewer redundant features and therefore provides higher prediction accuracy with the same parameter capacity.
  First, a systematic method for finding multiple high-quality local optimal deep neural networks, based on the TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) method, is introduced. Our goal is to systematically search for multiple locally optimal parameter sets for large models, such as deep neural networks, trained on large datasets. To achieve this, a dynamic search path (DSP) method is proposed to provide improved search guidance within TRUST-TECH. By integrating the DSP method with the TRUST-TECH method (DSP-TT), multiple optimal training solutions of higher quality than randomly initialized ones can be obtained. To take advantage of these solutions, a DSP-TT ensemble method is further developed. Experiments on various test cases show that the proposed DSP-TT method achieves considerable improvement over other ensemble methods developed for deep architectures, and the DSP-TT ensemble also shows diversity advantages over them.
  Second, we propose the Conjugate Gradient with Quadratic line-search (CGQ) method to address the sensitivity of SGD to hyperparameters. On the one hand, a quadratic line search determines the step size according to the current loss landscape; on the other hand, the momentum factor is dynamically updated by computing a conjugate gradient parameter (such as Polak-Ribière). Theoretical results that ensure the convergence of our method in strongly convex settings are developed, and experiments on image classification datasets show that our method converges faster than other local solvers and generalizes better (higher test-set accuracy). A major advantage of this method is that tedious hand-tuning of hyperparameters such as the learning rate and momentum is avoided.
  Third, a novel data curriculum learning (CL) method is proposed to address both convergence speed and generalization capability. On the one hand, a novel metric is developed that prioritizes the samples contributing most to convergence speed; on the other hand, a regularization method is designed to make CL methods generalize well on test data. It serves as an auxiliary objective that regularizes the model outputs with "Recaps" at the beginning of each training phase, which contain the knowledge learned from the previous training phase. Moreover, multiple prediction heads are placed at key layers of the neural network during training; these heads speed up convergence for layers far from the output by shortening the back-propagation path in deep architectures. Numerical experiments on multiple image classification test systems show that the CL framework equipped with the three proposed schemes outperforms other CL methods by a significant margin.
  Finally, we investigate ways to improve the efficiency of the parameter space. On the one hand, we treat neural network parameters independently, so that pruning and quantization can be applied; at this level, we propose an unstructured hierarchical training method that groups parameters by their values. On the other hand, each neural network consists of neurons, so parameter efficiency can also be addressed by enhancing neuron diversity; at the neuron level, we explore hierarchical training, dropout, and direct regularization approaches.
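  A minimal sketch of the idea behind a conjugate-gradient update with a quadratic line search, as described in the summary above: a quadratic fit along the search direction selects the step size, and a Polak-Ribière coefficient takes the place of a hand-tuned momentum factor. This assumes a simple full-batch setting; the function names, the three-point fit, and the fallback value are illustrative assumptions, not the dissertation's implementation.
  ```python
  # Illustrative sketch only: quadratic line search + Polak-Ribiere momentum.
  import numpy as np

  def quadratic_step_size(loss_fn, w, d, trial=1e-2):
      """Fit a quadratic to the loss along direction d and return its minimizer."""
      f0 = loss_fn(w)                      # loss at alpha = 0
      f1 = loss_fn(w + trial * d)          # loss at alpha = trial
      f2 = loss_fn(w + 2 * trial * d)      # loss at alpha = 2 * trial
      # Quadratic through the three samples: f(alpha) ~ a*alpha^2 + b*alpha + f0
      a = (f2 - 2 * f1 + f0) / (2 * trial ** 2)
      b = (f1 - f0) / trial - a * trial
      if a <= 0:                           # no convex fit here; fall back to the trial step
          return trial
      return max(-b / (2 * a), 0.0)

  def cg_train_step(loss_fn, grad_fn, w, prev_grad=None, prev_dir=None):
      g = grad_fn(w)
      if prev_dir is None:
          d = -g                           # first step: plain steepest descent
      else:
          # Polak-Ribiere coefficient replaces a hand-tuned momentum factor
          beta = max(g @ (g - prev_grad) / (prev_grad @ prev_grad + 1e-12), 0.0)
          d = -g + beta * prev_dir
      alpha = quadratic_step_size(loss_fn, w, d)
      return w + alpha * d, g, d
  ```
  In a mini-batch setting the line search would operate on batch losses and reuse forward passes; the sketch only shows how the step size and the momentum-like coefficient are derived from the current loss landscape rather than hand-tuned.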
- Subject Added Entry-Topical Term
- Electrical engineering.
- Subject Added Entry-Topical Term
- Computer engineering.
- Index Term-Uncontrolled
- Stochastic gradient descent
- Index Term-Uncontrolled
- Neural network
- Index Term-Uncontrolled
- Dynamic search path
- Index Term-Uncontrolled
- Deep architectures
- Index Term-Uncontrolled
- Parameter space
- Added Entry-Corporate Name
- Cornell University Electrical and Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 84-12B.
- Host Item Entry
- Dissertations Abstracts International
- Electronic Location and Access
- This resource is available after logging in.
- Control Number
- joongbu:642637