Joongbu University Library


Contents Info

Domain-Specific Acceleration: From Efficient Vision Processing Hardware to High-Performance Quantum Computing Software.

Material Type: Thesis/Dissertation

Control Number: 0017164361

International Standard Book Number: 9798384042075

Dewey Decimal Classification Number: 530.1

Main Entry-Personal Name: Zhang, Qirui.

Publication, Distribution, etc. (Imprint): [S.l.] : University of Michigan, 2024

Publication, Distribution, etc. (Imprint): Ann Arbor : ProQuest Dissertations & Theses, 2024

Physical Description: 112 p.

General Note: Source: Dissertations Abstracts International, Volume: 86-03, Section: B.

General Note: Advisor: Sylvester, Dennis.

Dissertation Note: Thesis (Ph.D.)--University of Michigan, 2024.

Summary, Etc.: With the end of Dennard scaling and the decline of Moore's law, there are no longer 'free' performance and efficiency gains from semiconductor technology advancements. Domain-Specific Acceleration (DSA) is a promising remaining path for further significant improvements. This approach involves designing optimized software and hardware tailored to specific application domains. Successful DSA requires careful consideration of methodologies such as specialization, parallelism exploitation, algorithm-hardware co-design, and balancing efficiency with programmability. To extend the boundaries of DSA, especially for less extensively studied domains, this dissertation studies DSA designs for three application areas: image compression, robotic vision, and Quantum Circuit Simulation (QCS). Though these domains differ, the three designs employ a common methodology of algorithm-hardware co-optimizations to reduce data movements from memory to the processing units.
Firstly, this dissertation presents an Ultra-Low-Power (ULP) H.264 or Advanced Video Coding (AVC) intra-frame image compression accelerator for event-driven Internet of Things (IoT) imaging systems. The H.264/AVC intra-frame codec is customized to compress arbitrary non-rectangular change-detected regions. Novel algorithm-hardware co-designs optimize energy and latency from image memory accesses, reducing overhead for neighbor macroblock accesses by 2.6x with negligible quality loss. Split control for major processing phases exploits data dependency and pipelining, while data path micro-architecture reconfiguration reduces area and leakage. Fabricated in 40nm, the accelerator occupies 0.32mm² with 4kB SRAM, consuming only 1.21μW at 0.6V and 153kHz, achieving 30.9pJ/pixel compression energy efficiency. Combined with change detection, this design brings a 133x reduction in overall energy for egressing images of change-detected regions in an event-driven IoT imaging system.
Secondly, this dissertation introduces RoboVisio, an efficient and flexible domain-specific System-on-Chip (SoC) for vision tasks in autonomous micro-robot navigation. A novel hybrid Processing Element (PE) is proposed, combining a 2D-mapping architecture for classic vision tasks with an output-channel-parallel systolic architecture for Convolutional Neural Networks (CNNs). This integration future-proofs the architecture, facilitating next-generation CNN-heavy vision algorithms, saving 40% in area and leakage without power or throughput loss compared to separate implementations. Other key features include 2MB magnetoresistive random-access memory for non-volatile fully-on-chip weight storage, a unified image-activation memory with block-swapping-based buffering that reduces buffer footprint by 50% and eliminates data copy for multi-frame buffering, and a combination of weight buffering and CNN loop ordering reducing weight memory system power by 75%. Fabricated in 22nm, RoboVisio achieves 0.22nJ/pix for Harris corner detection and 3.5TOPS/W (16-bit OP) for CNN, a 40% to 170% efficiency improvement over state-of-the-art edge machine learning SoCs using non-volatile memory.
Lastly, this dissertation examines the acceleration of QCS, a crucial computational problem for quantum computing development. Predominant approaches center on Tensor Networks (TN), valued for better concurrency and reduced computation compared to full quantum vectors and matrices. However, even with these advantages, array-based tensors can have significant redundancy. To optimize QCS algorithms for future hardware accelerators, this dissertation presents Fast Tensor Decision Diagram (FTDD), a novel open-source software framework. FTDD leverages Tensor Decision Diagrams (TDD) to eliminate overheads and achieve significant speedups. On average, FTDD delivers a 37x speedup over Google's TensorNetwork library on redundancy-rich circuits and 25x and 144x speedups over a quantum multi-valued decision diagram and a prior TDD implementation, respectively, on Google random quantum circuits. FTDD introduces a linear-complexity rank simplification algorithm, Tetris, and edge-centric data structures for recursive TDD operations. Additionally, FTDD explores TN contraction ordering and optimizations from binary decision diagrams.

Subject Added Entry-Topical Term: Quantum physics.

Subject Added Entry-Topical Term: Electrical engineering.

Subject Added Entry-Topical Term: Robotics.

Index Term-Uncontrolled: Domain-specific architecture

Index Term-Uncontrolled: H.264/AVC

Index Term-Uncontrolled: Autonomous navigation

Index Term-Uncontrolled: Neural network

Index Term-Uncontrolled: Quantum circuit simulation

Index Term-Uncontrolled: Decision diagram

Added Entry-Corporate Name: University of Michigan Electrical and Computer Engineering

Host Item Entry: Dissertations Abstracts International. 86-03B.

Electronic Location and Access: Available after logging in.

Control Number: joongbu:657242

Holdings
Reg No.	Call No.	Location	Status	Loan Info
TQ0033463	T	Full-text material	Viewable/Printable	Viewable/Printable
