Domain-Specific Acceleration: From Efficient Vision Processing Hardware to High-Performance Quantum Computing Software.
Material Type  
 Thesis/Dissertation
Control Number  
0017164361
International Standard Book Number  
9798384042075
Dewey Decimal Classification Number  
530.1
Main Entry-Personal Name  
Zhang, Qirui.
Publication, Distribution, etc. (Imprint)  
[S.l.] : University of Michigan, 2024
Publication, Distribution, etc. (Imprint)  
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description  
112 p.
General Note  
Source: Dissertations Abstracts International, Volume: 86-03, Section: B.
General Note  
Advisor: Sylvester, Dennis.
Dissertation Note  
Thesis (Ph.D.)--University of Michigan, 2024.
Summary, Etc.  
With the end of Dennard scaling and the decline of Moore's law, there are no longer 'free' performance and efficiency gains from semiconductor technology advancements. Domain-Specific Acceleration (DSA) is a promising remaining path for further significant improvements. This approach involves designing optimized software and hardware tailored to specific application domains. Successful DSA requires careful consideration of methodologies such as specialization, parallelism exploitation, algorithm-hardware co-design, and balancing efficiency with programmability. To extend the boundaries of DSA, especially for less extensively studied domains, this dissertation studies DSA designs for three application areas: image compression, robotic vision, and Quantum Circuit Simulation (QCS). Though these domains differ, the three designs employ a common methodology of algorithm-hardware co-optimization to reduce data movement from memory to the processing units.
Firstly, this dissertation presents an Ultra-Low-Power (ULP) H.264/Advanced Video Coding (AVC) intra-frame image compression accelerator for event-driven Internet of Things (IoT) imaging systems. The H.264/AVC intra-frame codec is customized to compress arbitrary non-rectangular change-detected regions. Novel algorithm-hardware co-designs optimize the energy and latency of image memory accesses, reducing overhead for neighbor macroblock accesses by 2.6x with negligible quality loss. Split control for major processing phases exploits data dependency and pipelining, while data path micro-architecture reconfiguration reduces area and leakage. Fabricated in 40 nm, the accelerator occupies 0.32 mm² with 4 kB SRAM, consuming only 1.21 μW at 0.6 V and 153 kHz and achieving 30.9 pJ/pixel compression energy efficiency. Combined with change detection, this design brings a 133x reduction in overall energy for egressing images of change-detected regions in an event-driven IoT imaging system.
Secondly, this dissertation introduces RoboVisio, an efficient and flexible domain-specific System-on-Chip (SoC) for vision tasks in autonomous micro-robot navigation. A novel hybrid Processing Element (PE) is proposed, combining a 2D-mapping architecture for classic vision tasks with an output-channel-parallel systolic architecture for Convolutional Neural Networks (CNNs). This integration future-proofs the architecture, facilitating next-generation CNN-heavy vision algorithms, and saves 40% in area and leakage without power or throughput loss compared to separate implementations. Other key features include 2 MB of magnetoresistive random-access memory for non-volatile, fully on-chip weight storage; a unified image-activation memory with block-swapping-based buffering that reduces buffer footprint by 50% and eliminates data copies for multi-frame buffering; and a combination of weight buffering and CNN loop ordering that reduces weight memory system power by 75%. Fabricated in 22 nm, RoboVisio achieves 0.22 nJ/pixel for Harris corner detection and 3.5 TOPS/W (16-bit OP) for CNN, a 40% to 170% efficiency improvement over state-of-the-art edge machine learning SoCs using non-volatile memory.
Lastly, this dissertation examines the acceleration of QCS, a crucial computational problem for quantum computing development. Predominant approaches center on Tensor Networks (TN), valued for better concurrency and reduced computation compared to full quantum vectors and matrices. However, even with these advantages, array-based tensors can have significant redundancy. To optimize QCS algorithms for future hardware accelerators, this dissertation presents Fast Tensor Decision Diagram (FTDD), a novel open-source software framework. FTDD leverages the Tensor Decision Diagram (TDD) to eliminate overheads and achieve significant speedups. On average, FTDD delivers a 37x speedup over Google's TensorNetwork library on redundancy-rich circuits, and 25x and 144x speedups over the quantum multi-valued decision diagram and a prior TDD implementation, respectively, on Google random quantum circuits. FTDD introduces a linear-complexity rank simplification algorithm, Tetris, and edge-centric data structures for recursive TDD operations. Additionally, FTDD explores TN contraction ordering and optimizations from binary decision diagrams.
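The abstract's remark that array-based tensors can carry significant redundancy can be illustrated with a small sketch. The code below is not FTDD and is not taken from the dissertation. It is a simplified, hypothetical reduced decision diagram over a flat tensor with 2**n entries, without the weighted edges and normalization that real TDDs use. The function name `build_diagram` and the key layout are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Sequence, Tuple


def build_diagram(values: Sequence[complex]) -> Tuple[int, int]:
    """Build a reduced decision diagram for a flat tensor of 2**n entries.

    Returns (root_id, node_count). Illustrative only: real tensor decision
    diagrams also factor common weights onto edges, which is omitted here.
    """
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")

    # Unique table: structurally identical sub-diagrams get the same id.
    unique: Dict[Hashable, int] = {}

    def intern(key: Hashable) -> int:
        if key not in unique:
            unique[key] = len(unique)
        return unique[key]

    def build(lo: int, hi: int, level: int) -> int:
        if hi - lo == 1:
            return intern(("leaf", values[lo]))
        mid = (lo + hi) // 2
        low = build(lo, mid, level + 1)    # index at this level = 0
        high = build(mid, hi, level + 1)   # index at this level = 1
        if low == high:
            # Both halves are identical, so this index is irrelevant: skip it.
            return low
        return intern(("node", level, low, high))

    root = build(0, n, 0)
    return root, len(unique)


if __name__ == "__main__":
    for name, tensor in [
        ("all ones", [1] * 16),
        ("repeating 0,1", [0, 1] * 8),
        ("all distinct", list(range(8))),
    ]:
        _, count = build_diagram(tensor)
        print(f"{name}: {len(tensor)} array entries, {count} diagram nodes")
```

Under these assumptions, a 16-entry tensor of identical values collapses to a single shared node, while a tensor with all-distinct entries gains nothing from sharing. This is the kind of structure-dependent saving the abstract attributes to redundancy-rich circuits.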
Subject Added Entry-Topical Term  
Quantum physics.
Subject Added Entry-Topical Term  
Electrical engineering.
Subject Added Entry-Topical Term  
Robotics.
Index Term-Uncontrolled  
Domain-specific architecture
Index Term-Uncontrolled  
H.264/AVC
Index Term-Uncontrolled  
Autonomous navigation
Index Term-Uncontrolled  
Neural network
Index Term-Uncontrolled  
Quantum circuit simulation
Index Term-Uncontrolled  
Decision diagram
Added Entry-Corporate Name  
University of Michigan Electrical and Computer Engineering
Host Item Entry  
Dissertations Abstracts International. 86-03B.
Electronic Location and Access  
This item is available after logging in.
Control Number  
joongbu:657242
Holdings
Reg No.    Call No.    Location    Status    Loan Info
TQ0033463    T    Full-text material    Viewable/printable    Viewable/printable