Domain-Specific Acceleration: From Efficient Vision Processing Hardware to High-Performance Quantum Computing Software.
- Material Type
- Dissertation
- Control Number
- 0017164361
- International Standard Book Number
- 9798384042075
- Dewey Decimal Classification Number
- 530.1
- Main Entry-Personal Name
- Zhang, Qirui.
- Publication, Distribution, etc. (Imprint)
- [S.l.] : University of Michigan, 2024
- Publication, Distribution, etc. (Imprint)
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 112 p.
- General Note
- Source: Dissertations Abstracts International, Volume: 86-03, Section: B.
- General Note
- Advisor: Sylvester, Dennis.
- Dissertation Note
- Thesis (Ph.D.)--University of Michigan, 2024.
- Summary, Etc.
- With the end of Dennard scaling and the decline of Moore's law, there are no longer 'free' performance and efficiency gains from semiconductor technology advancements. Domain-Specific Acceleration (DSA) is a promising remaining path for further significant improvements. This approach involves designing optimized software and hardware tailored to specific application domains. Successful DSA requires careful consideration of methodologies such as specialization, parallelism exploitation, algorithm-hardware co-design, and balancing efficiency with programmability. To extend the boundaries of DSA, especially for less extensively studied domains, this dissertation studies DSA designs for three application areas: image compression, robotic vision, and Quantum Circuit Simulation (QCS). Though these domains differ, the three designs employ a common methodology of algorithm-hardware co-optimizations to reduce data movements from memory to the processing units.
Firstly, this dissertation presents an Ultra-Low-Power (ULP) H.264, or Advanced Video Coding (AVC), intra-frame image compression accelerator for event-driven Internet of Things (IoT) imaging systems. The H.264/AVC intra-frame codec is customized to compress arbitrary non-rectangular change-detected regions. Novel algorithm-hardware co-designs optimize energy and latency from image memory accesses, reducing overhead for neighbor macroblock accesses by 2.6x with negligible quality loss. Split control for major processing phases exploits data dependency and pipelining, while data path micro-architecture reconfiguration reduces area and leakage. Fabricated in 40 nm, the accelerator occupies 0.32 mm² with 4 kB SRAM, consuming only 1.21 μW at 0.6 V and 153 kHz, achieving 30.9 pJ/pixel compression energy efficiency. Combined with change detection, this design brings a 133x reduction in overall energy for egressing images of change-detected regions in an event-driven IoT imaging system.
Secondly, this dissertation introduces RoboVisio, an efficient and flexible domain-specific System-on-Chip (SoC) for vision tasks in autonomous micro-robot navigation. A novel hybrid Processing Element (PE) is proposed, combining a 2D-mapping architecture for classic vision tasks with an output-channel-parallel systolic architecture for Convolutional Neural Networks (CNNs). This integration future-proofs the architecture, facilitating next-generation CNN-heavy vision algorithms, saving 40% in area and leakage without power or throughput loss compared to separate implementations. Other key features include 2 MB magnetoresistive random-access memory for non-volatile fully-on-chip weight storage, a unified image-activation memory with block-swapping-based buffering that reduces buffer footprint by 50% and eliminates data copy for multi-frame buffering, and a combination of weight buffering and CNN loop ordering reducing weight memory system power by 75%. Fabricated in 22 nm, RoboVisio achieves 0.22 nJ/pix for Harris corner detection and 3.5 TOPS/W (16-bit OP) for CNN, a 40% to 170% efficiency improvement over state-of-the-art edge machine learning SoCs using non-volatile memory.
Lastly, this dissertation examines the acceleration of QCS, a crucial computational problem for quantum computing development. Predominant approaches center on Tensor Networks (TNs), valued for better concurrency and reduced computation compared to full quantum vectors and matrices. However, even with these advantages, array-based tensors can have significant redundancy. To optimize QCS algorithms for future hardware accelerators, this dissertation presents Fast Tensor Decision Diagram (FTDD), a novel open-source software framework. FTDD leverages Tensor Decision Diagrams (TDDs) to eliminate overheads and achieve significant speedups. On average, FTDD delivers a 37x speedup over Google's TensorNetwork library on redundancy-rich circuits, and 25x and 144x speedups over quantum multi-valued decision diagrams and a prior TDD implementation, respectively, on Google random quantum circuits. FTDD introduces a linear-complexity rank simplification algorithm, Tetris, and edge-centric data structures for recursive TDD operations. Additionally, FTDD explores TN contraction ordering and optimizations from binary decision diagrams.
- Subject Added Entry-Topical Term
- Quantum physics.
- Subject Added Entry-Topical Term
- Electrical engineering.
- Subject Added Entry-Topical Term
- Robotics.
- Index Term-Uncontrolled
- Domain-specific architecture
- Index Term-Uncontrolled
- H.264/AVC
- Index Term-Uncontrolled
- Autonomous navigation
- Index Term-Uncontrolled
- Neural network
- Index Term-Uncontrolled
- Quantum circuit simulation
- Index Term-Uncontrolled
- Decision diagram
- Added Entry-Corporate Name
- University of Michigan Electrical and Computer Engineering
- Host Item Entry
- Dissertations Abstracts International. 86-03B.
- Electronic Location and Access
- This material is available after logging in.
- Control Number
- joongbu:657242