Schumacher, Tobias: Performance modeling and analysis in high-performance reconfigurable computing. 2011
Table of contents
Abstract
Zusammenfassung (abstract in German)
Contents
1 Introduction
1.1 Motivation
1.2 Contributions of this Thesis
1.3 Thesis Outline
2 Background and Related Work
2.1 Accelerated Supercomputing
2.2 Performance Modeling, Analysis and Optimization
2.2.1 Performance Analysis in High-Performance Computing
2.2.2 Analytical Performance Estimation for Reconfigurable HPC
2.2.3 Bottleneck Identification and Optimization of Reconfigurable Accelerators
2.2.4 Reconfigurable Hardware Characterization
2.3 Design Implementation, Verification and Optimization
2.3.1 High-Level Language (HLL) Synthesis
2.3.2 Visual Design Entry
2.3.3 Multi-Core System Generation
2.4 Chapter Summary
3 Programming, Execution and Performance Model
3.1 Introduction
3.2 The IMORC Programming Model
3.2.1 The Architecture Model
3.2.2 The Execution Model
3.3 Development Flow
3.3.1 Partitioning and Initial Mapping
3.3.2 Task Graph Refinement
3.3.3 Architecture Generation
3.4 Chapter Summary
4 The IMORC Architectural Template
4.1 Cores, Links, and Channels
4.2 Network Topology and Arbitration
4.3 Performance Counters
4.4 Utility Cores
4.4.1 Host Interface Cores
4.4.2 Memory Cores
4.4.3 Request Generator Cores
4.4.4 IMORC-to-Register Interface Core
4.4.5 Register-to-IMORC Interface Core
4.4.6 Farming Cores
4.5 IMORC on the XtremeData XD1000
4.5.1 The FPGA
4.5.2 Host Interface
4.5.3 External DDR Memory Access
4.6 IMORC Infrastructure Cores and Accelerator Generation
4.6.1 Core Generation
4.6.2 Communication Infrastructure Generation
4.6.3 Simulation
4.6.4 Synthesis
4.6.5 Execution and Runtime Monitoring
4.7 Chapter Summary
5 Architecture Characterization
5.1 The IMORC Benchmarking Infrastructure
5.1.1 The Benchmarking Core
5.1.2 Contention Benchmarking
5.2 Performance Characterization of the XD1000
5.2.1 CPU <-> Host Memory Bandwidth
5.2.2 CPU <-> FPGA Communication Initiated by the CPU
5.2.3 Burst Read/Write Transfers Initiated by the FPGA
5.2.4 Simultaneous Access by Multiple Cores with a Common Access Scheme (Read or Write)
5.2.5 Contention Benchmark with Multiple Simultaneous Reads and Writes
5.3 Chapter Summary
6 Experimental Evaluation
6.1 Cube Cut
6.1.1 The Cube Cut Algorithm
6.1.2 Design and Implementation
6.1.3 Architecture Mapping, Implementation and Performance Evaluation
6.2 A Compositing Accelerator for a Parallel Rendering Framework
6.2.1 Application Model
6.2.2 Implementation
6.2.3 Performance Evaluation
6.3 K-th Nearest Neighbor Thinning
6.3.1 Application Model
6.3.2 IMORC KNN Cores
6.3.3 Architecture Generation
6.3.4 Numeric Evaluation
6.4 Chapter Summary
7 Conclusion and Outlook
7.1 Contributions
7.2 Conclusion
7.3 Future Directions
Acronyms
List of Figures
List of Tables
Author's Publications
Bibliography