LEADER 06055nam 2200481 450
001 9910760277003321
005 20231016195526.0
010 $a3-031-38230-7
035 $a(MiAaPQ)EBC30746903
035 $a(Au-PeEL)EBL30746903
035 $a(EXLCZ)9928272158600041
100 $a20231016d2024 uy 0
101 0 $aeng
135 $aurcnu||||||||
181 $ctxt$2rdacontent
182 $cc$2rdamedia
183 $acr$2rdacarrier
200 10$aTowards Heterogeneous Multi-Core Systems-on-Chip for Edge Machine Learning $eJourney from Single-Core Acceleration to Multi-core Heterogeneous Systems /$fVikram Jain and Marian Verhelst
205 $aFirst edition.
210 1$aCham, Switzerland :$cSpringer,$d[2024]
210 4$d©2024
215 $a1 online resource (199 pages)
311 08$aPrint version: Jain, Vikram Towards Heterogeneous Multi-Core Systems-on-Chip for Edge Machine Learning Cham : Springer International Publishing AG,c2023 9783031382291
320 $aIncludes bibliographical references and index.
327 $aIntro -- Preface -- Acknowledgments -- Contents -- List of Abbreviations -- List of Figures -- List of Tables -- 1 Introduction -- 1.1 Machine Learning at the (Extreme) Edge -- 1.1.1 Applications -- 1.1.2 Algorithms -- 1.1.3 Hardware -- 1.2 Open Challenges for ML Acceleration at the (Extreme) Edge -- 1.3 Book Contributions -- 2 Algorithmic Background for Machine Learning -- 2.1 Support Vector Machines -- 2.2 Deep Learning Models -- 2.2.1 Neural Networks -- 2.2.2 Training -- 2.2.3 Inference: Neural Network Topologies -- 2.2.4 Model Compression -- 2.3 Feature Extraction -- 2.4 Conclusion -- 3 Scoping the Landscape of (Extreme) Edge Machine Learning Processors -- 3.1 Hardware Acceleration of ML Workloads: A Primer -- 3.1.1 Core Mathematical Operation -- 3.1.2 General Accelerator Template -- 3.2 Evaluation Metrics -- 3.3 Survey of (Extreme) Edge ML Hardware Platforms -- 3.4 Evaluating the Surveyed Hardware Platforms -- 3.5 Insights and Trends -- 3.6 Conclusion -- 4 Hardware-Software Co-optimization Through Design Space Exploration -- 4.1 Motivation -- 4.2 Exploration Methodology -- 4.2.1 ZigZag -- 4.2.2 Post-Processing of ZigZag's Results -- 4.3 DNN Workload Comparison -- 4.3.1 Exploration Setup -- 4.3.2 Visualization of the Complete Trade-Off Space -- 4.3.3 Impact of HW Architecture on Optimal Workload -- 4.3.4 Impact of Workload on Optimal HW Architecture -- 4.4 Conclusion -- 5 Energy-Efficient Single-Core Hardware Acceleration -- 5.1 Motivation -- 5.2 Metrics for Hardware Optimization -- 5.3 State-of-the-Art in Object Detection on FPGA -- 5.4 Cost-Aware Algorithmic Optimization -- 5.4.1 Object Detection Algorithms -- 5.4.2 Quantization of Tiny-YOLOv2 -- Post-training Quantization -- Quantization-Aware Training -- 5.5 Cost-Aware Architecture Optimization -- 5.5.1 Hardware Mapping of Convolutional Layers.
327 $a5.5.2 Hardware Architecture of the Accelerator -- 5.6 Cost-Aware System Optimization -- 5.6.1 Data Communication Architecture -- 5.6.2 Tiling Strategy -- 5.7 Implementation Results -- 5.8 Conclusion -- 6 TinyVers: A Tiny Versatile All-Digital Heterogeneous Multi-core System-on-Chip -- 6.1 Motivation -- 6.2 Algorithmic Background -- 6.2.1 Convolution and Dense Operation -- 6.2.2 Deconvolution -- 6.2.3 Support Vector Machines (SVMs) -- 6.3 TinyVers Hardware Architecture -- 6.3.1 Smart Sensing Modes for TinyML -- 6.3.2 Power Management -- 6.4 FlexML Accelerator -- 6.4.1 FlexML Architecture Overview -- 6.4.2 Dataflow Reconfiguration -- 6.4.3 Efficient Zero-Skipping for Deconvolution and Blockwise Structured Sparsity -- 6.4.4 Support Vector Machine -- 6.5 Deployment of Neural Networks on TinyVers -- 6.6 Design for Test and Fault-Tolerance -- 6.7 Chip Implementation and Measurement -- 6.7.1 Peak Performance Analysis -- 6.7.2 Workload Benchmarks -- 6.7.3 Power Management -- 6.7.4 Instantaneous Power Trace -- Keyword Spotting Application -- Machine Monitoring Application -- 6.8 Comparison with SotA -- 6.9 Conclusion -- 7 DIANA: DIgital and ANAlog Heterogeneous Multi-core System-on-Chip -- 7.1 Motivation -- 7.2 Design Choices -- 7.2.1 Dataflow Concepts -- 7.2.2 Design Space Exploration -- 7.2.3 A Reconfigurable Heterogeneous Architecture -- 7.2.4 Optimization Strategies for Multi-core -- 7.3 System Architecture -- 7.3.1 The RISC-V CPU and Network Control -- 7.3.2 Memory System -- 7.4 AIMC Computing Core -- 7.4.1 AIMC Core Microarchitecture -- 7.4.2 Memory Control Unit (MCU) -- 7.4.3 AIMC Macro -- 7.4.4 Output Buffer and SIMD Unit -- 7.5 Digital DNN Accelerator -- 7.6 Measurements -- 7.6.1 Efficiency vs. Accuracy Trade-Off in the Analog Macro -- 7.6.2 Peak Performance and Efficiency Characterization -- 7.6.3 Workload Performance Characterization.
327 $a7.6.4 SotA Comparison -- 7.7 Conclusion -- 8 Networks-on-Chip to Enable Large-Scale Multi-core ML Acceleration -- 8.1 Motivation -- 8.2 Background -- 8.2.1 Network-on-Chips -- 8.2.2 AXI Protocol -- Burst -- Multiple Outstanding Transaction -- 8.3 Interconnect Architecture of PATRONoC -- 8.4 Implementation Results -- 8.5 Performance Evaluation -- 8.5.1 Uniform Random Traffic -- 8.5.2 Synthetic Traffic -- 8.5.3 DNN Workload Traffic -- 8.6 Related Work -- 8.7 Conclusion -- 9 Conclusion -- 9.1 Overview and Contributions -- 9.2 Suggestions for Future Work -- 9.2.1 The Low Hanging Fruits -- 9.2.2 Medium Term -- 9.2.3 Moonshot -- 9.3 Closing Remarks -- References -- References -- Index.
606 $aEdge computing
606 $aMachine learning
606 $aSystems on a chip$xDesign and construction
615 0$aEdge computing.
615 0$aMachine learning.
615 0$aSystems on a chip$xDesign and construction.
676 $a005.758
700 $aJain$b Vikram$01437891
702 $aVerhelst$b Marian
801 0$bMiAaPQ
801 1$bMiAaPQ
801 2$bMiAaPQ
906 $aBOOK
912 $a9910760277003321
996 $aTowards Heterogeneous Multi-Core Systems-on-Chip for Edge Machine Learning$93598717
997 $aUNINA