| Title: | Languages and compilers for parallel computing : 34th international workshop, LCPC 2021, Newark, DE, USA, October 13-14, 2021 : revised selected papers / Xiaoming Li and Sunita Chandrasekaran (editors) |
| Publication: | Cham, Switzerland : Springer, [2022] |
| | ©2022 |
| Physical description: | 1 online resource (159 pages) |
| Discipline: | 004.35 |
| Topical subject: | Parallel processing (Electronic computers) |
| | Parallel programming (Computer science) |
| | Compilers (Computer programs) |
| Secondary responsibility: | Li, Xiaoming <1977-> |
| | Chandrasekaran, Sunita |
| General notes: | Includes index. |
| Contents note: | Intro -- Preface -- Organization -- Contents -- Compiler -- Locality-Based Optimizations in the Chapel Compiler -- 1 Introduction -- 2 Chapel Background -- 2.1 Distributed Arrays -- 2.2 Forall Loops -- 3 Compiler Analysis and Optimizations -- 3.1 Automatic Local Access -- 3.2 Automatic Aggregation -- 4 Results -- 5 Future Work -- 6 Related Work -- 7 Conclusion -- References -- iCetus: A Semi-automatic Parallel Programming Assistant -- 1 Introduction -- 2 Rationale for the iCetus Interactive Parallelizer and Tool Features -- 2.1 Automatic Parallelization in Cetus -- 2.2 The Opportunity of Interactive Parallelization -- 2.3 iCetus Features -- 2.4 Limitations of the Current Version of iCetus -- 3 iCetus System Overview -- 4 Evaluation -- 4.1 Importance and Usefulness of Existing iCetus Features -- 4.2 Importance and Usefulness of our Proposed iCetus Features -- 4.3 Requested Features for iCetus -- 5 Related Work -- 6 Conclusion -- References -- Hybrid Register Allocation with Spill Cost and Pattern Guided Optimization -- 1 Introduction -- 2 Background and Challenges -- 3 Preliminary Analysis -- 4 Design and Implementation -- 4.1 Code Pattern Recognizer -- 4.2 Spill Cost Tracking Mechanism -- 4.3 Putting it all Together: Cost-Guided Allocation Optimizer -- 5 Methodology -- 6 Evaluation Result -- 6.1 Benchmark Performance -- 6.2 Sensitivity Study -- 6.3 Compilation Overhead -- 7 Related Work -- 8 Conclusion and Future Work -- References -- Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores -- 1 Introduction -- 2 The OSCAR Automatic Parallelizing Compiler -- 3 Investigated Multicore Architectures -- 4 Benchmark Programs -- 5 Compile Flow -- 6 Performance of OSCAR Compiler-Parallelized Programs -- 6.1 OSCAR Compiled Benchmark Performance on Intel x86. |
| | 6.2 OSCAR Compiled Benchmark Performance on AMD X86 -- 6.3 OSCAR Compiled Benchmark Performance on Arm -- 6.4 OSCAR Compiled Benchmark Performance on RISC-V -- 7 Conclusion -- References -- Accelerators -- LC-MEMENTO: A Memory Model for Accelerated Architectures -- 1 Introduction -- 2 Background -- 2.1 Memory Consistency Models -- 2.2 The Abstract Runtime System: ARTS -- 2.3 NVIDIA CUDA Programming and Execution Environment -- 3 LC-MEMENTO Design and Implementation -- 3.1 Asynchronous Runtime Scheduler for Accelerators -- 3.2 Memory Models for Accelerators -- 4 Evaluation -- 4.1 STREAM Benchmark -- 4.2 Random Access Benchmark -- 4.3 Breadth-First Search -- 5 Related Work -- 6 Conclusions and Future Work -- References -- The ORKA-HPC Compiler-Practical OpenMP for FPGAs -- 1 Motivation -- 2 Related Work -- 3 The ORKA-HPC OpenMP-to-FPGA Compiler -- 3.1 OpenMP Lowering -- 3.2 FPGA Path -- 3.3 ORKA-HPC LLP-Backend -- 3.4 Host Path -- 4 Deployment -- 5 Evaluation -- 6 Contributions and Future Work -- References -- Graphs and Kernels -- Optimizing Sparse Matrix Multiplications for Graph Neural Networks -- 1 Introduction -- 2 Background -- 2.1 Graph Neural Networks -- 2.2 Sparse Matrix Storage Formats -- 3 Motivation -- 3.1 Setup -- 3.2 Results -- 4 Our Approach -- 4.1 Predictive Modeling -- 4.2 Problem Modeling -- 4.3 Training Data Generation -- 4.4 Feature Engineering -- 4.5 Training the Model -- 4.6 Using the Model -- 5 Experimental Setup -- 5.1 Software and Hardware -- 5.2 Evaluation Methodology -- 6 Experimental Results -- 6.1 Overall Results -- 6.2 Compare to Prior Methods -- 6.3 Compare to Oracle Performance -- 6.4 Model Analysis -- 6.5 Discussion -- 7 Related Work -- 8 Conclusions -- References -- A Hybrid Synchronization Mechanism for Parallel Sparse Triangular Solve -- 1 Introduction -- 2 Motivation and Related Work -- 3 Preliminaries. |
| | 3.1 Sparse Matrix and Serial SpTS -- 3.2 Parallel SpTS -- 4 Our Approach -- 4.1 Overview -- 4.2 no-busy-wait -- 4.3 busy-wait -- 5 Evaluation -- 5.1 Experimental Setup -- 5.2 SpTS Performance Comparison -- 6 Conclusion and Future Work -- References -- Techniques for Managing Polyhedral Dataflow Graphs -- 1 Introduction -- 2 Background -- 2.1 GeoAc -- 2.2 SPF and the Computation API -- 2.3 Polyhedral Dataflow Graphs -- 3 Case Study: Expressing GeoAc and Examining Polyhedral Dataflow Graphs -- 3.1 Approximate Static Single Assignment -- 3.2 Producer Consumer Reductions -- 3.3 Graph Components -- 3.4 Data Dependent Control Flow -- 3.5 Dead Code Elimination -- 3.6 Subgraphs -- 3.7 Constant Size Arrays -- 3.8 Debugging Information -- 4 Related Work -- 5 Conclusion -- References -- Author Index. |
| Authorized title: | Languages and Compilers for Parallel Computing |
| ISBN: | 3-030-99372-8 |
| Format: | Printed material |
| Bibliographic level: | Monograph |
| Language of publication: | English |
| Record no.: | 996464547403316 |
| Find it here: | Univ. di Salerno |
| OPAC: | Check availability here |