Accelerator programming using directives : 7th international workshop, WACCPD 2020, virtual event, November 20, 2020, proceedings / / edited by Sridutt Bhalachandra, Christopher Daley, and Verónica Melesse Vergara |
Publication/distribution/printing | Cham, Switzerland : , : Springer, , [2022] |
Physical description | 1 online resource (157 pages) |
Discipline | 005.13 |
Series | Lecture Notes in Computer Science |
Topical subject |
High performance computing
Microprogramming
Computer programming
Càlcul intensiu (Informàtica)
Programació (Ordinadors) |
Genre/form subject |
Congressos
Llibres electrònics |
ISBN | 3-030-97759-5 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Intro -- Preface -- Organization -- Contents -- Directive Alternatives -- Can Fortran's 'do concurrent' Replace Directives for Accelerated Computing? -- 1 Introduction -- 2 Code and Test Description -- 2.1 Code Description -- 2.2 Test Description -- 2.3 Computational Environment -- 2.4 Baseline Performance Results -- 3 Implementation -- 3.1 The Fortran do concurrent construct -- 3.2 Code Versions -- 3.3 Compiler Flag Options -- 4 Results -- 4.1 Results Using nvfortran -- 4.2 Results Using gfortran -- 4.3 Results Using ifort -- 4.4 Experimental Results -- 5 Discussion -- 6 Artifact Availability Statement -- References -- Achieving Near-Native Runtime Performance and Cross-Platform Performance Portability for Random Number Generation Through SYCL Interoperability -- 1 Introduction -- 1.1 Contribution -- 2 Related Work -- 2.1 Parallel Programming Frameworks -- 2.2 Linear Algebra Libraries -- 2.3 The Proposed Approach -- 3 SYCL Overview -- 4 SYCL-Based RNG Implementations of NVIDIA and AMD GPUs in oneMKL -- 4.1 Technical Aspects -- 4.2 Native cuRAND and hipRAND flow -- 4.3 Implementation of cuRAND and hipRAND in oneMKL -- 5 Benchmark Applications -- 5.1 Random Number Generation Burner -- 5.2 FastCaloSim -- 6 Performance Evaluation -- 6.1 Performance Portability Metrics -- 6.2 Hardware Specifications -- 6.3 Software Specifications -- 7 Results -- 8 Conclusions and Future Work -- References -- Directive Extensions -- Extending OpenMP for Machine Learning-Driven Adaptation -- 1 Introduction -- 2 A Motivating Example -- 3 A Vision -- 4 The declare adaptation Directive -- 4.1 Syntax and Semantics of declare adaptation -- 4.2 Examples Using metadirective -- 5 Implementation -- 5.1 Compiler Support -- 5.2 Runtime Support -- 6 Evaluation -- 6.1 Software and Hardware Configurations -- 6.2 Performance Results -- 6.3 Accuracy of Prediction Models.
6.4 Overhead Analysis -- 7 Related Work -- 8 Conclusion -- References -- Directive Case Studies -- GPU Porting of Scalable Implicit Solver with Green's Function-Based Neural Networks by OpenACC -- 1 Introduction -- 2 Solver with Green's Function-Based NN Preconditioner -- 2.1 Target Problem -- 2.2 GF-Based NN Predictor -- 2.3 Scalable Solver Algorithm Using GF-Based NN Predictor -- 3 GPU Porting of Solver with Green's Function-Based NN Preconditioner Using OpenACC -- 4 Performance Measurement -- 4.1 Problem Used for Measurement -- 4.2 Performance Measurement Environment -- 4.3 Solver Performance on GPU-Based System -- 4.4 Weak Scaling on GPU-Based System -- 5 Closing Remarks -- References -- Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-Based Offloading -- 1 Introduction -- 2 Related Work -- 3 Methods and APIs -- 3.1 Alpaka and PIConGPU -- 3.2 Review of OpenACC and OpenMP Target -- 3.3 Experimental Setup -- 4 Porting Alpaka -- 4.1 Final Touches: PIConGPU -- 5 Major Hurdles and Discussion -- 5.1 Standards Issues -- 5.2 Compiler and Runtime Issues -- 5.3 Preliminary Results -- 6 Conclusions and Outlook -- References -- Accelerating Quantum Many-Body Configuration Interaction with Directives -- 1 Introduction -- 2 Computational Motifs in Configuration Interaction Code MFDn -- 2.1 Matrix Sparsity Determination -- 2.2 Parallel Prefix Sum -- 2.3 Filling Shared Arrays -- 2.4 Array Reductions -- 3 Conclusion and Outlook -- References -- GPU Offloading of a Large-Scale Gyrokinetic Particle-in-Cell Fortran Code on Summit: From OpenACC to OpenMP -- 1 Introduction -- 2 Software and Experimental Setup -- 2.1 Experimental Setup -- 2.2 OpenMP GPU Offloading -- 3 The Structure of GEM -- 4 Results and Analysis -- 4.1 Speedup Performance and Roofline Analysis for Single Node -- 4.2 Scalability Analysis.
4.3 Investigation of Hardware Threads -- 5 Discussion -- 6 Summary -- References -- Author Index. |
Record no. | UNISA-996475771803316 |
Cham, Switzerland : , : Springer, , [2022] | ||
Printed material | ||
Available at: Univ. di Salerno | ||
|
Accelerator programming using directives : 7th international workshop, WACCPD 2020, virtual event, November 20, 2020, proceedings / / edited by Sridutt Bhalachandra, Christopher Daley, and Verónica Melesse Vergara |
Publication/distribution/printing | Cham, Switzerland : , : Springer, , [2022] |
Physical description | 1 online resource (157 pages) |
Discipline | 005.13 |
Series | Lecture Notes in Computer Science |
Topical subject |
High performance computing
Microprogramming
Computer programming
Càlcul intensiu (Informàtica)
Programació (Ordinadors) |
Genre/form subject |
Congressos
Llibres electrònics |
ISBN | 3-030-97759-5 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Intro -- Preface -- Organization -- Contents -- Directive Alternatives -- Can Fortran's 'do concurrent' Replace Directives for Accelerated Computing? -- 1 Introduction -- 2 Code and Test Description -- 2.1 Code Description -- 2.2 Test Description -- 2.3 Computational Environment -- 2.4 Baseline Performance Results -- 3 Implementation -- 3.1 The Fortran do concurrent construct -- 3.2 Code Versions -- 3.3 Compiler Flag Options -- 4 Results -- 4.1 Results Using nvfortran -- 4.2 Results Using gfortran -- 4.3 Results Using ifort -- 4.4 Experimental Results -- 5 Discussion -- 6 Artifact Availability Statement -- References -- Achieving Near-Native Runtime Performance and Cross-Platform Performance Portability for Random Number Generation Through SYCL Interoperability -- 1 Introduction -- 1.1 Contribution -- 2 Related Work -- 2.1 Parallel Programming Frameworks -- 2.2 Linear Algebra Libraries -- 2.3 The Proposed Approach -- 3 SYCL Overview -- 4 SYCL-Based RNG Implementations of NVIDIA and AMD GPUs in oneMKL -- 4.1 Technical Aspects -- 4.2 Native cuRAND and hipRAND flow -- 4.3 Implementation of cuRAND and hipRAND in oneMKL -- 5 Benchmark Applications -- 5.1 Random Number Generation Burner -- 5.2 FastCaloSim -- 6 Performance Evaluation -- 6.1 Performance Portability Metrics -- 6.2 Hardware Specifications -- 6.3 Software Specifications -- 7 Results -- 8 Conclusions and Future Work -- References -- Directive Extensions -- Extending OpenMP for Machine Learning-Driven Adaptation -- 1 Introduction -- 2 A Motivating Example -- 3 A Vision -- 4 The declare adaptation Directive -- 4.1 Syntax and Semantics of declare adaptation -- 4.2 Examples Using metadirective -- 5 Implementation -- 5.1 Compiler Support -- 5.2 Runtime Support -- 6 Evaluation -- 6.1 Software and Hardware Configurations -- 6.2 Performance Results -- 6.3 Accuracy of Prediction Models.
6.4 Overhead Analysis -- 7 Related Work -- 8 Conclusion -- References -- Directive Case Studies -- GPU Porting of Scalable Implicit Solver with Green's Function-Based Neural Networks by OpenACC -- 1 Introduction -- 2 Solver with Green's Function-Based NN Preconditioner -- 2.1 Target Problem -- 2.2 GF-Based NN Predictor -- 2.3 Scalable Solver Algorithm Using GF-Based NN Predictor -- 3 GPU Porting of Solver with Green's Function-Based NN Preconditioner Using OpenACC -- 4 Performance Measurement -- 4.1 Problem Used for Measurement -- 4.2 Performance Measurement Environment -- 4.3 Solver Performance on GPU-Based System -- 4.4 Weak Scaling on GPU-Based System -- 5 Closing Remarks -- References -- Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-Based Offloading -- 1 Introduction -- 2 Related Work -- 3 Methods and APIs -- 3.1 Alpaka and PIConGPU -- 3.2 Review of OpenACC and OpenMP Target -- 3.3 Experimental Setup -- 4 Porting Alpaka -- 4.1 Final Touches: PIConGPU -- 5 Major Hurdles and Discussion -- 5.1 Standards Issues -- 5.2 Compiler and Runtime Issues -- 5.3 Preliminary Results -- 6 Conclusions and Outlook -- References -- Accelerating Quantum Many-Body Configuration Interaction with Directives -- 1 Introduction -- 2 Computational Motifs in Configuration Interaction Code MFDn -- 2.1 Matrix Sparsity Determination -- 2.2 Parallel Prefix Sum -- 2.3 Filling Shared Arrays -- 2.4 Array Reductions -- 3 Conclusion and Outlook -- References -- GPU Offloading of a Large-Scale Gyrokinetic Particle-in-Cell Fortran Code on Summit: From OpenACC to OpenMP -- 1 Introduction -- 2 Software and Experimental Setup -- 2.1 Experimental Setup -- 2.2 OpenMP GPU Offloading -- 3 The Structure of GEM -- 4 Results and Analysis -- 4.1 Speedup Performance and Roofline Analysis for Single Node -- 4.2 Scalability Analysis.
4.3 Investigation of Hardware Threads -- 5 Discussion -- 6 Summary -- References -- Author Index. |
Record no. | UNINA-9910568267303321 |
Cham, Switzerland : , : Springer, , [2022] | ||
Printed material | ||
Available at: Univ. Federico II | ||
|
Accelerator Programming Using Directives [[electronic resource] ] : 5th International Workshop, WACCPD 2018, Dallas, TX, USA, November 11-17, 2018, Proceedings / / edited by Sunita Chandrasekaran, Guido Juckeland, Sandra Wienke |
Edition | [1st ed. 2019.] |
Publication/distribution/printing | Cham : , : Springer International Publishing : , : Imprint : Springer, , 2019 |
Physical description | 1 online resource (IX, 137 p. 61 illus., 43 illus. in color.) |
Discipline | 001.642 |
Series | Programming and Software Engineering |
Topical subject |
Programming languages (Electronic computers)
Logic design
Input-output equipment (Computers)
Microprogramming
Computer organization
Programming Languages, Compilers, Interpreters
Logic Design
Input/Output and Data Communications
Control Structures and Microprogramming
Computer Systems Organization and Communication Networks |
ISBN | 3-030-12274-3 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Record no. | UNINA-9910337577103321 |
Cham : , : Springer International Publishing : , : Imprint : Springer, , 2019 | ||
Printed material | ||
Available at: Univ. Federico II | ||
|
Accelerator Programming Using Directives [[electronic resource] ] : 5th International Workshop, WACCPD 2018, Dallas, TX, USA, November 11-17, 2018, Proceedings / / edited by Sunita Chandrasekaran, Guido Juckeland, Sandra Wienke |
Edition | [1st ed. 2019.] |
Publication/distribution/printing | Cham : , : Springer International Publishing : , : Imprint : Springer, , 2019 |
Physical description | 1 online resource (IX, 137 p. 61 illus., 43 illus. in color.) |
Discipline | 001.642 |
Series | Programming and Software Engineering |
Topical subject |
Programming languages (Electronic computers)
Logic design
Input-output equipment (Computers)
Microprogramming
Computer organization
Programming Languages, Compilers, Interpreters
Logic Design
Input/Output and Data Communications
Control Structures and Microprogramming
Computer Systems Organization and Communication Networks |
ISBN | 3-030-12274-3 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Record no. | UNISA-996466464703316 |
Cham : , : Springer International Publishing : , : Imprint : Springer, , 2019 | ||
Printed material | ||
Available at: Univ. di Salerno | ||
|
Accelerator Programming Using Directives [[electronic resource] ] : 4th International Workshop, WACCPD 2017, Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 13, 2017, Proceedings / / edited by Sunita Chandrasekaran, Guido Juckeland |
Edition | [1st ed. 2018.] |
Publication/distribution/printing | Cham : , : Springer International Publishing : , : Imprint : Springer, , 2018 |
Physical description | 1 online resource (IX, 183 p. 59 illus.) |
Discipline | 004.3 |
Series | Programming and Software Engineering |
Topical subject |
Programming languages (Electronic computers)
Logic design
Operating systems (Computers)
Computer programming
Computer organization
Computers
Programming Languages, Compilers, Interpreters
Logic Design
Operating Systems
Programming Techniques
Computer Systems Organization and Communication Networks
Models and Principles |
ISBN | 3-319-74896-3 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Intro -- Preface -- Organization -- Contents -- Applications -- An Example of Porting PETSc Applications to Heterogeneous Platforms with OpenACC -- Abstract -- 1 Introduction -- 2 Workflow and System Description -- 2.1 Workflow -- 2.2 System -- 3 Results and Discussion -- 3.1 Profiling with Score-P -- 3.2 The Most Expensive Kernel: MatMult_SeqAIJ -- 3.3 Four Steps Toward the Final Version of OpenACC Kernel -- 4 Speedups and Strong Scaling -- 5 Conclusion -- Acknowledgement -- References -- Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model -- 1 Introduction -- 1.1 ASUCA on GPU -- 1.2 Parallelization Granularity -- 1.3 Memory Layout -- 1.4 Related Work -- 1.5 Problem Summary -- 2 Hybrid Fortran Language Extension and Code Transformation -- 2.1 Parallel Loop Abstraction -- 2.2 Compile-Time Defined Memory Layout and Device Data Region -- 2.3 Transformed Code -- 3 Code Transformation Method -- 4 Productivity- and Performance Results -- 5 Conclusion and Future Work -- References -- Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation Using OpenACC -- 1 Introduction -- 2 Finite-Element Earthquake Simulation Designed for the K Computer -- 3 Proposed Solver for GPUs Using OpenACC -- 3.1 Modification of Algorithm for GPUs -- 3.2 Introduction of OpenACC -- 4 Performance Measurements -- 5 Application Example -- 6 Concluding Remarks -- References -- Runtime Environments -- The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer -- 1 Introduction -- 2 RAJA -- 2.1 Basic Execution Policies -- 2.2 RAJA::NestedPolicy and Loop Transformations -- 3 Embedding Directives in the C++ Type System -- 3.1 Defining Policy Tags for a Backend -- 3.2 Constructing Explicit Execution Policy Types.
3.3 Implement forall Specializations -- 4 Case Study: OpenMP 4.5 -- 5 Case Study: OpenACC -- 6 Evaluation -- 6.1 Test Set -- 6.2 Goals and Non-Goals -- 6.3 Compilation Overhead -- 6.4 Runtime Overhead -- 7 Future Work and Conclusion -- References -- Enabling GPU Support for the COMPSs-Mobile Framework -- 1 Introduction -- 2 Related Work -- 3 Programming Model -- 3.1 Extension for GPU Support -- 4 Runtime Support Implementation -- 4.1 COMPSs-Mobile Runtime Architecture -- 4.2 OpenCL Platform -- 5 Performance Evaluation -- 5.1 OpenCL Platform Performance -- 5.2 Load Balancing Policies -- 6 Conclusions and Future Work -- References -- Concurrent Parallel Processing on Graphics and Multicore Processors with OpenACC and OpenMP -- Abstract -- 1 Introduction -- 2 MBFLO3 Application -- 2.1 Mathematical Formulation -- 2.2 Numerical Method -- 3 Heterogeneous Multiblock Computing Strategy -- 3.1 Multicore Host Parallelism -- 3.2 Manycore Accelerator Parallelism -- 3.3 Heterogeneous Host-Device Parallelism -- 4 Performance Results and Analysis -- 5 Conclusions -- Acknowledgements -- References -- Program Evaluation -- Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs -- 1 Introduction -- 2 Motivation -- 3 Compiling Java to GPUs -- 3.1 Java Parallel Stream API -- 3.2 JIT Compilation for GPUs -- 4 Exploring Supervised Machine Learning Algorithms -- 4.1 Supervised Machine Learning -- 4.2 Generating Subsets of Features -- 4.3 Constructing Prediction Models -- 4.4 Integrating Prediction Models -- 5 Experimental Results -- 5.1 Experimental Protocol -- 5.2 Overall Summary -- 5.3 Accuracies on the Full Set of Features -- 5.4 Exploring ML Algorithms by Feature Subsetting -- 5.5 Lessons Learned -- 6 Related Work -- 6.1 GPU Code Generation from High-Level Languages -- 6.2 Offline Model Construction. 
7 Conclusions -- A Appendix -- References -- Automatic Testing of OpenACC Applications -- 1 Introduction -- 2 Testing a GPU Port of a Numerical Application -- 3 Autocompare with OpenACC -- 4 Autocompare Implementation -- 5 Experiments -- 6 Related Work -- 7 Future Work -- 8 Conclusion -- References -- Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices -- 1 Introduction -- 2 Related Work -- 3 Accelerator Programming Models -- 3.1 CUDA -- 3.2 OpenCL -- 3.3 OpenACC -- 3.4 OpenMP -- 4 Implementing the Conjugate Gradient Method -- 5 Performance Results on NVIDIA GPUs -- 5.1 Data Transfers with the Host -- 5.2 Single Device -- 5.3 Two Devices -- 6 Performance Results on Intel Xeon Phi Coprocessors -- 6.1 Single Device -- 6.2 Two Devices -- 7 Summary -- References -- Author Index. |
Record no. | UNISA-996465474703316 |
Cham : , : Springer International Publishing : , : Imprint : Springer, , 2018 | ||
Printed material | ||
Available at: Univ. di Salerno | ||
|
Accelerator Programming Using Directives [[electronic resource] ] : 4th International Workshop, WACCPD 2017, Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 13, 2017, Proceedings / / edited by Sunita Chandrasekaran, Guido Juckeland |
Edition | [1st ed. 2018.] |
Publication/distribution/printing | Cham : , : Springer International Publishing : , : Imprint : Springer, , 2018 |
Physical description | 1 online resource (IX, 183 p. 59 illus.) |
Discipline | 004.3 |
Series | Programming and Software Engineering |
Topical subject |
Programming languages (Electronic computers)
Logic design
Operating systems (Computers)
Computer programming
Computer organization
Computers
Programming Languages, Compilers, Interpreters
Logic Design
Operating Systems
Programming Techniques
Computer Systems Organization and Communication Networks
Models and Principles |
ISBN | 3-319-74896-3 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Intro -- Preface -- Organization -- Contents -- Applications -- An Example of Porting PETSc Applications to Heterogeneous Platforms with OpenACC -- Abstract -- 1 Introduction -- 2 Workflow and System Description -- 2.1 Workflow -- 2.2 System -- 3 Results and Discussion -- 3.1 Profiling with Score-P -- 3.2 The Most Expensive Kernel: MatMult_SeqAIJ -- 3.3 Four Steps Toward the Final Version of OpenACC Kernel -- 4 Speedups and Strong Scaling -- 5 Conclusion -- Acknowledgement -- References -- Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model -- 1 Introduction -- 1.1 ASUCA on GPU -- 1.2 Parallelization Granularity -- 1.3 Memory Layout -- 1.4 Related Work -- 1.5 Problem Summary -- 2 Hybrid Fortran Language Extension and Code Transformation -- 2.1 Parallel Loop Abstraction -- 2.2 Compile-Time Defined Memory Layout and Device Data Region -- 2.3 Transformed Code -- 3 Code Transformation Method -- 4 Productivity- and Performance Results -- 5 Conclusion and Future Work -- References -- Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation Using OpenACC -- 1 Introduction -- 2 Finite-Element Earthquake Simulation Designed for the K Computer -- 3 Proposed Solver for GPUs Using OpenACC -- 3.1 Modification of Algorithm for GPUs -- 3.2 Introduction of OpenACC -- 4 Performance Measurements -- 5 Application Example -- 6 Concluding Remarks -- References -- Runtime Environments -- The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer -- 1 Introduction -- 2 RAJA -- 2.1 Basic Execution Policies -- 2.2 RAJA::NestedPolicy and Loop Transformations -- 3 Embedding Directives in the C++ Type System -- 3.1 Defining Policy Tags for a Backend -- 3.2 Constructing Explicit Execution Policy Types.
3.3 Implement forall Specializations -- 4 Case Study: OpenMP 4.5 -- 5 Case Study: OpenACC -- 6 Evaluation -- 6.1 Test Set -- 6.2 Goals and Non-Goals -- 6.3 Compilation Overhead -- 6.4 Runtime Overhead -- 7 Future Work and Conclusion -- References -- Enabling GPU Support for the COMPSs-Mobile Framework -- 1 Introduction -- 2 Related Work -- 3 Programming Model -- 3.1 Extension for GPU Support -- 4 Runtime Support Implementation -- 4.1 COMPSs-Mobile Runtime Architecture -- 4.2 OpenCL Platform -- 5 Performance Evaluation -- 5.1 OpenCL Platform Performance -- 5.2 Load Balancing Policies -- 6 Conclusions and Future Work -- References -- Concurrent Parallel Processing on Graphics and Multicore Processors with OpenACC and OpenMP -- Abstract -- 1 Introduction -- 2 MBFLO3 Application -- 2.1 Mathematical Formulation -- 2.2 Numerical Method -- 3 Heterogeneous Multiblock Computing Strategy -- 3.1 Multicore Host Parallelism -- 3.2 Manycore Accelerator Parallelism -- 3.3 Heterogeneous Host-Device Parallelism -- 4 Performance Results and Analysis -- 5 Conclusions -- Acknowledgements -- References -- Program Evaluation -- Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs -- 1 Introduction -- 2 Motivation -- 3 Compiling Java to GPUs -- 3.1 Java Parallel Stream API -- 3.2 JIT Compilation for GPUs -- 4 Exploring Supervised Machine Learning Algorithms -- 4.1 Supervised Machine Learning -- 4.2 Generating Subsets of Features -- 4.3 Constructing Prediction Models -- 4.4 Integrating Prediction Models -- 5 Experimental Results -- 5.1 Experimental Protocol -- 5.2 Overall Summary -- 5.3 Accuracies on the Full Set of Features -- 5.4 Exploring ML Algorithms by Feature Subsetting -- 5.5 Lessons Learned -- 6 Related Work -- 6.1 GPU Code Generation from High-Level Languages -- 6.2 Offline Model Construction. 
7 Conclusions -- A Appendix -- References -- Automatic Testing of OpenACC Applications -- 1 Introduction -- 2 Testing a GPU Port of a Numerical Application -- 3 Autocompare with OpenACC -- 4 Autocompare Implementation -- 5 Experiments -- 6 Related Work -- 7 Future Work -- 8 Conclusion -- References -- Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices -- 1 Introduction -- 2 Related Work -- 3 Accelerator Programming Models -- 3.1 CUDA -- 3.2 OpenCL -- 3.3 OpenACC -- 3.4 OpenMP -- 4 Implementing the Conjugate Gradient Method -- 5 Performance Results on NVIDIA GPUs -- 5.1 Data Transfers with the Host -- 5.2 Single Device -- 5.3 Two Devices -- 6 Performance Results on Intel Xeon Phi Coprocessors -- 6.1 Single Device -- 6.2 Two Devices -- 7 Summary -- References -- Author Index. |
Record no. | UNINA-9910349261003321 |
Cham : , : Springer International Publishing : , : Imprint : Springer, , 2018 | ||
Printed material | ||
Available at: Univ. Federico II | ||
|
Languages and compilers for parallel computing : 34th international workshop, LCPC 2021, Newark, DE, USA, October 13-14, 2021 : revised selected papers / / Xiaoming Li and Sunita Chandrasekaran (editors) |
Publication/distribution/printing | Cham, Switzerland : , : Springer, , [2022] |
Physical description | 1 online resource (159 pages) |
Discipline | 004.35 |
Series | Lecture notes in computer science |
Topical subject |
Parallel processing (Electronic computers)
Parallel programming (Computer science)
Compilers (Computer programs) |
ISBN | 3-030-99372-8 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Intro -- Preface -- Organization -- Contents -- Compiler -- Locality-Based Optimizations in the Chapel Compiler -- 1 Introduction -- 2 Chapel Background -- 2.1 Distributed Arrays -- 2.2 Forall Loops -- 3 Compiler Analysis and Optimizations -- 3.1 Automatic Local Access -- 3.2 Automatic Aggregation -- 4 Results -- 5 Future Work -- 6 Related Work -- 7 Conclusion -- References -- iCetus: A Semi-automatic Parallel Programming Assistant -- 1 Introduction -- 2 Rationale for the iCetus Interactive Parallelizer and Tool Features -- 2.1 Automatic Parallelization in Cetus -- 2.2 The Opportunity of Interactive Parallelization -- 2.3 iCetus Features -- 2.4 Limitations of the Current Version of iCetus -- 3 iCetus System Overview -- 4 Evaluation -- 4.1 Importance and Usefulness of Existing iCetus Features -- 4.2 Importance and Usefulness of our Proposed iCetus Features -- 4.3 Requested Features for iCetus -- 5 Related Work -- 6 Conclusion -- References -- Hybrid Register Allocation with Spill Cost and Pattern Guided Optimization -- 1 Introduction -- 2 Background and Challenges -- 3 Preliminary Analysis -- 4 Design and Implementation -- 4.1 Code Pattern Recognizer -- 4.2 Spill Cost Tracking Mechanism -- 4.3 Putting it all Together: Cost-Guided Allocation Optimizer -- 5 Methodology -- 6 Evaluation Result -- 6.1 Benchmark Performance -- 6.2 Sensitivity Study -- 6.3 Compilation Overhead -- 7 Related Work -- 8 Conclusion and Future Work -- References -- Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores -- 1 Introduction -- 2 The OSCAR Automatic Parallelizing Compiler -- 3 Investigated Multicore Architectures -- 4 Benchmark Programs -- 5 Compile Flow -- 6 Performance of OSCAR Compiler-Parallelized Programs -- 6.1 OSCAR Compiled Benchmark Performance on Intel x86.
6.2 OSCAR Compiled Benchmark Performance on AMD X86 -- 6.3 OSCAR Compiled Benchmark Performance on Arm -- 6.4 OSCAR Compiled Benchmark Performance on RISC-V -- 7 Conclusion -- References -- Accelerators -- LC-MEMENTO: A Memory Model for Accelerated Architectures -- 1 Introduction -- 2 Background -- 2.1 Memory Consistency Models -- 2.2 The Abstract Runtime System: ARTS -- 2.3 NVIDIA CUDA Programming and Execution Environment -- 3 LC-MEMENTO Design and Implementation -- 3.1 Asynchronous Runtime Scheduler for Accelerators -- 3.2 Memory Models for Accelerators -- 4 Evaluation -- 4.1 STREAM Benchmark -- 4.2 Random Access Benchmark -- 4.3 Breadth-First Search -- 5 Related Work -- 6 Conclusions and Future Work -- References -- The ORKA-HPC Compiler-Practical OpenMP for FPGAs -- 1 Motivation -- 2 Related Work -- 3 The ORKA-HPC OpenMP-to-FPGA Compiler -- 3.1 OpenMP Lowering -- 3.2 FPGA Path -- 3.3 ORKA-HPC LLP-Backend -- 3.4 Host Path -- 4 Deployment -- 5 Evaluation -- 6 Contributions and Future Work -- References -- Graphs and Kernels -- Optimizing Sparse Matrix Multiplications for Graph Neural Networks -- 1 Introduction -- 2 Background -- 2.1 Graph Neural Networks -- 2.2 Sparse Matrix Storage Formats -- 3 Motivation -- 3.1 Setup -- 3.2 Results -- 4 Our Approach -- 4.1 Predictive Modeling -- 4.2 Problem Modeling -- 4.3 Training Data Generation -- 4.4 Feature Engineering -- 4.5 Training the Model -- 4.6 Using the Model -- 5 Experimental Setup -- 5.1 Software and Hardware -- 5.2 Evaluation Methodology -- 6 Experimental Results -- 6.1 Overall Results -- 6.2 Compare to Prior Methods -- 6.3 Compare to Oracle Performance -- 6.4 Model Analysis -- 6.5 Discussion -- 7 Related Work -- 8 Conclusions -- References -- A Hybrid Synchronization Mechanism for Parallel Sparse Triangular Solve -- 1 Introduction -- 2 Motivation and Related Work -- 3 Preliminaries. 
3.1 Sparse Matrix and Serial SpTS -- 3.2 Parallel SpTS -- 4 Our Approach -- 4.1 Overview -- 4.2 no-busy-wait -- 4.3 busy-wait -- 5 Evaluation -- 5.1 Experimental Setup -- 5.2 SpTS Performance Comparison -- 6 Conclusion and Future Work -- References -- Techniques for Managing Polyhedral Dataflow Graphs -- 1 Introduction -- 2 Background -- 2.1 GeoAc -- 2.2 SPF and the Computation API -- 2.3 Polyhedral Dataflow Graphs -- 3 Case Study: Expressing GeoAc and Examining Polyhedral Dataflow Graphs -- 3.1 Approximate Static Single Assignment -- 3.2 Producer Consumer Reductions -- 3.3 Graph Components -- 3.4 Data Dependent Control Flow -- 3.5 Dead Code Elimination -- 3.6 Subgraphs -- 3.7 Constant Size Arrays -- 3.8 Debugging Information -- 4 Related Work -- 5 Conclusion -- References -- Author Index. |
Record no. | UNISA-996464547403316 |
Cham, Switzerland : , : Springer, , [2022] | ||
Printed material | ||
Available at: Univ. di Salerno | ||
|
Languages and compilers for parallel computing : 34th international workshop, LCPC 2021, Newark, DE, USA, October 13-14, 2021 : revised selected papers / / Xiaoming Li and Sunita Chandrasekaran (editors) |
Publication/distribution/printing | Cham, Switzerland : , : Springer, , [2022] |
Physical description | 1 online resource (159 pages) |
Discipline | 004.35 |
Series | Lecture notes in computer science |
Topical subject |
Parallel processing (Electronic computers)
Parallel programming (Computer science)
Compilers (Computer programs) |
ISBN | 3-030-99372-8 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Intro -- Preface -- Organization -- Contents -- Compiler -- Locality-Based Optimizations in the Chapel Compiler -- 1 Introduction -- 2 Chapel Background -- 2.1 Distributed Arrays -- 2.2 Forall Loops -- 3 Compiler Analysis and Optimizations -- 3.1 Automatic Local Access -- 3.2 Automatic Aggregation -- 4 Results -- 5 Future Work -- 6 Related Work -- 7 Conclusion -- References -- iCetus: A Semi-automatic Parallel Programming Assistant -- 1 Introduction -- 2 Rationale for the iCetus Interactive Parallelizer and Tool Features -- 2.1 Automatic Parallelization in Cetus -- 2.2 The Opportunity of Interactive Parallelization -- 2.3 iCetus Features -- 2.4 Limitations of the Current Version of iCetus -- 3 iCetus System Overview -- 4 Evaluation -- 4.1 Importance and Usefulness of Existing iCetus Features -- 4.2 Importance and Usefulness of our Proposed iCetus Features -- 4.3 Requested Features for iCetus -- 5 Related Work -- 6 Conclusion -- References -- Hybrid Register Allocation with Spill Cost and Pattern Guided Optimization -- 1 Introduction -- 2 Background and Challenges -- 3 Preliminary Analysis -- 4 Design and Implementation -- 4.1 Code Pattern Recognizer -- 4.2 Spill Cost Tracking Mechanism -- 4.3 Putting it all Together: Cost-Guided Allocation Optimizer -- 5 Methodology -- 6 Evaluation Result -- 6.1 Benchmark Performance -- 6.2 Sensitivity Study -- 6.3 Compilation Overhead -- 7 Related Work -- 8 Conclusion and Future Work -- References -- Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores -- 1 Introduction -- 2 The OSCAR Automatic Parallelizing Compiler -- 3 Investigated Multicore Architectures -- 4 Benchmark Programs -- 5 Compile Flow -- 6 Performance of OSCAR Compiler-Parallelized Programs -- 6.1 OSCAR Compiled Benchmark Performance on Intel x86.
6.2 OSCAR Compiled Benchmark Performance on AMD X86 -- 6.3 OSCAR Compiled Benchmark Performance on Arm -- 6.4 OSCAR Compiled Benchmark Performance on RISC-V -- 7 Conclusion -- References -- Accelerators -- LC-MEMENTO: A Memory Model for Accelerated Architectures -- 1 Introduction -- 2 Background -- 2.1 Memory Consistency Models -- 2.2 The Abstract Runtime System: ARTS -- 2.3 NVIDIA CUDA Programming and Execution Environment -- 3 LC-MEMENTO Design and Implementation -- 3.1 Asynchronous Runtime Scheduler for Accelerators -- 3.2 Memory Models for Accelerators -- 4 Evaluation -- 4.1 STREAM Benchmark -- 4.2 Random Access Benchmark -- 4.3 Breadth-First Search -- 5 Related Work -- 6 Conclusions and Future Work -- References -- The ORKA-HPC Compiler-Practical OpenMP for FPGAs -- 1 Motivation -- 2 Related Work -- 3 The ORKA-HPC OpenMP-to-FPGA Compiler -- 3.1 OpenMP Lowering -- 3.2 FPGA Path -- 3.3 ORKA-HPC LLP-Backend -- 3.4 Host Path -- 4 Deployment -- 5 Evaluation -- 6 Contributions and Future Work -- References -- Graphs and Kernels -- Optimizing Sparse Matrix Multiplications for Graph Neural Networks -- 1 Introduction -- 2 Background -- 2.1 Graph Neural Networks -- 2.2 Sparse Matrix Storage Formats -- 3 Motivation -- 3.1 Setup -- 3.2 Results -- 4 Our Approach -- 4.1 Predictive Modeling -- 4.2 Problem Modeling -- 4.3 Training Data Generation -- 4.4 Feature Engineering -- 4.5 Training the Model -- 4.6 Using the Model -- 5 Experimental Setup -- 5.1 Software and Hardware -- 5.2 Evaluation Methodology -- 6 Experimental Results -- 6.1 Overall Results -- 6.2 Compare to Prior Methods -- 6.3 Compare to Oracle Performance -- 6.4 Model Analysis -- 6.5 Discussion -- 7 Related Work -- 8 Conclusions -- References -- A Hybrid Synchronization Mechanism for Parallel Sparse Triangular Solve -- 1 Introduction -- 2 Motivation and Related Work -- 3 Preliminaries. 
3.1 Sparse Matrix and Serial SpTS -- 3.2 Parallel SpTS -- 4 Our Approach -- 4.1 Overview -- 4.2 no-busy-wait -- 4.3 busy-wait -- 5 Evaluation -- 5.1 Experimental Setup -- 5.2 SpTS Performance Comparison -- 6 Conclusion and Future Work -- References -- Techniques for Managing Polyhedral Dataflow Graphs -- 1 Introduction -- 2 Background -- 2.1 GeoAc -- 2.2 SPF and the Computation API -- 2.3 Polyhedral Dataflow Graphs -- 3 Case Study: Expressing GeoAc and Examining Polyhedral Dataflow Graphs -- 3.1 Approximate Static Single Assignment -- 3.2 Producer Consumer Reductions -- 3.3 Graph Components -- 3.4 Data Dependent Control Flow -- 3.5 Dead Code Elimination -- 3.6 Subgraphs -- 3.7 Constant Size Arrays -- 3.8 Debugging Information -- 4 Related Work -- 5 Conclusion -- References -- Author Index. |
Record Nr. | UNINA-9910556898603321 |
Available at: Univ. Federico II | ||
|
Tools and Techniques for High Performance Computing [electronic resource] : Selected Workshops, HUST, SE-HER and WIHPC, Held in Conjunction with SC 2019, Denver, CO, USA, November 17–18, 2019, Revised Selected Papers / edited by Guido Juckeland, Sunita Chandrasekaran |
Edition | [1st ed. 2020.] |
Publication/distribution | Cham : Springer International Publishing : Imprint: Springer, 2020 |
Physical description | 1 online resource (X, 205 p. 289 illus., 70 illus. in color.) |
Classification | 004.11 |
Series | Communications in Computer and Information Science |
Topical subject |
Special purpose computers
Software engineering
Computer input-output equipment
Special Purpose and Application-Based Systems
Software Engineering/Programming and Operating Systems
Computer Hardware |
ISBN | 3-030-44728-6 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note | HUST: Annual Workshop on HPC User Support Tools -- SE-HER: International Workshop on Software Engineering for HPC-Enabled Research -- WIHPC: Workshop on Interactive High-Performance Computing. |
Record Nr. | UNINA-9910410050403321 |
Available at: Univ. Federico II | ||
|
Tools and Techniques for High Performance Computing [electronic resource] : Selected Workshops, HUST, SE-HER and WIHPC, Held in Conjunction with SC 2019, Denver, CO, USA, November 17–18, 2019, Revised Selected Papers / edited by Guido Juckeland, Sunita Chandrasekaran |
Edition | [1st ed. 2020.] |
Publication/distribution | Cham : Springer International Publishing : Imprint: Springer, 2020 |
Physical description | 1 online resource (X, 205 p. 289 illus., 70 illus. in color.) |
Classification | 004.11 |
Series | Communications in Computer and Information Science |
Topical subject |
Special purpose computers
Software engineering
Computer input-output equipment
Special Purpose and Application-Based Systems
Software Engineering/Programming and Operating Systems
Computer Hardware |
ISBN | 3-030-44728-6 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note | HUST: Annual Workshop on HPC User Support Tools -- SE-HER: International Workshop on Software Engineering for HPC-Enabled Research -- WIHPC: Workshop on Interactive High-Performance Computing. |
Record Nr. | UNISA-996465461003316 |
Available at: Univ. di Salerno | ||
|