Accelerator programming using directives : 7th international workshop, WACCPD 2020, virtual event, November 20, 2020, proceedings / edited by Sridutt Bhalachandra, Christopher Daley, and Verónica Melesse Vergara
Publication/distribution/printing Cham, Switzerland : Springer, [2022]
Physical description 1 online resource (157 pages)
Discipline 005.13
Series Lecture Notes in Computer Science
Topical subject High performance computing
Microprogramming
Computer programming
Càlcul intensiu (Informàtica)
Programació (Ordinadors)
Genre/form subject Congressos
Llibres electrònics
ISBN 3-030-97759-5
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Organization -- Contents -- Directive Alternatives -- Can Fortran's 'do concurrent' Replace Directives for Accelerated Computing? -- 1 Introduction -- 2 Code and Test Description -- 2.1 Code Description -- 2.2 Test Description -- 2.3 Computational Environment -- 2.4 Baseline Performance Results -- 3 Implementation -- 3.1 The Fortran do concurrent construct -- 3.2 Code Versions -- 3.3 Compiler Flag Options -- 4 Results -- 4.1 Results Using nvfortran -- 4.2 Results Using gfortran -- 4.3 Results Using ifort -- 4.4 Experimental Results -- 5 Discussion -- 6 Artifact Availability Statement -- References -- Achieving Near-Native Runtime Performance and Cross-Platform Performance Portability for Random Number Generation Through SYCL Interoperability -- 1 Introduction -- 1.1 Contribution -- 2 Related Work -- 2.1 Parallel Programming Frameworks -- 2.2 Linear Algebra Libraries -- 2.3 The Proposed Approach -- 3 SYCL Overview -- 4 SYCL-Based RNG Implementations of NVIDIA and AMD GPUs in oneMKL -- 4.1 Technical Aspects -- 4.2 Native cuRAND and hipRAND flow -- 4.3 Implementation of cuRAND and hipRAND in oneMKL -- 5 Benchmark Applications -- 5.1 Random Number Generation Burner -- 5.2 FastCaloSim -- 6 Performance Evaluation -- 6.1 Performance Portability Metrics -- 6.2 Hardware Specifications -- 6.3 Software Specifications -- 7 Results -- 8 Conclusions and Future Work -- References -- Directive Extensions -- Extending OpenMP for Machine Learning-Driven Adaptation -- 1 Introduction -- 2 A Motivating Example -- 3 A Vision -- 4 The declare adaptation Directive -- 4.1 Syntax and Semantics of declare adaptation -- 4.2 Examples Using metadirective -- 5 Implementation -- 5.1 Compiler Support -- 5.2 Runtime Support -- 6 Evaluation -- 6.1 Software and Hardware Configurations -- 6.2 Performance Results -- 6.3 Accuracy of Prediction Models.
6.4 Overhead Analysis -- 7 Related Work -- 8 Conclusion -- References -- Directive Case Studies -- GPU Porting of Scalable Implicit Solver with Green's Function-Based Neural Networks by OpenACC -- 1 Introduction -- 2 Solver with Green's Function-Based NN Preconditioner -- 2.1 Target Problem -- 2.2 GF-Based NN Predictor -- 2.3 Scalable Solver Algorithm Using GF-Based NN Predictor -- 3 GPU Porting of Solver with Green's Function-Based NN Preconditioner Using OpenACC -- 4 Performance Measurement -- 4.1 Problem Used for Measurement -- 4.2 Performance Measurement Environment -- 4.3 Solver Performance on GPU-Based System -- 4.4 Weak Scaling on GPU-Based System -- 5 Closing Remarks -- References -- Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-Based Offloading -- 1 Introduction -- 2 Related Work -- 3 Methods and APIs -- 3.1 Alpaka and PIConGPU -- 3.2 Review of OpenACC and OpenMP Target -- 3.3 Experimental Setup -- 4 Porting Alpaka -- 4.1 Final Touches: PIConGPU -- 5 Major Hurdles and Discussion -- 5.1 Standards Issues -- 5.2 Compiler and Runtime Issues -- 5.3 Preliminary Results -- 6 Conclusions and Outlook -- References -- Accelerating Quantum Many-Body Configuration Interaction with Directives -- 1 Introduction -- 2 Computational Motifs in Configuration Interaction Code MFDn -- 2.1 Matrix Sparsity Determination -- 2.2 Parallel Prefix Sum -- 2.3 Filling Shared Arrays -- 2.4 Array Reductions -- 3 Conclusion and Outlook -- References -- GPU Offloading of a Large-Scale Gyrokinetic Particle-in-Cell Fortran Code on Summit: From OpenACC to OpenMP -- 1 Introduction -- 2 Software and Experimental Setup -- 2.1 Experimental Setup -- 2.2 OpenMP GPU Offloading -- 3 The Structure of GEM -- 4 Results and Analysis -- 4.1 Speedup Performance and Roofline Analysis for Single Node -- 4.2 Scalability Analysis.
4.3 Investigation of Hardware Threads -- 5 Discussion -- 6 Summary -- References -- Author Index.
Record no. UNISA-996475771803316
Find it here: Univ. di Salerno
Accelerator programming using directives : 7th international workshop, WACCPD 2020, virtual event, November 20, 2020, proceedings / edited by Sridutt Bhalachandra, Christopher Daley, and Verónica Melesse Vergara
Publication/distribution/printing Cham, Switzerland : Springer, [2022]
Physical description 1 online resource (157 pages)
Discipline 005.13
Series Lecture Notes in Computer Science
Topical subject High performance computing
Microprogramming
Computer programming
Càlcul intensiu (Informàtica)
Programació (Ordinadors)
Genre/form subject Congressos
Llibres electrònics
ISBN 3-030-97759-5
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Organization -- Contents -- Directive Alternatives -- Can Fortran's 'do concurrent' Replace Directives for Accelerated Computing? -- 1 Introduction -- 2 Code and Test Description -- 2.1 Code Description -- 2.2 Test Description -- 2.3 Computational Environment -- 2.4 Baseline Performance Results -- 3 Implementation -- 3.1 The Fortran do concurrent construct -- 3.2 Code Versions -- 3.3 Compiler Flag Options -- 4 Results -- 4.1 Results Using nvfortran -- 4.2 Results Using gfortran -- 4.3 Results Using ifort -- 4.4 Experimental Results -- 5 Discussion -- 6 Artifact Availability Statement -- References -- Achieving Near-Native Runtime Performance and Cross-Platform Performance Portability for Random Number Generation Through SYCL Interoperability -- 1 Introduction -- 1.1 Contribution -- 2 Related Work -- 2.1 Parallel Programming Frameworks -- 2.2 Linear Algebra Libraries -- 2.3 The Proposed Approach -- 3 SYCL Overview -- 4 SYCL-Based RNG Implementations of NVIDIA and AMD GPUs in oneMKL -- 4.1 Technical Aspects -- 4.2 Native cuRAND and hipRAND flow -- 4.3 Implementation of cuRAND and hipRAND in oneMKL -- 5 Benchmark Applications -- 5.1 Random Number Generation Burner -- 5.2 FastCaloSim -- 6 Performance Evaluation -- 6.1 Performance Portability Metrics -- 6.2 Hardware Specifications -- 6.3 Software Specifications -- 7 Results -- 8 Conclusions and Future Work -- References -- Directive Extensions -- Extending OpenMP for Machine Learning-Driven Adaptation -- 1 Introduction -- 2 A Motivating Example -- 3 A Vision -- 4 The declare adaptation Directive -- 4.1 Syntax and Semantics of declare adaptation -- 4.2 Examples Using metadirective -- 5 Implementation -- 5.1 Compiler Support -- 5.2 Runtime Support -- 6 Evaluation -- 6.1 Software and Hardware Configurations -- 6.2 Performance Results -- 6.3 Accuracy of Prediction Models.
6.4 Overhead Analysis -- 7 Related Work -- 8 Conclusion -- References -- Directive Case Studies -- GPU Porting of Scalable Implicit Solver with Green's Function-Based Neural Networks by OpenACC -- 1 Introduction -- 2 Solver with Green's Function-Based NN Preconditioner -- 2.1 Target Problem -- 2.2 GF-Based NN Predictor -- 2.3 Scalable Solver Algorithm Using GF-Based NN Predictor -- 3 GPU Porting of Solver with Green's Function-Based NN Preconditioner Using OpenACC -- 4 Performance Measurement -- 4.1 Problem Used for Measurement -- 4.2 Performance Measurement Environment -- 4.3 Solver Performance on GPU-Based System -- 4.4 Weak Scaling on GPU-Based System -- 5 Closing Remarks -- References -- Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-Based Offloading -- 1 Introduction -- 2 Related Work -- 3 Methods and APIs -- 3.1 Alpaka and PIConGPU -- 3.2 Review of OpenACC and OpenMP Target -- 3.3 Experimental Setup -- 4 Porting Alpaka -- 4.1 Final Touches: PIConGPU -- 5 Major Hurdles and Discussion -- 5.1 Standards Issues -- 5.2 Compiler and Runtime Issues -- 5.3 Preliminary Results -- 6 Conclusions and Outlook -- References -- Accelerating Quantum Many-Body Configuration Interaction with Directives -- 1 Introduction -- 2 Computational Motifs in Configuration Interaction Code MFDn -- 2.1 Matrix Sparsity Determination -- 2.2 Parallel Prefix Sum -- 2.3 Filling Shared Arrays -- 2.4 Array Reductions -- 3 Conclusion and Outlook -- References -- GPU Offloading of a Large-Scale Gyrokinetic Particle-in-Cell Fortran Code on Summit: From OpenACC to OpenMP -- 1 Introduction -- 2 Software and Experimental Setup -- 2.1 Experimental Setup -- 2.2 OpenMP GPU Offloading -- 3 The Structure of GEM -- 4 Results and Analysis -- 4.1 Speedup Performance and Roofline Analysis for Single Node -- 4.2 Scalability Analysis.
4.3 Investigation of Hardware Threads -- 5 Discussion -- 6 Summary -- References -- Author Index.
Record no. UNINA-9910568267303321
Find it here: Univ. Federico II
Accelerator Programming Using Directives [electronic resource] : 5th International Workshop, WACCPD 2018, Dallas, TX, USA, November 11-17, 2018, Proceedings / edited by Sunita Chandrasekaran, Guido Juckeland, Sandra Wienke
Edition [1st ed. 2019.]
Publication/distribution/printing Cham : Springer International Publishing : Imprint: Springer, 2019
Physical description 1 online resource (IX, 137 p. 61 illus., 43 illus. in color.)
Discipline 001.642
Series Programming and Software Engineering
Topical subject Programming languages (Electronic computers)
Logic design
Input-output equipment (Computers)
Microprogramming
Computer organization
Programming Languages, Compilers, Interpreters
Logic Design
Input/Output and Data Communications
Control Structures and Microprogramming
Computer Systems Organization and Communication Networks
ISBN 3-030-12274-3
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNINA-9910337577103321
Find it here: Univ. Federico II
Accelerator Programming Using Directives [electronic resource] : 5th International Workshop, WACCPD 2018, Dallas, TX, USA, November 11-17, 2018, Proceedings / edited by Sunita Chandrasekaran, Guido Juckeland, Sandra Wienke
Edition [1st ed. 2019.]
Publication/distribution/printing Cham : Springer International Publishing : Imprint: Springer, 2019
Physical description 1 online resource (IX, 137 p. 61 illus., 43 illus. in color.)
Discipline 001.642
Series Programming and Software Engineering
Topical subject Programming languages (Electronic computers)
Logic design
Input-output equipment (Computers)
Microprogramming
Computer organization
Programming Languages, Compilers, Interpreters
Logic Design
Input/Output and Data Communications
Control Structures and Microprogramming
Computer Systems Organization and Communication Networks
ISBN 3-030-12274-3
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNISA-996466464703316
Find it here: Univ. di Salerno
Accelerator Programming Using Directives [electronic resource] : 4th International Workshop, WACCPD 2017, Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 13, 2017, Proceedings / edited by Sunita Chandrasekaran, Guido Juckeland
Edition [1st ed. 2018.]
Publication/distribution/printing Cham : Springer International Publishing : Imprint: Springer, 2018
Physical description 1 online resource (IX, 183 p. 59 illus.)
Discipline 004.3
Series Programming and Software Engineering
Topical subject Programming languages (Electronic computers)
Logic design
Operating systems (Computers)
Computer programming
Computer organization
Computers
Programming Languages, Compilers, Interpreters
Logic Design
Operating Systems
Programming Techniques
Computer Systems Organization and Communication Networks
Models and Principles
ISBN 3-319-74896-3
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Organization -- Contents -- Applications -- An Example of Porting PETSc Applications to Heterogeneous Platforms with OpenACC -- Abstract -- 1 Introduction -- 2 Workflow and System Description -- 2.1 Workflow -- 2.2 System -- 3 Results and Discussion -- 3.1 Profiling with Score-P -- 3.2 The Most Expensive Kernel: MatMult_SeqAIJ -- 3.3 Four Steps Toward the Final Version of OpenACC Kernel -- 4 Speedups and Strong Scaling -- 5 Conclusion -- Acknowledgement -- References -- Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model -- 1 Introduction -- 1.1 ASUCA on GPU -- 1.2 Parallelization Granularity -- 1.3 Memory Layout -- 1.4 Related Work -- 1.5 Problem Summary -- 2 Hybrid Fortran Language Extension and Code Transformation -- 2.1 Parallel Loop Abstraction -- 2.2 Compile-Time Defined Memory Layout and Device Data Region -- 2.3 Transformed Code -- 3 Code Transformation Method -- 4 Productivity- and Performance Results -- 5 Conclusion and Future Work -- References -- Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation Using OpenACC -- 1 Introduction -- 2 Finite-Element Earthquake Simulation Designed for the K Computer -- 3 Proposed Solver for GPUs Using OpenACC -- 3.1 Modification of Algorithm for GPUs -- 3.2 Introduction of OpenACC -- 4 Performance Measurements -- 5 Application Example -- 6 Concluding Remarks -- References -- Runtime Environments -- The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer -- 1 Introduction -- 2 RAJA -- 2.1 Basic Execution Policies -- 2.2 RAJA::NestedPolicy and Loop Transformations -- 3 Embedding Directives in the C++ Type System -- 3.1 Defining Policy Tags for a Backend -- 3.2 Constructing Explicit Execution Policy Types.
3.3 Implement forall Specializations -- 4 Case Study: OpenMP 4.5 -- 5 Case Study: OpenACC -- 6 Evaluation -- 6.1 Test Set -- 6.2 Goals and Non-Goals -- 6.3 Compilation Overhead -- 6.4 Runtime Overhead -- 7 Future Work and Conclusion -- References -- Enabling GPU Support for the COMPSs-Mobile Framework -- 1 Introduction -- 2 Related Work -- 3 Programming Model -- 3.1 Extension for GPU Support -- 4 Runtime Support Implementation -- 4.1 COMPSs-Mobile Runtime Architecture -- 4.2 OpenCL Platform -- 5 Performance Evaluation -- 5.1 OpenCL Platform Performance -- 5.2 Load Balancing Policies -- 6 Conclusions and Future Work -- References -- Concurrent Parallel Processing on Graphics and Multicore Processors with OpenACC and OpenMP -- Abstract -- 1 Introduction -- 2 MBFLO3 Application -- 2.1 Mathematical Formulation -- 2.2 Numerical Method -- 3 Heterogeneous Multiblock Computing Strategy -- 3.1 Multicore Host Parallelism -- 3.2 Manycore Accelerator Parallelism -- 3.3 Heterogeneous Host-Device Parallelism -- 4 Performance Results and Analysis -- 5 Conclusions -- Acknowledgements -- References -- Program Evaluation -- Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs -- 1 Introduction -- 2 Motivation -- 3 Compiling Java to GPUs -- 3.1 Java Parallel Stream API -- 3.2 JIT Compilation for GPUs -- 4 Exploring Supervised Machine Learning Algorithms -- 4.1 Supervised Machine Learning -- 4.2 Generating Subsets of Features -- 4.3 Constructing Prediction Models -- 4.4 Integrating Prediction Models -- 5 Experimental Results -- 5.1 Experimental Protocol -- 5.2 Overall Summary -- 5.3 Accuracies on the Full Set of Features -- 5.4 Exploring ML Algorithms by Feature Subsetting -- 5.5 Lessons Learned -- 6 Related Work -- 6.1 GPU Code Generation from High-Level Languages -- 6.2 Offline Model Construction.
7 Conclusions -- A Appendix -- References -- Automatic Testing of OpenACC Applications -- 1 Introduction -- 2 Testing a GPU Port of a Numerical Application -- 3 Autocompare with OpenACC -- 4 Autocompare Implementation -- 5 Experiments -- 6 Related Work -- 7 Future Work -- 8 Conclusion -- References -- Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices -- 1 Introduction -- 2 Related Work -- 3 Accelerator Programming Models -- 3.1 CUDA -- 3.2 OpenCL -- 3.3 OpenACC -- 3.4 OpenMP -- 4 Implementing the Conjugate Gradient Method -- 5 Performance Results on NVIDIA GPUs -- 5.1 Data Transfers with the Host -- 5.2 Single Device -- 5.3 Two Devices -- 6 Performance Results on Intel Xeon Phi Coprocessors -- 6.1 Single Device -- 6.2 Two Devices -- 7 Summary -- References -- Author Index.
Record no. UNISA-996465474703316
Find it here: Univ. di Salerno
Accelerator Programming Using Directives [electronic resource] : 4th International Workshop, WACCPD 2017, Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 13, 2017, Proceedings / edited by Sunita Chandrasekaran, Guido Juckeland
Edition [1st ed. 2018.]
Publication/distribution/printing Cham : Springer International Publishing : Imprint: Springer, 2018
Physical description 1 online resource (IX, 183 p. 59 illus.)
Discipline 004.3
Series Programming and Software Engineering
Topical subject Programming languages (Electronic computers)
Logic design
Operating systems (Computers)
Computer programming
Computer organization
Computers
Programming Languages, Compilers, Interpreters
Logic Design
Operating Systems
Programming Techniques
Computer Systems Organization and Communication Networks
Models and Principles
ISBN 3-319-74896-3
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Organization -- Contents -- Applications -- An Example of Porting PETSc Applications to Heterogeneous Platforms with OpenACC -- Abstract -- 1 Introduction -- 2 Workflow and System Description -- 2.1 Workflow -- 2.2 System -- 3 Results and Discussion -- 3.1 Profiling with Score-P -- 3.2 The Most Expensive Kernel: MatMult_SeqAIJ -- 3.3 Four Steps Toward the Final Version of OpenACC Kernel -- 4 Speedups and Strong Scaling -- 5 Conclusion -- Acknowledgement -- References -- Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model -- 1 Introduction -- 1.1 ASUCA on GPU -- 1.2 Parallelization Granularity -- 1.3 Memory Layout -- 1.4 Related Work -- 1.5 Problem Summary -- 2 Hybrid Fortran Language Extension and Code Transformation -- 2.1 Parallel Loop Abstraction -- 2.2 Compile-Time Defined Memory Layout and Device Data Region -- 2.3 Transformed Code -- 3 Code Transformation Method -- 4 Productivity- and Performance Results -- 5 Conclusion and Future Work -- References -- Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation Using OpenACC -- 1 Introduction -- 2 Finite-Element Earthquake Simulation Designed for the K Computer -- 3 Proposed Solver for GPUs Using OpenACC -- 3.1 Modification of Algorithm for GPUs -- 3.2 Introduction of OpenACC -- 4 Performance Measurements -- 5 Application Example -- 6 Concluding Remarks -- References -- Runtime Environments -- The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer -- 1 Introduction -- 2 RAJA -- 2.1 Basic Execution Policies -- 2.2 RAJA::NestedPolicy and Loop Transformations -- 3 Embedding Directives in the C++ Type System -- 3.1 Defining Policy Tags for a Backend -- 3.2 Constructing Explicit Execution Policy Types.
3.3 Implement forall Specializations -- 4 Case Study: OpenMP 4.5 -- 5 Case Study: OpenACC -- 6 Evaluation -- 6.1 Test Set -- 6.2 Goals and Non-Goals -- 6.3 Compilation Overhead -- 6.4 Runtime Overhead -- 7 Future Work and Conclusion -- References -- Enabling GPU Support for the COMPSs-Mobile Framework -- 1 Introduction -- 2 Related Work -- 3 Programming Model -- 3.1 Extension for GPU Support -- 4 Runtime Support Implementation -- 4.1 COMPSs-Mobile Runtime Architecture -- 4.2 OpenCL Platform -- 5 Performance Evaluation -- 5.1 OpenCL Platform Performance -- 5.2 Load Balancing Policies -- 6 Conclusions and Future Work -- References -- Concurrent Parallel Processing on Graphics and Multicore Processors with OpenACC and OpenMP -- Abstract -- 1 Introduction -- 2 MBFLO3 Application -- 2.1 Mathematical Formulation -- 2.2 Numerical Method -- 3 Heterogeneous Multiblock Computing Strategy -- 3.1 Multicore Host Parallelism -- 3.2 Manycore Accelerator Parallelism -- 3.3 Heterogeneous Host-Device Parallelism -- 4 Performance Results and Analysis -- 5 Conclusions -- Acknowledgements -- References -- Program Evaluation -- Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs -- 1 Introduction -- 2 Motivation -- 3 Compiling Java to GPUs -- 3.1 Java Parallel Stream API -- 3.2 JIT Compilation for GPUs -- 4 Exploring Supervised Machine Learning Algorithms -- 4.1 Supervised Machine Learning -- 4.2 Generating Subsets of Features -- 4.3 Constructing Prediction Models -- 4.4 Integrating Prediction Models -- 5 Experimental Results -- 5.1 Experimental Protocol -- 5.2 Overall Summary -- 5.3 Accuracies on the Full Set of Features -- 5.4 Exploring ML Algorithms by Feature Subsetting -- 5.5 Lessons Learned -- 6 Related Work -- 6.1 GPU Code Generation from High-Level Languages -- 6.2 Offline Model Construction.
7 Conclusions -- A Appendix -- References -- Automatic Testing of OpenACC Applications -- 1 Introduction -- 2 Testing a GPU Port of a Numerical Application -- 3 Autocompare with OpenACC -- 4 Autocompare Implementation -- 5 Experiments -- 6 Related Work -- 7 Future Work -- 8 Conclusion -- References -- Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices -- 1 Introduction -- 2 Related Work -- 3 Accelerator Programming Models -- 3.1 CUDA -- 3.2 OpenCL -- 3.3 OpenACC -- 3.4 OpenMP -- 4 Implementing the Conjugate Gradient Method -- 5 Performance Results on NVIDIA GPUs -- 5.1 Data Transfers with the Host -- 5.2 Single Device -- 5.3 Two Devices -- 6 Performance Results on Intel Xeon Phi Coprocessors -- 6.1 Single Device -- 6.2 Two Devices -- 7 Summary -- References -- Author Index.
Record no. UNINA-9910349261003321
Find it here: Univ. Federico II
Languages and compilers for parallel computing : 34th international workshop, LCPC 2021, Newark, DE, USA, October 13-14, 2021 : revised selected papers / Xiaoming Li and Sunita Chandrasekaran (editors)
Publication/distribution/printing Cham, Switzerland : Springer, [2022]
Physical description 1 online resource (159 pages)
Discipline 004.35
Series Lecture notes in computer science
Topical subject Parallel processing (Electronic computers)
Parallel programming (Computer science)
Compilers (Computer programs)
ISBN 3-030-99372-8
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Organization -- Contents -- Compiler -- Locality-Based Optimizations in the Chapel Compiler -- 1 Introduction -- 2 Chapel Background -- 2.1 Distributed Arrays -- 2.2 Forall Loops -- 3 Compiler Analysis and Optimizations -- 3.1 Automatic Local Access -- 3.2 Automatic Aggregation -- 4 Results -- 5 Future Work -- 6 Related Work -- 7 Conclusion -- References -- iCetus: A Semi-automatic Parallel Programming Assistant -- 1 Introduction -- 2 Rationale for the iCetus Interactive Parallelizer and Tool Features -- 2.1 Automatic Parallelization in Cetus -- 2.2 The Opportunity of Interactive Parallelization -- 2.3 iCetus Features -- 2.4 Limitations of the Current Version of iCetus -- 3 iCetus System Overview -- 4 Evaluation -- 4.1 Importance and Usefulness of Existing iCetus Features -- 4.2 Importance and Usefulness of our Proposed iCetus Features -- 4.3 Requested Features for iCetus -- 5 Related Work -- 6 Conclusion -- References -- Hybrid Register Allocation with Spill Cost and Pattern Guided Optimization -- 1 Introduction -- 2 Background and Challenges -- 3 Preliminary Analysis -- 4 Design and Implementation -- 4.1 Code Pattern Recognizer -- 4.2 Spill Cost Tracking Mechanism -- 4.3 Putting it all Together: Cost-Guided Allocation Optimizer -- 5 Methodology -- 6 Evaluation Result -- 6.1 Benchmark Performance -- 6.2 Sensitivity Study -- 6.3 Compilation Overhead -- 7 Related Work -- 8 Conclusion and Future Work -- References -- Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores -- 1 Introduction -- 2 The OSCAR Automatic Parallelizing Compiler -- 3 Investigated Multicore Architectures -- 4 Benchmark Programs -- 5 Compile Flow -- 6 Performance of OSCAR Compiler-Parallelized Programs -- 6.1 OSCAR Compiled Benchmark Performance on Intel x86.
6.2 OSCAR Compiled Benchmark Performance on AMD X86 -- 6.3 OSCAR Compiled Benchmark Performance on Arm -- 6.4 OSCAR Compiled Benchmark Performance on RISC-V -- 7 Conclusion -- References -- Accelerators -- LC-MEMENTO: A Memory Model for Accelerated Architectures -- 1 Introduction -- 2 Background -- 2.1 Memory Consistency Models -- 2.2 The Abstract Runtime System: ARTS -- 2.3 NVIDIA CUDA Programming and Execution Environment -- 3 LC-MEMENTO Design and Implementation -- 3.1 Asynchronous Runtime Scheduler for Accelerators -- 3.2 Memory Models for Accelerators -- 4 Evaluation -- 4.1 STREAM Benchmark -- 4.2 Random Access Benchmark -- 4.3 Breadth-First Search -- 5 Related Work -- 6 Conclusions and Future Work -- References -- The ORKA-HPC Compiler-Practical OpenMP for FPGAs -- 1 Motivation -- 2 Related Work -- 3 The ORKA-HPC OpenMP-to-FPGA Compiler -- 3.1 OpenMP Lowering -- 3.2 FPGA Path -- 3.3 ORKA-HPC LLP-Backend -- 3.4 Host Path -- 4 Deployment -- 5 Evaluation -- 6 Contributions and Future Work -- References -- Graphs and Kernels -- Optimizing Sparse Matrix Multiplications for Graph Neural Networks -- 1 Introduction -- 2 Background -- 2.1 Graph Neural Networks -- 2.2 Sparse Matrix Storage Formats -- 3 Motivation -- 3.1 Setup -- 3.2 Results -- 4 Our Approach -- 4.1 Predictive Modeling -- 4.2 Problem Modeling -- 4.3 Training Data Generation -- 4.4 Feature Engineering -- 4.5 Training the Model -- 4.6 Using the Model -- 5 Experimental Setup -- 5.1 Software and Hardware -- 5.2 Evaluation Methodology -- 6 Experimental Results -- 6.1 Overall Results -- 6.2 Compare to Prior Methods -- 6.3 Compare to Oracle Performance -- 6.4 Model Analysis -- 6.5 Discussion -- 7 Related Work -- 8 Conclusions -- References -- A Hybrid Synchronization Mechanism for Parallel Sparse Triangular Solve -- 1 Introduction -- 2 Motivation and Related Work -- 3 Preliminaries.
3.1 Sparse Matrix and Serial SpTS -- 3.2 Parallel SpTS -- 4 Our Approach -- 4.1 Overview -- 4.2 no-busy-wait -- 4.3 busy-wait -- 5 Evaluation -- 5.1 Experimental Setup -- 5.2 SpTS Performance Comparison -- 6 Conclusion and Future Work -- References -- Techniques for Managing Polyhedral Dataflow Graphs -- 1 Introduction -- 2 Background -- 2.1 GeoAc -- 2.2 SPF and the Computation API -- 2.3 Polyhedral Dataflow Graphs -- 3 Case Study: Expressing GeoAc and Examining Polyhedral Dataflow Graphs -- 3.1 Approximate Static Single Assignment -- 3.2 Producer Consumer Reductions -- 3.3 Graph Components -- 3.4 Data Dependent Control Flow -- 3.5 Dead Code Elimination -- 3.6 Subgraphs -- 3.7 Constant Size Arrays -- 3.8 Debugging Information -- 4 Related Work -- 5 Conclusion -- References -- Author Index.
Record no. UNISA-996464547403316
Find it here: Univ. di Salerno
Languages and compilers for parallel computing : 34th international workshop, LCPC 2021, Newark, DE, USA, October 13-14, 2021 : revised selected papers / Xiaoming Li and Sunita Chandrasekaran (editors)
Publication/distribution/printing Cham, Switzerland : Springer, [2022]
Physical description 1 online resource (159 pages)
Discipline 004.35
Series Lecture notes in computer science
Topical subject Parallel processing (Electronic computers)
Parallel programming (Computer science)
Compilers (Computer programs)
ISBN 3-030-99372-8
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Organization -- Contents -- Compiler -- Locality-Based Optimizations in the Chapel Compiler -- 1 Introduction -- 2 Chapel Background -- 2.1 Distributed Arrays -- 2.2 Forall Loops -- 3 Compiler Analysis and Optimizations -- 3.1 Automatic Local Access -- 3.2 Automatic Aggregation -- 4 Results -- 5 Future Work -- 6 Related Work -- 7 Conclusion -- References -- iCetus: A Semi-automatic Parallel Programming Assistant -- 1 Introduction -- 2 Rationale for the iCetus Interactive Parallelizer and Tool Features -- 2.1 Automatic Parallelization in Cetus -- 2.2 The Opportunity of Interactive Parallelization -- 2.3 iCetus Features -- 2.4 Limitations of the Current Version of iCetus -- 3 iCetus System Overview -- 4 Evaluation -- 4.1 Importance and Usefulness of Existing iCetus Features -- 4.2 Importance and Usefulness of our Proposed iCetus Features -- 4.3 Requested Features for iCetus -- 5 Related Work -- 6 Conclusion -- References -- Hybrid Register Allocation with Spill Cost and Pattern Guided Optimization -- 1 Introduction -- 2 Background and Challenges -- 3 Preliminary Analysis -- 4 Design and Implementation -- 4.1 Code Pattern Recognizer -- 4.2 Spill Cost Tracking Mechanism -- 4.3 Putting it all Together: Cost-Guided Allocation Optimizer -- 5 Methodology -- 6 Evaluation Result -- 6.1 Benchmark Performance -- 6.2 Sensitivity Study -- 6.3 Compilation Overhead -- 7 Related Work -- 8 Conclusion and Future Work -- References -- Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores -- 1 Introduction -- 2 The OSCAR Automatic Parallelizing Compiler -- 3 Investigated Multicore Architectures -- 4 Benchmark Programs -- 5 Compile Flow -- 6 Performance of OSCAR Compiler-Parallelized Programs -- 6.1 OSCAR Compiled Benchmark Performance on Intel x86.
6.2 OSCAR Compiled Benchmark Performance on AMD X86 -- 6.3 OSCAR Compiled Benchmark Performance on Arm -- 6.4 OSCAR Compiled Benchmark Performance on RISC-V -- 7 Conclusion -- References -- Accelerators -- LC-MEMENTO: A Memory Model for Accelerated Architectures -- 1 Introduction -- 2 Background -- 2.1 Memory Consistency Models -- 2.2 The Abstract Runtime System: ARTS -- 2.3 NVIDIA CUDA Programming and Execution Environment -- 3 LC-MEMENTO Design and Implementation -- 3.1 Asynchronous Runtime Scheduler for Accelerators -- 3.2 Memory Models for Accelerators -- 4 Evaluation -- 4.1 STREAM Benchmark -- 4.2 Random Access Benchmark -- 4.3 Breadth-First Search -- 5 Related Work -- 6 Conclusions and Future Work -- References -- The ORKA-HPC Compiler-Practical OpenMP for FPGAs -- 1 Motivation -- 2 Related Work -- 3 The ORKA-HPC OpenMP-to-FPGA Compiler -- 3.1 OpenMP Lowering -- 3.2 FPGA Path -- 3.3 ORKA-HPC LLP-Backend -- 3.4 Host Path -- 4 Deployment -- 5 Evaluation -- 6 Contributions and Future Work -- References -- Graphs and Kernels -- Optimizing Sparse Matrix Multiplications for Graph Neural Networks -- 1 Introduction -- 2 Background -- 2.1 Graph Neural Networks -- 2.2 Sparse Matrix Storage Formats -- 3 Motivation -- 3.1 Setup -- 3.2 Results -- 4 Our Approach -- 4.1 Predictive Modeling -- 4.2 Problem Modeling -- 4.3 Training Data Generation -- 4.4 Feature Engineering -- 4.5 Training the Model -- 4.6 Using the Model -- 5 Experimental Setup -- 5.1 Software and Hardware -- 5.2 Evaluation Methodology -- 6 Experimental Results -- 6.1 Overall Results -- 6.2 Compare to Prior Methods -- 6.3 Compare to Oracle Performance -- 6.4 Model Analysis -- 6.5 Discussion -- 7 Related Work -- 8 Conclusions -- References -- A Hybrid Synchronization Mechanism for Parallel Sparse Triangular Solve -- 1 Introduction -- 2 Motivation and Related Work -- 3 Preliminaries.
3.1 Sparse Matrix and Serial SpTS -- 3.2 Parallel SpTS -- 4 Our Approach -- 4.1 Overview -- 4.2 no-busy-wait -- 4.3 busy-wait -- 5 Evaluation -- 5.1 Experimental Setup -- 5.2 SpTS Performance Comparison -- 6 Conclusion and Future Work -- References -- Techniques for Managing Polyhedral Dataflow Graphs -- 1 Introduction -- 2 Background -- 2.1 GeoAc -- 2.2 SPF and the Computation API -- 2.3 Polyhedral Dataflow Graphs -- 3 Case Study: Expressing GeoAc and Examining Polyhedral Dataflow Graphs -- 3.1 Approximate Static Single Assignment -- 3.2 Producer Consumer Reductions -- 3.3 Graph Components -- 3.4 Data Dependent Control Flow -- 3.5 Dead Code Elimination -- 3.6 Subgraphs -- 3.7 Constant Size Arrays -- 3.8 Debugging Information -- 4 Related Work -- 5 Conclusion -- References -- Author Index.
Record No. UNINA-9910556898603321
Find it here: Univ. Federico II
Tools and Techniques for High Performance Computing [electronic resource] : Selected Workshops, HUST, SE-HER and WIHPC, Held in Conjunction with SC 2019, Denver, CO, USA, November 17–18, 2019, Revised Selected Papers / edited by Guido Juckeland, Sunita Chandrasekaran
Edition [1st ed. 2020.]
Publication/distribution/printing Cham : Springer International Publishing : Imprint: Springer, 2020
Physical description 1 online resource (X, 205 p. 289 illus., 70 illus. in color.)
Discipline 004.11
Series Communications in Computer and Information Science
Topical subject Special purpose computers
Software engineering
Computer input-output equipment
Special Purpose and Application-Based Systems
Software Engineering/Programming and Operating Systems
Computer Hardware
ISBN 3-030-44728-6
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note HUST: Annual Workshop on HPC User Support Tools -- SE-HER: International Workshop on Software Engineering for HPC-Enabled Research -- WIHPC: Workshop on Interactive High-Performance Computing.
Record No. UNINA-9910410050403321
Find it here: Univ. Federico II
Tools and Techniques for High Performance Computing [electronic resource] : Selected Workshops, HUST, SE-HER and WIHPC, Held in Conjunction with SC 2019, Denver, CO, USA, November 17–18, 2019, Revised Selected Papers / edited by Guido Juckeland, Sunita Chandrasekaran
Edition [1st ed. 2020.]
Publication/distribution/printing Cham : Springer International Publishing : Imprint: Springer, 2020
Physical description 1 online resource (X, 205 p. 289 illus., 70 illus. in color.)
Discipline 004.11
Series Communications in Computer and Information Science
Topical subject Special purpose computers
Software engineering
Computer input-output equipment
Special Purpose and Application-Based Systems
Software Engineering/Programming and Operating Systems
Computer Hardware
ISBN 3-030-44728-6
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note HUST: Annual Workshop on HPC User Support Tools -- SE-HER: International Workshop on Software Engineering for HPC-Enabled Research -- WIHPC: Workshop on Interactive High-Performance Computing.
Record No. UNISA-996465461003316
Find it here: Univ. di Salerno