Accelerator programming using directives : 7th international workshop, WACCPD 2020, virtual event, November 20, 2020, proceedings / edited by Sridutt Bhalachandra, Christopher Daley, and Verónica Melesse Vergara
Publication: Cham, Switzerland : Springer, [2022]
Physical description: 1 online resource (157 pages)
Dewey classification: 005.13
Series: Lecture Notes in Computer Science
Topical subjects: High performance computing
Microprogramming
Computer programming
High-performance computing (Computer science)
Programming (Computers)
Genre/form: Congresses
Electronic books
ISBN 3-030-97759-5
Format: Printed material
Bibliographic level: Monograph
Language of publication: English
Contents note: Intro -- Preface -- Organization -- Contents -- Directive Alternatives -- Can Fortran's 'do concurrent' Replace Directives for Accelerated Computing? -- 1 Introduction -- 2 Code and Test Description -- 2.1 Code Description -- 2.2 Test Description -- 2.3 Computational Environment -- 2.4 Baseline Performance Results -- 3 Implementation -- 3.1 The Fortran do concurrent construct -- 3.2 Code Versions -- 3.3 Compiler Flag Options -- 4 Results -- 4.1 Results Using nvfortran -- 4.2 Results Using gfortran -- 4.3 Results Using ifort -- 4.4 Experimental Results -- 5 Discussion -- 6 Artifact Availability Statement -- References -- Achieving Near-Native Runtime Performance and Cross-Platform Performance Portability for Random Number Generation Through SYCL Interoperability -- 1 Introduction -- 1.1 Contribution -- 2 Related Work -- 2.1 Parallel Programming Frameworks -- 2.2 Linear Algebra Libraries -- 2.3 The Proposed Approach -- 3 SYCL Overview -- 4 SYCL-Based RNG Implementations of NVIDIA and AMD GPUs in oneMKL -- 4.1 Technical Aspects -- 4.2 Native cuRAND and hipRAND flow -- 4.3 Implementation of cuRAND and hipRAND in oneMKL -- 5 Benchmark Applications -- 5.1 Random Number Generation Burner -- 5.2 FastCaloSim -- 6 Performance Evaluation -- 6.1 Performance Portability Metrics -- 6.2 Hardware Specifications -- 6.3 Software Specifications -- 7 Results -- 8 Conclusions and Future Work -- References -- Directive Extensions -- Extending OpenMP for Machine Learning-Driven Adaptation -- 1 Introduction -- 2 A Motivating Example -- 3 A Vision -- 4 The declare adaptation Directive -- 4.1 Syntax and Semantics of declare adaptation -- 4.2 Examples Using metadirective -- 5 Implementation -- 5.1 Compiler Support -- 5.2 Runtime Support -- 6 Evaluation -- 6.1 Software and Hardware Configurations -- 6.2 Performance Results -- 6.3 Accuracy of Prediction Models.
6.4 Overhead Analysis -- 7 Related Work -- 8 Conclusion -- References -- Directive Case Studies -- GPU Porting of Scalable Implicit Solver with Green's Function-Based Neural Networks by OpenACC -- 1 Introduction -- 2 Solver with Green's Function-Based NN Preconditioner -- 2.1 Target Problem -- 2.2 GF-Based NN Predictor -- 2.3 Scalable Solver Algorithm Using GF-Based NN Predictor -- 3 GPU Porting of Solver with Green's Function-Based NN Preconditioner Using OpenACC -- 4 Performance Measurement -- 4.1 Problem Used for Measurement -- 4.2 Performance Measurement Environment -- 4.3 Solver Performance on GPU-Based System -- 4.4 Weak Scaling on GPU-Based System -- 5 Closing Remarks -- References -- Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-Based Offloading -- 1 Introduction -- 2 Related Work -- 3 Methods and APIs -- 3.1 Alpaka and PIConGPU -- 3.2 Review of OpenACC and OpenMP Target -- 3.3 Experimental Setup -- 4 Porting Alpaka -- 4.1 Final Touches: PIConGPU -- 5 Major Hurdles and Discussion -- 5.1 Standards Issues -- 5.2 Compiler and Runtime Issues -- 5.3 Preliminary Results -- 6 Conclusions and Outlook -- References -- Accelerating Quantum Many-Body Configuration Interaction with Directives -- 1 Introduction -- 2 Computational Motifs in Configuration Interaction Code MFDn -- 2.1 Matrix Sparsity Determination -- 2.2 Parallel Prefix Sum -- 2.3 Filling Shared Arrays -- 2.4 Array Reductions -- 3 Conclusion and Outlook -- References -- GPU Offloading of a Large-Scale Gyrokinetic Particle-in-Cell Fortran Code on Summit: From OpenACC to OpenMP -- 1 Introduction -- 2 Software and Experimental Setup -- 2.1 Experimental Setup -- 2.2 OpenMP GPU Offloading -- 3 The Structure of GEM -- 4 Results and Analysis -- 4.1 Speedup Performance and Roofline Analysis for Single Node -- 4.2 Scalability Analysis.
4.3 Investigation of Hardware Threads -- 5 Discussion -- 6 Summary -- References -- Author Index.
Record no. UNISA-996475771803316
Available at: Univ. di Salerno
Record no. UNINA-9910568267303321
Available at: Univ. Federico II
Accelerator Programming Using Directives [electronic resource] : 5th International Workshop, WACCPD 2018, Dallas, TX, USA, November 11-17, 2018, Proceedings / edited by Sunita Chandrasekaran, Guido Juckeland, Sandra Wienke
Edition: 1st ed. 2019.
Publication: Cham : Springer International Publishing : Imprint: Springer, 2019
Physical description: 1 online resource (IX, 137 p. 61 illus., 43 illus. in color.)
Dewey classification: 001.642
Series: Programming and Software Engineering
Topical subjects: Programming languages (Electronic computers)
Logic design
Input-output equipment (Computers)
Microprogramming 
Computer organization
Programming Languages, Compilers, Interpreters
Logic Design
Input/Output and Data Communications
Control Structures and Microprogramming
Computer Systems Organization and Communication Networks
ISBN 3-030-12274-3
Format: Printed material
Bibliographic level: Monograph
Language of publication: English
Record no. UNINA-9910337577103321
Available at: Univ. Federico II
Record no. UNISA-996466464703316
Available at: Univ. di Salerno
Accelerator Programming Using Directives [electronic resource] : 4th International Workshop, WACCPD 2017, Held in Conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 13, 2017, Proceedings / edited by Sunita Chandrasekaran, Guido Juckeland
Edition: 1st ed. 2018.
Publication: Cham : Springer International Publishing : Imprint: Springer, 2018
Physical description: 1 online resource (IX, 183 p. 59 illus.)
Dewey classification: 004.3
Series: Programming and Software Engineering
Topical subjects: Programming languages (Electronic computers)
Logic design
Operating systems (Computers)
Computer programming
Computer organization
Computers
Programming Languages, Compilers, Interpreters
Logic Design
Operating Systems
Programming Techniques
Computer Systems Organization and Communication Networks
Models and Principles
ISBN 3-319-74896-3
Format: Printed material
Bibliographic level: Monograph
Language of publication: English
Contents note: Intro -- Preface -- Organization -- Contents -- Applications -- An Example of Porting PETSc Applications to Heterogeneous Platforms with OpenACC -- Abstract -- 1 Introduction -- 2 Workflow and System Description -- 2.1 Workflow -- 2.2 System -- 3 Results and Discussion -- 3.1 Profiling with Score-P -- 3.2 The Most Expensive Kernel: MatMult_SeqAIJ -- 3.3 Four Steps Toward the Final Version of OpenACC Kernel -- 4 Speedups and Strong Scaling -- 5 Conclusion -- Acknowledgement -- References -- Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model -- 1 Introduction -- 1.1 ASUCA on GPU -- 1.2 Parallelization Granularity -- 1.3 Memory Layout -- 1.4 Related Work -- 1.5 Problem Summary -- 2 Hybrid Fortran Language Extension and Code Transformation -- 2.1 Parallel Loop Abstraction -- 2.2 Compile-Time Defined Memory Layout and Device Data Region -- 2.3 Transformed Code -- 3 Code Transformation Method -- 4 Productivity- and Performance Results -- 5 Conclusion and Future Work -- References -- Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation Using OpenACC -- 1 Introduction -- 2 Finite-Element Earthquake Simulation Designed for the K Computer -- 3 Proposed Solver for GPUs Using OpenACC -- 3.1 Modification of Algorithm for GPUs -- 3.2 Introduction of OpenACC -- 4 Performance Measurements -- 5 Application Example -- 6 Concluding Remarks -- References -- Runtime Environments -- The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer -- 1 Introduction -- 2 RAJA -- 2.1 Basic Execution Policies -- 2.2 RAJA::NestedPolicy and Loop Transformations -- 3 Embedding Directives in the C++ Type System -- 3.1 Defining Policy Tags for a Backend -- 3.2 Constructing Explicit Execution Policy Types.
3.3 Implement forall Specializations -- 4 Case Study: OpenMP 4.5 -- 5 Case Study: OpenACC -- 6 Evaluation -- 6.1 Test Set -- 6.2 Goals and Non-Goals -- 6.3 Compilation Overhead -- 6.4 Runtime Overhead -- 7 Future Work and Conclusion -- References -- Enabling GPU Support for the COMPSs-Mobile Framework -- 1 Introduction -- 2 Related Work -- 3 Programming Model -- 3.1 Extension for GPU Support -- 4 Runtime Support Implementation -- 4.1 COMPSs-Mobile Runtime Architecture -- 4.2 OpenCL Platform -- 5 Performance Evaluation -- 5.1 OpenCL Platform Performance -- 5.2 Load Balancing Policies -- 6 Conclusions and Future Work -- References -- Concurrent Parallel Processing on Graphics and Multicore Processors with OpenACC and OpenMP -- Abstract -- 1 Introduction -- 2 MBFLO3 Application -- 2.1 Mathematical Formulation -- 2.2 Numerical Method -- 3 Heterogeneous Multiblock Computing Strategy -- 3.1 Multicore Host Parallelism -- 3.2 Manycore Accelerator Parallelism -- 3.3 Heterogeneous Host-Device Parallelism -- 4 Performance Results and Analysis -- 5 Conclusions -- Acknowledgements -- References -- Program Evaluation -- Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs -- 1 Introduction -- 2 Motivation -- 3 Compiling Java to GPUs -- 3.1 Java Parallel Stream API -- 3.2 JIT Compilation for GPUs -- 4 Exploring Supervised Machine Learning Algorithms -- 4.1 Supervised Machine Learning -- 4.2 Generating Subsets of Features -- 4.3 Constructing Prediction Models -- 4.4 Integrating Prediction Models -- 5 Experimental Results -- 5.1 Experimental Protocol -- 5.2 Overall Summary -- 5.3 Accuracies on the Full Set of Features -- 5.4 Exploring ML Algorithms by Feature Subsetting -- 5.5 Lessons Learned -- 6 Related Work -- 6.1 GPU Code Generation from High-Level Languages -- 6.2 Offline Model Construction.
7 Conclusions -- A Appendix -- References -- Automatic Testing of OpenACC Applications -- 1 Introduction -- 2 Testing a GPU Port of a Numerical Application -- 3 Autocompare with OpenACC -- 4 Autocompare Implementation -- 5 Experiments -- 6 Related Work -- 7 Future Work -- 8 Conclusion -- References -- Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices -- 1 Introduction -- 2 Related Work -- 3 Accelerator Programming Models -- 3.1 CUDA -- 3.2 OpenCL -- 3.3 OpenACC -- 3.4 OpenMP -- 4 Implementing the Conjugate Gradient Method -- 5 Performance Results on NVIDIA GPUs -- 5.1 Data Transfers with the Host -- 5.2 Single Device -- 5.3 Two Devices -- 6 Performance Results on Intel Xeon Phi Coprocessors -- 6.1 Single Device -- 6.2 Two Devices -- 7 Summary -- References -- Author Index.
Record no. UNISA-996465474703316
Available at: Univ. di Salerno
Record no. UNINA-9910349261003321
Available at: Univ. Federico II
High Performance Computing [electronic resource] : 35th International Conference, ISC High Performance 2020, Frankfurt/Main, Germany, June 22–25, 2020, Proceedings / edited by Ponnuswamy Sadayappan, Bradford L. Chamberlain, Guido Juckeland, Hatem Ltaief
Edition: 1st ed. 2020.
Publication: Cham : Springer International Publishing : Imprint: Springer, 2020
Physical description: 1 online resource (563 pages)
Dewey classification: 004.3
Series: Theoretical Computer Science and General Issues
Topical subjects: Software engineering
Computer engineering
Computer networks
Computers
Artificial intelligence
Software Engineering
Computer Engineering and Networks
Computer Hardware
Computer Communication Networks
Artificial Intelligence
ISBN 3-030-50743-2
Format: Printed material
Bibliographic level: Monograph
Language of publication: English
Contents note: Architectures, Networks & Infrastructure -- Artificial Intelligence and Machine Learning -- Data, Storage & Visualization -- Emerging Technologies -- HPC Algorithms -- HPC Applications -- Performance Modeling & Measurement -- Programming Models & Systems Software.
Record no. UNISA-996418316103316
Available at: Univ. di Salerno
Record no. UNINA-9910410054203321
Available at: Univ. Federico II
High Performance Computing [electronic resource] : ISC High Performance 2019 International Workshops, Frankfurt, Germany, June 16-20, 2019, Revised Selected Papers / edited by Michèle Weiland, Guido Juckeland, Sadaf Alam, Heike Jagode
Edition: 1st ed. 2019.
Publication: Cham : Springer International Publishing : Imprint: Springer, 2019
Physical description: 1 online resource (XXV, 659 p. 402 illus., 239 illus. in color.)
Dewey classification: 004.3
Series: Theoretical Computer Science and General Issues
Topical subjects: Computer engineering
Computer networks
Software engineering
Computers
Computer Engineering and Networks
Software Engineering
Computer Hardware
Computing Milieux
ISBN 3-030-34356-1
Format: Printed material
Bibliographic level: Monograph
Language of publication: English
Contents note: Intro -- Preface -- Organization -- Short Papers -- Preface to the First International Workshop on Legacy Software Refactoring for Performance -- P³MA Workshop 2019 -- 4th International Workshop on In Situ Visualization (WOIV'19) -- Contents -- On the Use of Kernel Bypass Mechanisms for High-Performance Inter-container Communications -- 1 Introduction -- 2 Overview of Compared Solutions -- 3 Experimental Results -- 4 Related Work -- 5 Conclusions and Future Work -- References -- Continuous-Action Reinforcement Learning for Memory Allocation in Virtualized Servers -- 1 Introduction -- 2 Background -- 2.1 Memory Management in Virtualized Nodes -- 2.2 Reinforcement Learning: Markov Decision Process -- 3 CAVMem: Algorithm for Virtualized Memory Management -- 3.1 Decentralized Strategy for Memory Management -- 3.2 Formulating the Problem as an MDP -- 4 Experimental Framework -- 5 Results for Evaluation -- 5.1 Results for Scenario 1 -- 5.2 Results for Scenario 2 -- 5.3 Results for Scenario 3 -- 5.4 Discussion -- 6 Related Work -- 7 Conclusions and Future Work -- References -- Container Orchestration on HPC Clusters -- 1 Introduction -- 2 Related Work -- 3 Background -- 3.1 Kubernetes -- 3.2 Kubernetes Deployment -- 4 Implementation -- 4.1 General Approach -- 4.2 Kubernetes Cluster Deployment -- 4.3 HPC Worker Node Software Prerequisites -- 4.4 Networking -- 4.5 GE Worker Setup and Tear down -- 4.6 Kubernetes Cluster Configuration -- 5 Evaluation -- 6 Discussion -- 7 Conclusion and Future Work -- References -- Data Pallets: Containerizing Storage for Reproducibility and Traceability -- 1 Introduction -- 2 Related Work -- 3 Design -- 3.1 Design and Implementation Challenges -- 3.2 Design and Implementation Details -- 3.3 Integration with Sandia Analysis Workbench (SAW) -- 4 Measurements -- 4.1 Time Overheads -- 4.2 Space Overheads -- 4.3 Discussion.
5 Integration with Sandia Analysis Workbench -- 6 Conclusions and Future Work -- References -- Sarus: Highly Scalable Docker Containers for HPC Systems -- 1 Introduction -- 2 Related Work -- 3 Sarus -- 3.1 Sarus Architecture -- 3.2 Container Creation -- 4 Extending Sarus with OCI Hooks -- 4.1 Native MPICH-Based MPI Support (H1) -- 4.2 NVIDIA GPU Support (H2) -- 4.3 SSH Connection Within Containers (H3) -- 4.4 Slurm Scheduler Synchronization (H4) -- 5 Performance Evaluation -- 5.1 Scientific Applications -- 6 Conclusions -- References -- Singularity GPU Containers Execution on HPC Cluster -- 1 Introduction -- 2 Singularity GPU Containers Building and Running -- 3 Benchmark -- 3.1 Systems Description -- 3.2 Test Case 1: Containerized Tensorflow Execution on GALILEO Versus Official Tensorflow Performance Data -- 3.3 Test Case 2: Containerized Versus Bare Metal Execution on GALILEO -- 4 Conclusion -- References -- A Multitenant Container Platform with OKD, Harbor Registry and ELK -- 1 Introduction -- 2 Past -- 2.1 Background -- 2.2 Challenges -- 3 Present -- 3.1 Evaluation of Container Orchestration Frameworks -- 3.2 Observability: Logging and OKD -- 3.3 Observability: Monitoring and OKD -- 4 Future -- 4.1 Monitoring -- 4.2 Container Policy and OKD -- 4.3 GitOps and OKD -- 4.4 Continuous Delivery in OKD -- 4.5 OKD in the Cloud -- 5 Conclusion -- References -- Enabling GPU-Enhanced Computer Vision and Machine Learning Research Using Containers -- 1 Introduction -- 2 Defining the Base Container -- 2.1 System Setup: Ubuntu, CUDA, Docker, Nvidia-Docker -- 2.2 Docker and Container Runtime -- 2.3 TensorFlow -- 2.4 OpenCV -- 2.5 Cuda_tensorflow_opencv -- 3 Using the Base Container -- 3.1 Testing Code from a Bash Terminal -- 3.2 Integrating Darknet and Yolo V3 Python Bindings -- 4 Conclusion -- References.
Software and Hardware Co-design for Low-Power HPC Platforms -- 1 Introduction -- 2 Network Interface Primitives -- 3 HPC Prototype -- 4 User-Level Communication Library -- 5 MPI Implementation over the Proposed Architecture -- 6 Conclusions and Future Work -- References -- Modernizing Titan2D, a Parallel AMR Geophysical Flow Code to Support Multiple Rheologies and Extendability -- 1 Introduction -- 2 Titan2D and Benchmark Problem -- 3 Refactoring Strategies -- 3.1 Adopting a Python Interface -- 3.2 Merging Multiple Forks -- 3.3 Changing Data Layout to for Modern CPU Architectures -- 3.4 Efficient Indexing for Elements/Nodes Addressing -- 3.5 Introducing OpenMP and Hybrid OpenMP/MPI Parallelization -- 4 Performance Improvement Evaluation -- 5 Conclusions and Future Plans -- References -- Asynchronous AMR on Multi-GPUs -- 1 Introduction -- 2 Execution on Heterogeneous Architectures -- 2.1 Data Model and CPU-GPU Communication -- 2.2 Scheduling on Heterogeneous Architectures -- 2.3 API -- 2.4 Multi-GPU Support -- 3 Evaluation -- 4 Conclusions -- References -- Batch Solution of Small PDEs with the OPS DSL -- 1 Introduction -- 2 The OPS DSL -- 3 Batching Support in OPS -- 3.1 Extending the Abstraction -- 3.2 Execution Schedule Transformation -- 3.3 Data Layout Transformation -- 3.4 Alternating Direction Implicit Solver -- 4 Evaluation -- 4.1 The Application -- 4.2 Experimental Set-Up -- 4.3 Results -- 5 Conclusions -- References -- Scalable Parallelization of Stencils Using MODA -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 MODA and User-Defined Indices -- 3.2 Using GGDML Indices -- 3.3 Communication Identification -- 4 Evaluation -- 4.1 Test Application -- 4.2 Test System -- 4.3 Experiments -- 5 Summary -- References -- Comparing High Performance Computing Accelerator Programming Models -- 1 Introduction -- 2 Motivation -- 3 Related Work.
4 Analysis -- 5 Discussion -- 5.1 BT Benchmark -- 5.2 SP Benchmark -- 5.3 LBM Benchmark -- 5.4 LBDC Benchmark -- 6 Conclusion -- References -- Tracking User-Perceived I/O Slowdown via Probing -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 Probing -- 3.2 Data Reduction Using Statistics -- 3.3 Computing the Slowdown -- 4 Evaluation -- 4.1 Test Systems -- 4.2 Probing Tool -- 4.3 Timeseries of Individual Measurements -- 4.4 Host Variability -- 4.5 Understanding Application Behavior - The IO-500 -- 4.6 Long-Period -- 4.7 Slowdown -- 5 Conclusion -- References -- A Quantitative Approach to Architecting All-Flash Lustre File Systems -- 1 Introduction -- 2 Methods -- 3 File System Capacity -- 4 Drive Endurance -- 5 Metadata Configuration -- 5.1 MDT Capacity Required by DOM -- 5.2 MDT Capacity Required for Inodes -- 5.3 Overall MDT Capacity -- 6 Conclusion -- References -- MBWU: Benefit Quantification for Data Access Function Offloading -- 1 Introduction -- 2 The MBWU-Based Methodology -- 2.1 Background -- 2.2 What Is MBWU -- 2.3 How to Measure MBWU(s) -- 2.4 Evaluation Prototype -- 3 Evaluation -- 3.1 Infrastructure -- 3.2 Test Setup and Results -- 4 Related Work -- 5 Conclusion -- References -- Footprinting Parallel I/O - Machine Learning to Classify Application's I/O Behavior -- 1 Introduction -- 2 Related Work -- 3 DKRZ Monitoring -- 3.1 Metrics -- 4 Methodology -- 5 Test Data -- 5.1 Data Preparation -- 6 Evaluation -- 6.1 I/O Behavior Classification -- 6.2 Footprinting -- 7 Manual Identification of I/O Intensive Jobs -- 8 Summary and Conclusion -- References -- Adventures in NoSQL for Metadata Management -- 1 Introduction -- 2 Related Work -- 3 Metadata Model -- 3.1 Basic Metadata -- 3.2 Custom Metadata -- 4 Design -- 4.1 What Has the Right Features to Be Worth Testing? -- 4.2 What Is It Going to Take to Get It All Working at All?.
4.3 Can We Make Our Queries Work with Any Performance? -- 4.4 Battle Scars and Lessons for Our Next Battle Against Scale Out Computing Tools -- 5 Evaluation -- 5.1 Insert Time -- 5.2 Query Time -- 6 Conclusion and Future Work -- References -- Towards High Performance Data Analytics for Climate Change -- 1 Introduction -- 2 Main Challenges -- 3 The Ophidia Project -- 3.1 Multi-dimensional Storage Model -- 3.2 Array-Based Primitives and Parallel Operators -- 4 Benchmark and Experimental Results -- 4.1 Benchmark Definition -- 4.2 Test Environment -- 4.3 Experimental Results and Discussion -- 5 Related Work -- 6 Conclusions -- References -- An Architecture for High Performance Computing and Data Systems Using Byte-Addressable Persistent Memory -- 1 Introduction -- 2 Persistent Memory -- 2.1 Data Access -- 2.2 B-APM Modes of Operation -- 2.3 Non-volatile Memory Software Ecosystem -- 3 Opportunities for Exploiting B-APM for Computational Simulations and Data Analytics -- 3.1 Potential Caveats -- 4 Systemware Architecture -- 4.1 Job Scheduler -- 4.2 Data Scheduler -- 5 Performance Evaluation -- 6 Related Work -- 7 Summary -- References -- Mediating Data Center Storage Diversity in HPC Applications with FAODEL -- 1 Introduction -- 2 FAODEL Background -- 2.1 Kelpie -- 2.2 I/O Management (IOM) Modules -- 3 Mediating Storage Using Kelpie Object Naming -- 3.1 Kelpie Architectural Considerations -- 3.2 Annotating the Kelpie Namespace -- 3.3 Service-Initiated Mediation -- 3.4 Performance Considerations -- 4 Related Work -- 5 Conclusion -- References -- Predicting File Lifetimes with Machine Learning -- 1 Introduction -- 2 Specifying the Problem and Building the Models -- 2.1 Problem Specification -- 2.2 Dataset -- 2.3 Data Preprocessing -- 2.4 Models -- 3 Results -- 3.1 Evaluation Methodology -- 3.2 Training Times and Model Sizes -- 3.3 Accuracy.
3.4 Error and Accuracy Distribution.
Record Nr. UNISA-996466292803316
Cham : Springer International Publishing : Imprint : Springer, 2019
Printed material
Available at: Univ. di Salerno
High Performance Computing [electronic resource] : 34th International Conference, ISC High Performance 2019, Frankfurt/Main, Germany, June 16–20, 2019, Proceedings / edited by Michèle Weiland, Guido Juckeland, Carsten Trinitis, Ponnuswamy Sadayappan
Edition [1st ed. 2019.]
Publication/distribution/printing Cham : Springer International Publishing : Imprint : Springer, 2019
Physical description 1 online resource (XVI, 352 p. 512 illus., 113 illus. in color.)
Dewey class 004.3
Series Theoretical Computer Science and General Issues
Topical subject Software engineering
Logic design
Microprocessors
Computer architecture
Artificial intelligence
Computer networks
Software Engineering
Logic Design
Processor Architectures
Artificial Intelligence
Computer Communication Networks
ISBN 3-030-20656-4
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record Nr. UNISA-996466325603316
Cham : Springer International Publishing : Imprint : Springer, 2019
Printed material
Available at: Univ. di Salerno