Asynchronous Many-Task Systems and Applications : Second International Workshop, WAMTA 2024, Knoxville, TN, USA, February 14-16, 2024, Proceedings
Author: Diehl, Patrick
Edition: [1st ed.]
Published: Cham : Springer International Publishing AG, 2024
Physical description: 1 online resource (196 pages)
Other authors (persons): Schuchart, Joseph; Valero-Lara, Pedro; Bosilca, George
Series: Lecture Notes in Computer Science
ISBN: 3-031-61763-0
Format: Printed material
Bibliographic level: Monograph
Publication language: eng
Contents: Intro -- Preface -- Organization -- Contents -- Speaking Pygion: Experiences Writing an Exascale Single Particle Imaging Code -- 1 Introduction -- 2 Related Work -- 3 SpiniFEL -- 4 Pygion Implementation -- 5 Results -- 6 Conclusion -- References -- Futures for Dynamic Dependencies - Parallelizing the H-LU Factorization -- 1 Introduction -- 2 Background -- 3 Future-Based Algorithm -- 4 Definition of Futures -- 5 Pseudocode and Discussion -- 6 Related Work -- 7 Conclusion -- References -- Evaluating PaRSEC Through Matrix Computations in Scientific Applications -- 1 Introduction -- 2 Related Work -- 3 The PaRSEC Runtime System -- 4 Applications as Testbed -- 5 Performance Results and Analysis -- 5.1 Experimental Settings -- 5.2 Load Balancing -- 5.3 GPU Efficiency -- 5.4 Scalability -- 6 Conclusion and Future Work -- References -- Distributed Asynchronous Contact Mechanics with DARMA/vt -- 1 Introduction -- 2 Prior Work -- 3 Algorithm -- 3.1 Update and Tree Build -- 3.2 Broadphase -- 3.3 Midphase and Ghosting -- 3.4 Narrowphase -- 3.5 Load Balancing -- 4 Results -- 5 Conclusion -- References -- IRIS Reimagined: Advancements in Intelligent Runtime System for Task-Based Programming -- 1 Introduction -- 2 Background: IRIS -- 3 Related Work -- 4 IRIS Re-imagined -- 4.1 Vendor-Specific Kernels -- 4.2 Foreign Function Interface (FFI) -- 4.3 Distributed Data Memory Management (DMEM) -- 4.4 Heterogeneous Build Environment for IRIS Applications -- 4.5 Hunter -- 4.6 DAGGER -- 5 Results -- 5.1 FFI -- 5.2 DMEM -- 5.3 DAGGER -- 6 Conclusion -- References -- MatRIS: Addressing the Challenges for Portability and Heterogeneity Using Tasking for Matrix Decomposition (Cholesky) -- 1 Introduction -- 2 Background: Cholesky Decomposition, IRIS, and MatRIS -- 2.1 Cholesky Decomposition -- 2.2 IRIS -- 2.3 MatRIS -- 3 Related Work -- 4 Cholesky Decomposition in MatRIS.
4.1 Abstractions for Memory and Computation -- 4.2 Kernel APIs for Cholesky -- 4.3 Tiled Cholesky in MatRIS -- 5 Experiments -- 5.1 Portability, Scalability, and Utilization of Cholesky -- 5.2 Multi-GPU Scalability of Cholesky -- 5.3 Comparison of Cholesky with Vendor Libraries -- 5.4 Heterogeneous Scheduling Opportunities -- 6 Conclusion -- References -- ParSweet: A Suite of Codes for Benchmarking and Testing Mutex-Based Parallel Systems -- 1 Introduction -- 2 Mutex Implementations -- 3 Parallel Codes -- 3.1 Sets -- 3.2 Maps -- 4 Benchmarks and Tests -- 4.1 Machines -- 4.2 Lock Benchmark -- 4.3 Set Benchmark (SetByLock) -- 4.4 Map Benchmark (MapByLock) -- 5 Results -- 5.1 Locks -- 5.2 SetByLocks -- 5.3 MapByLocks -- 6 Conclusion and Future Work -- References -- Rethinking Programming Paradigms in the QC-HPC Context -- 1 Introduction -- 2 Quantum Programming Tools -- 3 Task Modeling in Quantum Computation -- 4 Perspective on the Role of Quantum Technology -- References -- Dynamic Tuning of Core Counts to Maximize Performance in Object-Based Runtime Systems -- 1 Introduction -- 2 Background and Implementation -- 2.1 Implementation of Tuning Core Counts in Charm++ -- 2.2 Additional Changes to Charm++ features -- 2.3 Turning Cores Off Without Suspending -- 2.4 Programming API -- 3 Evaluation -- 3.1 System and Benchmarks -- 3.2 Tuning Physical/virtual Core Count for Performance (and Energy and Power Savings) -- 3.3 Overheads -- 4 Related Work -- 5 Conclusion and Future Work -- References -- Enhancing Sparse Direct Solver Scalability Through Runtime System Automatic Data Partition -- 1 Introduction -- 2 Task-Based Sparse Factorization -- 3 Implementation Within the PaStiX Solver -- 4 Experiments -- 5 Related Work -- 6 Conclusion -- References -- Experiences Porting Shared and Distributed Applications to Asynchronous Tasks: A Multidimensional FFT Case-Study.
1 Introduction -- 2 Related Work -- 3 Methods -- 3.1 Fast Fourier Transform -- 3.2 Parallelization -- 3.3 Different Implementations -- 4 Software Framework -- 4.1 HPX -- 4.2 FFTW -- 5 Results -- 5.1 Overheads -- 5.2 FFTW Backend -- 5.3 Distributed -- 6 Conclusion and Outlook -- References -- An Abstraction for Distributed Stencil Computations Using Charm++ -- 1 Introduction -- 2 Background -- 3 Methodology -- 3.1 Frontend -- 3.2 Backend -- 4 Performance Results -- 5 Related Work -- 6 Future Work -- 7 Conclusion -- References -- DLA-Future: A Task-Based Linear Algebra Library Which Provides a GPU-Enabled Distributed Eigensolver -- 1 Introduction -- 2 DLA-Future -- 2.1 Eigensolver Implementation Description -- 2.2 Implementation Challenges -- 3 Results -- 3.1 Eigensolver -- 3.2 Integration in CP2K -- 4 Conclusion -- References -- ALPI: Enhancing Portability and Interoperability of Task-Aware Libraries -- 1 Introduction -- 2 Background -- 2.1 Task-Based Runtime Systems -- 2.2 Task-Aware Libraries -- 3 The ALPI Interface -- 4 Implementing TAMPI Using the ALPI Interface -- 5 Interoperability Between TA-X Libraries -- 6 Conclusions -- References -- Evolving APGAS Programs: Automatic and Transparent Resource Adjustments at Runtime -- 1 Introduction -- 2 Background -- 3 Evolving APGAS Programs -- 3.1 Lifecycle -- 3.2 Programmer Abstractions -- 3.3 Heuristics -- 3.4 Example: GLB Library -- 4 Evaluation -- 4.1 EvoTree Benchmark -- 4.2 Experiments -- 5 Related Work -- 6 Conclusion -- References -- Optimizing Parallel System Efficiency: Dynamic Task Graph Adaptation with Recursive Tasks -- 1 Introduction -- 2 Granularity Challenges Within the STF Model -- 3 Just-in-Time Task Splitting in StarPU -- 4 Study Case: Cholesky Factorisation -- 5 Conclusion -- References.
HPX with Spack and Singularity Containers: Evaluating Overheads for HPX/Kokkos Using an Astrophysics Application -- 1 Introduction -- 2 Related Work -- 3 Software Stack -- 3.1 Notable Octo-Tiger Dependencies -- 3.2 Octo-Tiger -- 3.3 Build and Dependency Management -- 4 Workflow -- 4.1 Challenges in Compiling and Running Within Containers -- 5 Performance Differences -- 5.1 Supercomputer Fugaku (A64FX) -- 5.2 DeepBayou -- 6 Conclusion and Outlook -- References -- Author Index.
Record no.: UNISA-996601561403316
Diehl, Patrick
Cham : Springer International Publishing AG, 2024
Printed material
Available at: Univ. di Salerno
OPAC: check availability here
Asynchronous Many-Task Systems and Applications : Second International Workshop, WAMTA 2024, Knoxville, TN, USA, February 14-16, 2024, Proceedings
Record no.: UNINA-9910865250303321
Diehl, Patrick
Cham : Springer International Publishing AG, 2024
Printed material
Available at: Univ. Federico II
OPAC: check availability here
Parallel C++ : Efficient and Scalable High-Performance Parallel Programming Using HPX
Author: Diehl, Patrick
Edition: [1st ed.]
Published: Cham : Springer International Publishing AG, 2024
Physical description: 1 online resource (233 pages)
Other authors (persons): Brandt, Steven R.; Kaiser, Hartmut
ISBN: 9783031543692; 9783031543685
Format: Printed material
Bibliographic level: Monograph
Publication language: eng
Contents: Intro -- Foreword -- Preface -- Acknowledgments -- Contents -- Acronyms -- Part I Preliminaries -- 1 Compiling and Running the Code and Examples in This Book -- 1.1 Using the C++ Explorer -- 1.2 Using CMake and C++ Compiler -- Part II Introduction to C++ and C++ Standard Library -- 2 About C++, C++ Standard, and the C++ Standard Library -- 2.1 Brief History of C++, the C++ Standard, and Parallel Programming -- 2.2 Standard Template Library (STL) and C++ Standard Library -- 2.3 C++ Compilers -- 3 C++ Standard Library -- 3.1 Overview of the C++ Standard Library -- 3.2 Containers -- 3.2.1 Vector -- 3.2.2 List -- 3.2.3 List vs. Vector -- 3.2.4 Array -- 3.2.5 Iterators -- 3.3 Algorithms -- 4 Example Mandelbrot Set and Julia Set -- 4.1 Mandelbrot Set -- 4.2 Julia Set -- 4.3 Single Threaded Implementation of the Mandelbrot Set -- Part III The C++ Standard Library for Concurrency and Parallelism (HPX) -- 5 Why HPX? -- 5.1 Governing Principles -- 5.1.1 Focus on Latency Hiding Instead of Latency Avoidance -- 5.1.2 Embrace Fine-Grained Parallelism Instead of Heavyweight Threads -- 5.1.3 Rediscover Constraint-Based Synchronization to Replace Global Barriers -- 5.1.4 Adaptive Locality Control Instead of Static Data Distribution -- 5.1.5 Prefer Moving Work to the Data Over Moving Data to the Work -- 5.1.6 Favor Message Driven Computation Over Message Passing -- 6 The C++ Standard Library for Parallelism and Concurrency (HPX) -- 6.1 HPX's Architecture -- 6.2 Applications -- Part IV Parallel Programming -- 7 Parallel Programming -- 7.1 An Overview of Parallel Programming -- 7.2 Race Conditions -- 7.2.1 Mutexes and Deadlocks -- 7.2.2 Atomic Operation -- 7.3 Performance Measurements -- 7.3.1 Amdahl's Law -- 7.3.2 Gustafson's Law -- 7.3.3 Speedup and Parallel Efficiency -- 7.3.4 Weak Scaling and Strong Scaling -- 7.4 Memory Access.
7.5 Parallelism Computer Architectures -- 7.5.1 Pipelined SIMD -- 7.5.2 Single Instruction Multiple Data (SIMD) -- 8 Programming with Low Level Threads -- 8.1 Implementation of the Fractal Sets -- 9 Asynchronous Programming -- 9.1 Advanced Synchronization in HPX -- 9.2 Implementation of the Fractal Sets -- 10 Parallel Algorithms -- 10.1 Parallel Algorithms in HPX -- 10.1.1 Combining Parallel Algorithms and Asynchronous Programming -- 10.1.2 Single Instruction Multiple Data -- 10.2 Additional Parameters for the Execution Policies -- 10.3 Implementation of the Fractal Sets -- 11 Coroutines -- 11.1 Implementation of the Fractal Sets -- 12 Benchmarking the Fractal Set Codes -- Part V Distributed Programming -- 13 Distributed Computing and Programming -- 13.1 Overview of Distributed Programming in C++ and Asynchronous Many Task Systems -- 13.2 Data Distribution -- 13.3 Distributed Input and Output -- 13.4 Serialization -- 13.5 Message Passing -- 13.5.1 Implementation of the Fractal Set Using MPI -- 13.5.2 Implementation of the Fractal Set Using MPI+OpenMP -- 13.6 Benchmark of MPI and MPI+OpenMP -- 14 Distributed Programming Using HPX -- 14.1 Active Messaging -- 14.1.1 Plain Action -- 14.1.2 Components and Actions -- 14.1.2.1 Components and Component Actions -- 14.1.2.2 Client -- 14.1.2.3 Using Components and Components Actions -- 14.1.2.4 Recap -- 14.1.3 Receiving Topology Information -- 14.2 Serialization -- 15 Examples of Distributed Programming -- 15.1 Distributed Implementation of the Taylor Series of the Natural Logarithm -- 15.2 Distributed Implementation of the Fractal Set -- 15.3 Improved Distributed Implementation of the Fractal Set -- 15.4 Benchmark of the Distributed Implementation of the Fractal Sets -- 16 Some Remarks on MPI+OpenMP and HPX -- Part VI A Showcase for a Portable High Performance Application Using HPX -- 17 Accelerator Cards.
18 Octo-Tiger, a Showcase for a Portable High Performance Application -- 18.1 Synchronous Communication vs. Asynchronous Communication -- 18.2 Acceleration Support -- 18.3 HPX-Kokkos and Vectorization -- Part VII Conclusion and Outlook -- 19 Conclusion and Outlook -- 19.1 Distributed Programming -- 19.2 Outlook -- Appendix -- A Advanced Topics in C++ -- A.1 Generic Programming -- A.2 Lambda Functions -- A.3 Move Semantics -- A.4 Placement New -- A.5 Smart Pointers -- A.6 Ranges -- B Supplementary Header Files -- C Software and Hardware Documentation -- References -- Glossary -- Index.
Record no.: UNINA-9910869168003321
Diehl, Patrick
Cham : Springer International Publishing AG, 2024
Printed material
Available at: Univ. Federico II
OPAC: check availability here