OpenMP : enabling massive node-level parallelism : 17th international workshop on OpenMP, IWOMP 2021, Bristol, UK, September 14-16, 2021 : proceedings / Simon McIntosh-Smith, Bronis R. de Supinski, Jannis Klinkenberg

Author: McIntosh-Smith, Simon
Title: OpenMP : enabling massive node-level parallelism : 17th international workshop on OpenMP, IWOMP 2021, Bristol, UK, September 14-16, 2021 : proceedings / Simon McIntosh-Smith, Bronis R. de Supinski, Jannis Klinkenberg
Publication: Cham, Switzerland : Springer International Publishing, [2021]
©2021
Physical description: 1 online resource (231 pages)
Discipline: 621.3916
Topical subject: Microprocessors - Computer-aided design
Logic design - Data processing
Person (secondary responsibility): De Supinski, Bronis R.
Klinkenberg, Jannis
Contents note: Intro -- Preface -- Organization -- Contents -- Synchronization and Data -- Improving Speculative taskloop in Hardware Transactional Memory -- 1 Introduction -- 2 Background and Related Work -- 2.1 Task-Based Parallelism -- 2.2 TLS on Hardware Transactional Memories -- 2.3 Speculative taskloop (STL) -- 2.4 Lost-Thread Effect -- 2.5 LLVM OpenMP Runtime Library -- 3 Implementation -- 3.1 First Attempt: Use priority Clause -- 3.2 Recursive Partition of Iterations -- 3.3 Immediate Execution When Deque is Full -- 3.4 Removal from Tail of Thread's Deque -- 4 Benchmarks, Methodology and Experimental Setup -- 5 Experimental Results and Analysis -- 6 Conclusions -- References -- Vectorized Barrier and Reduction in LLVM OpenMP Runtime -- 1 Introduction -- 2 Background and Related Work -- 2.1 Types of Barriers in Literature -- 2.2 Barriers and Reductions in OpenMP -- 3 Low Overhead Barrier and Reduction in OpenMP -- 3.1 Vectorized Barrier -- 3.2 Vectorized Reduction -- 4 Performance Results -- 4.1 Intel KNL -- 4.2 Fujitsu A64FX -- 5 Conclusions -- References -- Tasking Extensions I -- Enhancing OpenMP Tasking Model: Performance and Portability -- 1 Introduction -- 2 Motivation -- 3 The Taskgraph Model -- 3.1 The taskgraph Mechanism -- 3.2 Syntax of the taskgraph Clause -- 3.3 Semantics of the taskgraph Clause -- 3.4 Requirements of the taskgraph Region -- 4 Projected Results -- 4.1 Potential Performance Gain -- 4.2 The TDG: A Door for Expanding Portability -- 5 Related Work -- 6 Conclusion -- References -- OpenMP Taskloop Dependences -- 1 Introduction -- 2 Tasking Programmability Challenges -- 3 Related Work -- 4 Taskloop with Dependences -- 5 Implementation -- 6 Experiment Results -- 7 Conclusions and Future Work -- References -- Applications -- Outcomes of OpenMP Hackathon: OpenMP Application Experiences with the Offloading Model (Part I).
1 Introduction -- 2 Platforms Used -- 3 Application Experiences -- 3.1 BerkeleyGW -- 3.2 WDMApp -- References -- Outcomes of OpenMP Hackathon: OpenMP Application Experiences with the Offloading Model (Part II) -- 1 Introduction -- 2 Application Experiences -- 2.1 GAMESS -- 2.2 GESTS -- 2.3 GridMini -- 3 Conclusions -- References -- An Empirical Investigation of OpenMP Based Implementation of Simplex Algorithm -- 1 Introduction -- 2 Serial Algorithm -- 3 Parallel Algorithm -- 3.1 Implementation -- 3.2 Optimization Strategies -- 3.3 Algorithm Analysis -- 4 Experimental Results and Observations -- 4.1 NETLIB Dataset -- 4.2 Variation of the Number of Variables -- 4.3 Variation of the Number of Constraints -- 4.4 Variation in Matrix Density -- 4.5 Discussion -- 5 Conclusion -- A Appendix: Serial Algorithm - Working Example -- References -- Task Inefficiency Patterns for a Wave Equation Solver -- 1 Introduction -- 2 Case Studies -- 3 Test Environment -- 4 Benchmarking and Task Runtime Modifications -- 4.1 Direct Translation of Enclave Tasking to OpenMP (native) -- 4.2 Manual Task Postponing (Hold-Back) -- 4.3 Manual Backfilling (Backfill) -- 5 Evaluation and Conclusion -- References -- Case Studies -- Comparing OpenMP Implementations with Applications Across A64FX Platforms -- 1 Introduction -- 1.1 The A64FX Processor -- 1.2 Paper's Contribution and Organization -- 2 List of Applications and Experimental Setup -- 2.1 List of Applications -- 2.2 Systems and Compilers -- 2.3 Runtime Environment -- 2.4 Compiler Options -- 3 Experimental Results -- 3.1 Ookami -- 3.2 Fugaku -- 4 Related Work -- 5 Conclusions and Future Work -- References -- A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation -- 1 Introduction -- 2 Case Study: Porting DCA++ to Wombat -- 2.1 Evaluation Environment -- 2.2 DCA++ -- 2.3 Baseline Performance.
3 An LLVM Tool Methodology to Generate Efficient Vectorization -- 3.1 OpenMP SIMD -- 3.2 Using the Correct Compiler Flags -- 3.3 Loop Transformations -- 3.4 Results -- 4 Automating the Process: The OpenMP Advisor -- 5 Related Work -- 6 Conclusion -- References -- Heterogenous Computing and Memory -- Experience Report: Writing a Portable GPU Runtime with OpenMP 5.1 -- 1 Introduction -- 2 Background -- 2.1 Device Runtime Library -- 2.2 Compilation Flow of OpenMP Target Offloading in LLVM/Clang -- 2.3 Motivation -- 3 Implementation -- 3.1 Common Part -- 3.2 Target Specific Part -- 4 Evaluation -- 4.1 Code Comparison -- 4.2 Functional Testing -- 4.3 Performance Evaluation -- 5 Conclusions and Future Work -- References -- FOTV: A Generic Device Offloading Framework for OpenMP -- 1 Introduction -- 2 Background: OpenMP Offloading Infrastructure -- 2.1 Offloading Strategy -- 2.2 Advantages and Limitations -- 3 Architecture of the FOTV Generic Device Framework -- 3.1 The Runtime Library Components -- 3.2 The Code Extraction Tool -- 4 Device Management API Description -- 4.1 DeviceManagement Component -- 4.2 TgtRegionBase Component -- 5 Case Study: Running OpenCL Kernels as OpenMP Regions -- 5.1 The OpenCL Device Requirements -- 6 Results -- 7 Related Works -- 8 Conclusions and Future Works -- References -- Beyond Explicit Transfers: Shared and Managed Memory in OpenMP -- 1 Introduction -- 2 Current Support in OpenMP -- 2.1 Allocators -- 2.2 Host Memory -- 2.3 Device Memory -- 3 Survey -- 3.1 OpenCL -- 3.2 Level Zero -- 3.3 CUDA -- 3.4 HIP -- 4 Proposed OpenMP Extension -- 4.1 Memory Space Accessibility -- 4.2 Shared and Managed Memory -- 4.3 Memory Location Control -- 5 Evaluation -- 6 Conclusion -- References -- Tasking Extensions II -- Communication-Aware Task Scheduling Strategy in Hybrid MPI+OpenMP Applications -- 1 Introduction -- 2 Related Work.
3 Task Scheduling Strategy -- 3.1 Interoperation Between MPI and OpenMP Runtimes -- 3.2 Manual Policies -- 3.3 (Semi-)Automatic Policies -- 3.4 Summary -- 4 Implementation and Evaluation -- 4.1 Implementation -- 4.2 Evaluation Environment -- 4.3 Experimental Results -- 5 Conclusion and Future Work -- References -- An OpenMP Free Agent Threads Implementation -- 1 Introduction -- 2 Related Work -- 3 Proposal -- 3.1 Considered Aspects in the Design -- 3.2 The free_agent Task Clause -- 3.3 Proposed Mechanisms to Manage Free Agent Threads -- 4 Implementation -- 5 Evaluation -- 5.1 Use Case: Fixing Load Imbalance Between Parallel Regions -- 5.2 Use Case: Solving Load Imbalance in a Hybrid Application with DLB as an OMPT Tool -- 6 Conclusions and Future Work -- References -- Author Index.
Authorized title: OpenMP
ISBN: 3-030-85262-8
Format: Printed material
Bibliographic level: Monograph
Language of publication: English
Record No.: 996464509003316
Available at: Univ. di Salerno
Series: Lecture Notes in Computer Science