Advancing OpenMP for Future Accelerators : 20th International Workshop on OpenMP, IWOMP 2024, Perth, WA, Australia, September 23–25, 2024, Proceedings / edited by Alexis Espinosa, Michael Klemm, Bronis R. de Supinski, Maciej Cytowski, Jannis Klinkenberg
Author Espinosa, Alexis
Edition [1st ed. 2024.]
Publication/distribution Cham : Springer Nature Switzerland : Imprint: Springer, 2024
Physical description 1 online resource (230 pages)
Discipline 005.45
Other authors (persons) Klemm, Michael
de Supinski, Bronis R.
Cytowski, Maciej
Klinkenberg, Jannis
Series Lecture Notes in Computer Science
Topical subject Compilers (Computer programs)
Microprogramming
Computer input-output equipment
Computers, Special purpose
Computer systems
Compilers and Interpreters
Control Structures and Microprogramming
Input/Output and Data Communications
Special Purpose and Application-Based Systems
Computer System Implementation
ISBN 3-031-72567-0
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Current and Future OpenMP Optimization. -- Towards Locality-Aware Host-to-Device Offloading in OpenMP. -- Performance Porting the ExaStar Multi-physics App Thornado On Heterogeneous Systems - A Fortran-OpenMP Code-base Evaluation. -- Event-Based OpenMP Tasks for Time-Sensitive GPU-Accelerated Systems. -- Targeting More Devices. -- Integrating Multi-FPGA Acceleration to OpenMP Distributed Computing. -- Towards a Scalable and Efficient PGAS-based Distributed OpenMP. -- Multilayer Multipurpose Caches for OpenMP Target Regions on FPGAs. -- Best Practices. -- Survey of OpenMP Practice in General Open Source Software. -- CI/CD Efforts for Validation, Verification and Benchmarking OpenMP Implementations. -- Evaluation of Directive-based Programming Models for Stencil Computation on Current GPGPU Architectures. -- Tools. -- Finding Equivalent OpenMP Fortran and C/C++ Code Snippets Using Large Language Models. -- Visualizing Correctness Issues in OpenMP Programs. -- Developing an Interactive OpenMP Programming Book with Large Language Models. -- Simplifying Parallelization. -- Automatic Parallelization and OpenMP Offloading of Fortran Array Notation. -- Detrimental Task Execution Patterns in Mainstream OpenMP Runtimes.
Record no. UNINA-9910888598803321
Find it here: Univ. Federico II
Opac: Check availability here
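Several papers in the volume above concern host-to-device offloading. As a point of reference, a minimal sketch of the basic OpenMP target idiom in C follows; it is illustrative only, not taken from the proceedings, and assumes a compiler built with OpenMP 4.5+ offload support (e.g. clang with -fopenmp -fopenmp-targets=<arch>), falling back to host execution otherwise.

    /* SAXPY offloaded to a device. The map() clauses make the
       host-to-device data movement explicit; this is exactly the
       traffic that locality-aware offloading tries to reduce. */
    #include <stdio.h>

    int main(void) {
        enum { N = 1 << 20 };
        static float x[N], y[N];
        const float a = 2.0f;

        for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        #pragma omp target teams distribute parallel for \
                map(to: x[0:N]) map(tofrom: y[0:N])
        for (int i = 0; i < N; ++i)
            y[i] = a * x[i] + y[i];

        printf("y[0] = %f\n", y[0]);  /* expect 4.000000 */
        return 0;
    }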
OpenMP in a modern world : from multi-device support to meta programming : 18th International Workshop on OpenMP, IWOMP 2022, Chattanooga, TN, USA, September 27-30, 2022, proceedings / edited by Michael Klemm, [and three others]
Publication/distribution Cham, Switzerland : Springer, [2022]
Physical description 1 online resource (178 pages)
Discipline 410
Series Lecture Notes in Computer Science
Topical subject Parallel programming (Computer science)
ISBN 3-031-15922-5
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNISA-996490360403316
Find it here: Univ. di Salerno
Opac: Check availability here
OpenMP in a modern world : from multi-device support to meta programming : 18th International Workshop on OpenMP, IWOMP 2022, Chattanooga, TN, USA, September 27-30, 2022, proceedings / edited by Michael Klemm, [and three others]
Publication/distribution Cham, Switzerland : Springer, [2022]
Physical description 1 online resource (178 pages)
Discipline 410
Series Lecture Notes in Computer Science
Topical subject Parallel programming (Computer science)
ISBN 3-031-15922-5
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNINA-9910595035003321
Find it here: Univ. Federico II
Opac: Check availability here
OpenMP: Advanced Task-Based, Device and Compiler Programming [electronic resource] : 19th International Workshop on OpenMP, IWOMP 2023, Bristol, UK, September 13–15, 2023, Proceedings / edited by Simon McIntosh-Smith, Michael Klemm, Bronis R. de Supinski, Tom Deakin, Jannis Klinkenberg
Author McIntosh-Smith, Simon
Edition [1st ed. 2023.]
Publication/distribution Cham : Springer Nature Switzerland : Imprint: Springer, 2023
Physical description 1 online resource (244 pages)
Discipline 005.275
Other authors (persons) Klemm, Michael
de Supinski, Bronis R.
Deakin, Tom
Klinkenberg, Jannis
Series Lecture Notes in Computer Science
Topical subject Microprocessors
Computer architecture
Compilers (Computer programs)
Microprogramming
Computer input-output equipment
Computers, Special purpose
Computer systems
Processor Architectures
Compilers and Interpreters
Control Structures and Microprogramming
Input/Output and Data Communications
Special Purpose and Application-Based Systems
Computer System Implementation
ISBN 3-031-40744-X
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note OpenMP and AI: Advising OpenMP Parallelization via a Graph-Based Approach with Transformers -- Towards Effective Language Model Application in High-Performance Computing -- OpenMP Advisor: A Compiler Tool for Heterogeneous Architectures -- Tasking Extensions: Introducing Moldable Task in OpenMP -- Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct -- How to Efficiently Parallelize Irregular DOACROSS Loops Using Fine-Grained Granularity and OpenMP Tasks? The mcf Case -- OpenMP Offload Experiences: The Kokkos OpenMPTarget Backend: Implementation and Lessons Learned -- Fine-Grained Parallelism on GPUs Using OpenMP Target Offloading -- Improving a Multigrid Poisson Solver with Peer-to-Peer Communication and Task Dependencies -- Beyond Explicit GPU Support: Multipurpose Cacheing to accelerate OpenMP Target Regions on FPGAs -- Generalizing Hierarchical Parallelism -- Exploring the Limits of Generic Code Execution on GPUs via Direct (OpenMP) Offload -- OpenMP Infrastructure and Evaluation: Improving Simulations of Task-Based Applications on Complex NUMA Architectures -- Experimental Characterization of OpenMP Offloading Memory Operations and Unified Shared Memory Support -- OpenMP Reverse Offloading Using Shared Memory Remote Procedure Calls.
Record no. UNISA-996546849103316
Find it here: Univ. di Salerno
Opac: Check availability here
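The tasking-extension papers in the volume above (moldable tasks, suspending tasks on asynchronous events, extending taskwait) build on the standard OpenMP task model. A minimal sketch of that baseline in C, illustrative only and not taken from the proceedings; compile with e.g. cc -fopenmp.

    /* A small task graph with depend clauses and taskwait: the
       plain constructs the taskwait-extension work generalizes. */
    #include <stdio.h>

    int main(void) {
        int a = 0, b = 0;
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: a)
            a = 1;                        /* producer */
            #pragma omp task depend(out: b)
            b = 2;                        /* independent producer */
            #pragma omp task depend(in: a) depend(in: b)
            printf("a+b = %d\n", a + b);  /* runs after both producers */
            #pragma omp taskwait          /* block until child tasks finish */
        }
        return 0;
    }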
OpenMP: Advanced Task-Based, Device and Compiler Programming : 19th International Workshop on OpenMP, IWOMP 2023, Bristol, UK, September 13–15, 2023, Proceedings / edited by Simon McIntosh-Smith, Michael Klemm, Bronis R. de Supinski, Tom Deakin, Jannis Klinkenberg
Author McIntosh-Smith, Simon
Edition [1st ed. 2023.]
Publication/distribution Cham : Springer Nature Switzerland : Imprint: Springer, 2023
Physical description 1 online resource (244 pages)
Discipline 005.275
Other authors (persons) Klemm, Michael
de Supinski, Bronis R.
Deakin, Tom
Klinkenberg, Jannis
Series Lecture Notes in Computer Science
Topical subject Microprocessors
Computer architecture
Compilers (Computer programs)
Microprogramming
Computer input-output equipment
Computers, Special purpose
Computer systems
Processor Architectures
Compilers and Interpreters
Control Structures and Microprogramming
Input/Output and Data Communications
Special Purpose and Application-Based Systems
Computer System Implementation
ISBN 3-031-40744-X
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note OpenMP and AI: Advising OpenMP Parallelization via a Graph-Based Approach with Transformers -- Towards Effective Language Model Application in High-Performance Computing -- OpenMP Advisor: A Compiler Tool for Heterogeneous Architectures -- Tasking Extensions: Introducing Moldable Task in OpenMP -- Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct -- How to Efficiently Parallelize Irregular DOACROSS Loops Using Fine-Grained Granularity and OpenMP Tasks? The mcf Case -- OpenMP Offload Experiences: The Kokkos OpenMPTarget Backend: Implementation and Lessons Learned -- Fine-Grained Parallelism on GPUs Using OpenMP Target Offloading -- Improving a Multigrid Poisson Solver with Peer-to-Peer Communication and Task Dependencies -- Beyond Explicit GPU Support: Multipurpose Cacheing to accelerate OpenMP Target Regions on FPGAs -- Generalizing Hierarchical Parallelism -- Exploring the Limits of Generic Code Execution on GPUs via Direct (OpenMP) Offload -- OpenMP Infrastructure and Evaluation: Improving Simulations of Task-Based Applications on Complex NUMA Architectures -- Experimental Characterization of OpenMP Offloading Memory Operations and Unified Shared Memory Support -- OpenMP Reverse Offloading Using Shared Memory Remote Procedure Calls.
Record no. UNINA-9910743696103321
Find it here: Univ. Federico II
Opac: Check availability here
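The offloading-memory chapter listed above examines unified shared memory support, under which device code can dereference ordinary host pointers and explicit map() clauses become unnecessary. A minimal sketch in C, illustrative only and assuming a device and runtime that honor the OpenMP 5.0 requires directive:

    /* Unified shared memory: plain malloc'd storage is visible to
       the device, so the target region carries no map() clauses. */
    #include <stdio.h>
    #include <stdlib.h>

    #pragma omp requires unified_shared_memory

    int main(void) {
        enum { N = 1000 };
        double *v = malloc(N * sizeof *v);   /* ordinary host allocation */
        for (int i = 0; i < N; ++i) v[i] = 1.0;

        #pragma omp target teams distribute parallel for
        for (int i = 0; i < N; ++i)
            v[i] *= 2.0;                      /* device writes host memory */

        printf("v[0] = %f\n", v[0]);          /* expect 2.000000 */
        free(v);
        return 0;
    }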
Optimizing HPC Applications with Intel Cluster Tools [electronic resource] : Hunting Petaflops / by Alexander Supalov, Andrey Semin, Christopher Dahnken, Michael Klemm
Author Supalov, Alexander
Edition [1st ed. 2014.]
Publication/distribution Springer Nature, 2014
Physical description 1 online resource (291 pages) : illustrations
Discipline 004.11
Series The expert's voice in software engineering
Topical subject Programming languages (Electronic computers)
Software engineering
Programming Languages, Compilers, Interpreters
Software Engineering/Programming and Operating Systems
Uncontrolled subject Computer science
ISBN 1-4302-6497-7
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Contents at a Glance -- Contents -- About the Authors -- About the Technical Reviewers -- Acknowledgments -- Foreword -- Introduction -- Chapter 1: No Time to Read This Book? -- Using Intel MPI Library -- Using Intel Composer XE -- Tuning Intel MPI Library -- Gather Built-in Statistics -- Optimize Process Placement -- Optimize Thread Placement -- Tuning Intel Composer XE -- Analyze Optimization and Vectorization Reports -- Use Interprocedural Optimization -- Summary -- References -- Chapter 2: Overview of Platform Architectures -- Performance Metrics and Targets -- Latency, Throughput, Energy, and Power -- Peak Performance as the Ultimate Limit -- Scalability and Maximum Parallel Speedup -- Bottlenecks and a Bit of Queuing Theory -- Roofline Model -- Performance Features of Computer Architectures -- Increasing Single-Threaded Performance: Where You Can and Cannot Help -- Process More Data with SIMD Parallelism -- Distributed and Shared Memory Systems -- Use More Independent Threads on the Same Node -- Don't Limit Yourself to a Single Server -- HPC Hardware Architecture Overview -- A Multicore Workstation or a Server Compute Node -- Coprocessor for Highly Parallel Applications -- Group of Similar Nodes Form an HPC Cluster -- Other Important Components of HPC Systems -- Summary -- References -- Chapter 3: Top-Down Software Optimization -- The Three Levels and Their Impact on Performance -- System Level -- Application Level -- Working Against the Memory Wall -- The Magic of Vectors -- Distributed Memory Parallelization -- Shared Memory Parallelization -- Other Existing Approaches and Methods -- Microarchitecture Level -- Addressing Pipelines and Execution -- Closed-Loop Methodology -- Workload, Application, and Baseline -- Iterating the Optimization Process -- Summary -- References -- Chapter 4: Addressing System Bottlenecks.
Classifying System-Level Bottlenecks -- Identifying Issues Related to System Condition -- Characterizing Problems Caused by System Configuration -- Understanding System-Level Performance Limits -- Checking General Compute Subsystem Performance -- Testing Memory Subsystem Performance -- Testing I/O Subsystem Performance -- Characterizing Application System-Level Issues -- Selecting Performance Characterization Tools -- Monitoring the I/O Utilization -- Analyzing Memory Bandwidth -- Summary -- References -- Chapter 5: Addressing Application Bottlenecks: Distributed Memory -- Algorithm for Optimizing MPI Performance -- Comprehending the Underlying MPI Performance -- Recalling Some Benchmarking Basics -- Gauging Default Intranode Communication Performance -- Gauging Default Internode Communication Performance -- Discovering Default Process Layout and Pinning Details -- Gauging Physical Core Performance -- Doing Initial Performance Analysis -- Is It Worth the Trouble? -- Example 1: Initial HPL Performance Investigation -- Getting an Overview of Scalability and Performance -- Learning Application Behavior -- Example 2: MiniFE Performance Investigation -- Choosing Representative Workload(s) -- Example 2 (cont.): MiniFE Performance Investigation -- Balancing Process and Thread Parallelism -- Example 2 (cont.): MiniFE Performance Investigation -- Doing a Scalability Review -- Example 2 (cont.): MiniFE Performance Investigation -- Analyzing the Details of the Application Behavior -- Example 2 (cont.): MiniFE Performance Investigation -- Choosing the Optimization Objective -- Detecting Load Imbalance -- Example 2 (cont.): MiniFE Performance Investigation -- Dealing with Load Imbalance -- Classifying Load Imbalance -- Addressing Load Imbalance -- Example 2 (cont.): MiniFE Performance Investigation -- Example 3: MiniMD Performance Investigation.
Optimizing MPI Performance -- Classifying the MPI Performance Issues -- Addressing MPI Performance Issues -- Mapping Application onto the Platform -- Understanding Communication Paths -- Selecting Proper Communication Fabrics -- Using Scalable Datagrams -- Specifying a Network Provider -- Using IP over IB -- Controlling the Fabric Fallback Mechanism -- Using Multirail Capabilities -- Detecting and Classifying Improper Process Layout and Pinning Issues -- Controlling Process Layout -- Controlling the Global Process Layout -- Controlling the Detailed Process Layout -- Setting the Environment Variables at All Levels -- Controlling the Process Pinning -- Controlling Memory and Network Affinity -- Example 4: MiniMD Performance Investigation on Xeon Phi -- Example 5: MiniGhost Performance Investigation -- Tuning the Intel MPI Library -- Tuning Intel MPI for the Platform -- Tuning Point-to-Point Settings -- Adjusting the Eager and Rendezvous Protocol Thresholds -- Changing DAPL and DAPL UD Eager Protocol Threshold -- Bypassing Shared Memory for Intranode Communication -- Bypassing the Cache for Intranode Communication -- Choosing the Best Collective Algorithms -- Tuning Intel MPI Library for the Application -- Using Magical Tips and Tricks -- Disabling the Dynamic Connection Mode -- Applying the Wait Mode to Oversubscribed Jobs -- Fine-Tuning the Message-Passing Progress Engine -- Reducing the Pre-reserved DAPL Memory Size -- What Else? -- Example 5 (cont.): MiniGhost Performance Investigation -- Optimizing Application for Intel MPI -- Avoiding MPI_ANY_SOURCE -- Avoiding Superfluous Synchronization -- Using Derived Datatypes -- Using Collective Operations -- Betting on the Computation/Communication Overlap -- Replacing Blocking Collective Operations by MPI-3 Nonblocking Ones -- Using Accelerated MPI File I/O.
Example 5 (cont.): MiniGhost Performance Investigation -- Using Advanced Analysis Techniques -- Automatically Checking MPI Program Correctness -- Comparing Application Traces -- Instrumenting Application Code -- Correlating MPI and Hardware Events -- Collecting and Analyzing Hardware Counter Information in ITAC -- Collecting and Analyzing Hardware Counter Information in VTune -- Summary -- References -- Chapter 6: Addressing Application Bottlenecks: Shared Memory -- Profiling Your Application -- Using VTune Amplifier XE for Hotspots Profiling -- Hotspots for the HPCG Benchmark -- Compiler-Assisted Loop/Function Profiling -- Sequential Code and Detecting Load Imbalances -- Thread Synchronization and Locking -- Dealing with Memory Locality and NUMA Effects -- Thread and Process Pinning -- Controlling OpenMP Thread Placement -- Thread Placement in Hybrid Applications -- Summary -- References -- Chapter 7: Addressing Application Bottlenecks: Microarchitecture -- Overview of a Modern Processor Pipeline -- Pipelined Execution -- Data Conflicts -- Control Conflicts -- Structural Conflicts -- Out-of-order vs. In-order Execution -- Superscalar Pipelines -- SIMD Execution -- Speculative Execution: Branch Prediction -- Memory Subsystem -- Putting It All Together: A Final Look at the Sandy Bridge Pipeline -- A Top-down Method for Categorizing the Pipeline Performance -- Intel Composer XE Usage for Microarchitecture Optimizations -- Basic Compiler Usage and Optimization -- Using Optimization and Vectorization Reports to Read the Compiler's Mind -- Optimizing for Vectorization -- The AVX Instruction Set -- Why Doesn't My Code Vectorize in the First Place? -- Data Dependences -- Data Aliasing -- Array Notations -- Vectorization Directives -- ivdep -- vector -- simd -- Understanding AVX: Intrinsic Programming -- What Are Intrinsics?.
First Steps: Loading and Storing -- Arithmetic -- Data Rearrangement -- Dealing with Disambiguation -- Dealing with Branches -- __builtin_expect -- Profile-Guided Optimization -- Pragmas for Unrolling Loops and Inlining -- unroll/nounroll -- unroll_and_jam/nounroll_and_jam -- inline, noinline, forceinline -- Specialized Routines: How to Exploit the Branch Prediction for Maximal Performance -- When Optimization Leads to Wrong Results -- Using a Standard Library Method -- Using a Manual Implementation in C -- Vectorization with Directives -- Analyzing Pipeline Performance with Intel VTune Amplifier XE -- Summary -- References -- Chapter 8: Application Design Considerations -- Abstraction and Generalization of the Platform Architecture -- Types of Abstractions -- Levels of Abstraction and Complexities -- Raw Hardware vs. Virtualized Hardware in the Cloud -- Questions about Application Design -- Designing for Performance and Scaling -- Designing for Flexibility and Performance Portability -- Data Layout -- Structured Approach to Express Parallelism -- Understanding Bounds and Projecting Bottlenecks -- Data Storage or Transfer vs. Recalculation -- Total Productivity Assessment -- Summary -- References -- Index.
Record no. UNISA-996213652203316
Find it here: Univ. di Salerno
Opac: Check availability here
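Chapter 7 of the book above walks through vectorization directives (ivdep, vector, simd). A minimal sketch of the portable OpenMP simd form in C, illustrative only and not taken from the book; compile with e.g. cc -O2 -fopenmp-simd.

    /* restrict plus #pragma omp simd removes the aliasing doubt that
       would otherwise keep the compiler from vectorizing this loop;
       ivdep is the Intel-compiler analogue of the same assertion. */
    #include <stddef.h>

    void scale(float *restrict dst, const float *restrict src,
               float a, size_t n) {
        #pragma omp simd
        for (size_t i = 0; i < n; ++i)
            dst[i] = a * src[i];
    }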
Optimizing HPC Applications with Intel Cluster Tools [electronic resource] : Hunting Petaflops / by Alexander Supalov, Andrey Semin, Christopher Dahnken, Michael Klemm
Author Supalov, Alexander
Edition [1st ed. 2014.]
Publication/distribution Springer Nature, 2014
Physical description 1 online resource (291 pages) : illustrations
Discipline 004.11
Series The expert's voice in software engineering
Topical subject Programming languages (Electronic computers)
Software engineering
Programming Languages, Compilers, Interpreters
Software Engineering/Programming and Operating Systems
Uncontrolled subject Computer science
ISBN 1-4302-6497-7
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Contents at a Glance -- Contents -- About the Authors -- About the Technical Reviewers -- Acknowledgments -- Foreword -- Introduction -- Chapter 1: No Time to Read This Book? -- Using Intel MPI Library -- Using Intel Composer XE -- Tuning Intel MPI Library -- Gather Built-in Statistics -- Optimize Process Placement -- Optimize Thread Placement -- Tuning Intel Composer XE -- Analyze Optimization and Vectorization Reports -- Use Interprocedural Optimization -- Summary -- References -- Chapter 2: Overview of Platform Architectures -- Performance Metrics and Targets -- Latency, Throughput, Energy, and Power -- Peak Performance as the Ultimate Limit -- Scalability and Maximum Parallel Speedup -- Bottlenecks and a Bit of Queuing Theory -- Roofline Model -- Performance Features of Computer Architectures -- Increasing Single-Threaded Performance: Where You Can and Cannot Help -- Process More Data with SIMD Parallelism -- Distributed and Shared Memory Systems -- Use More Independent Threads on the Same Node -- Don't Limit Yourself to a Single Server -- HPC Hardware Architecture Overview -- A Multicore Workstation or a Server Compute Node -- Coprocessor for Highly Parallel Applications -- Group of Similar Nodes Form an HPC Cluster -- Other Important Components of HPC Systems -- Summary -- References -- Chapter 3: Top-Down Software Optimization -- The Three Levels and Their Impact on Performance -- System Level -- Application Level -- Working Against the Memory Wall -- The Magic of Vectors -- Distributed Memory Parallelization -- Shared Memory Parallelization -- Other Existing Approaches and Methods -- Microarchitecture Level -- Addressing Pipelines and Execution -- Closed-Loop Methodology -- Workload, Application, and Baseline -- Iterating the Optimization Process -- Summary -- References -- Chapter 4: Addressing System Bottlenecks.
Classifying System-Level Bottlenecks -- Identifying Issues Related to System Condition -- Characterizing Problems Caused by System Configuration -- Understanding System-Level Performance Limits -- Checking General Compute Subsystem Performance -- Testing Memory Subsystem Performance -- Testing I/O Subsystem Performance -- Characterizing Application System-Level Issues -- Selecting Performance Characterization Tools -- Monitoring the I/O Utilization -- Analyzing Memory Bandwidth -- Summary -- References -- Chapter 5: Addressing Application Bottlenecks: Distributed Memory -- Algorithm for Optimizing MPI Performance -- Comprehending the Underlying MPI Performance -- Recalling Some Benchmarking Basics -- Gauging Default Intranode Communication Performance -- Gauging Default Internode Communication Performance -- Discovering Default Process Layout and Pinning Details -- Gauging Physical Core Performance -- Doing Initial Performance Analysis -- Is It Worth the Trouble? -- Example 1: Initial HPL Performance Investigation -- Getting an Overview of Scalability and Performance -- Learning Application Behavior -- Example 2: MiniFE Performance Investigation -- Choosing Representative Workload(s) -- Example 2 (cont.): MiniFE Performance Investigation -- Balancing Process and Thread Parallelism -- Example 2 (cont.): MiniFE Performance Investigation -- Doing a Scalability Review -- Example 2 (cont.): MiniFE Performance Investigation -- Analyzing the Details of the Application Behavior -- Example 2 (cont.): MiniFE Performance Investigation -- Choosing the Optimization Objective -- Detecting Load Imbalance -- Example 2 (cont.): MiniFE Performance Investigation -- Dealing with Load Imbalance -- Classifying Load Imbalance -- Addressing Load Imbalance -- Example 2 (cont.): MiniFE Performance Investigation -- Example 3: MiniMD Performance Investigation.
Optimizing MPI Performance -- Classifying the MPI Performance Issues -- Addressing MPI Performance Issues -- Mapping Application onto the Platform -- Understanding Communication Paths -- Selecting Proper Communication Fabrics -- Using Scalable Datagrams -- Specifying a Network Provider -- Using IP over IB -- Controlling the Fabric Fallback Mechanism -- Using Multirail Capabilities -- Detecting and Classifying Improper Process Layout and Pinning Issues -- Controlling Process Layout -- Controlling the Global Process Layout -- Controlling the Detailed Process Layout -- Setting the Environment Variables at All Levels -- Controlling the Process Pinning -- Controlling Memory and Network Affinity -- Example 4: MiniMD Performance Investigation on Xeon Phi -- Example 5: MiniGhost Performance Investigation -- Tuning the Intel MPI Library -- Tuning Intel MPI for the Platform -- Tuning Point-to-Point Settings -- Adjusting the Eager and Rendezvous Protocol Thresholds -- Changing DAPL and DAPL UD Eager Protocol Threshold -- Bypassing Shared Memory for Intranode Communication -- Bypassing the Cache for Intranode Communication -- Choosing the Best Collective Algorithms -- Tuning Intel MPI Library for the Application -- Using Magical Tips and Tricks -- Disabling the Dynamic Connection Mode -- Applying the Wait Mode to Oversubscribed Jobs -- Fine-Tuning the Message-Passing Progress Engine -- Reducing the Pre-reserved DAPL Memory Size -- What Else? -- Example 5 (cont.): MiniGhost Performance Investigation -- Optimizing Application for Intel MPI -- Avoiding MPI_ANY_SOURCE -- Avoiding Superfluous Synchronization -- Using Derived Datatypes -- Using Collective Operations -- Betting on the Computation/Communication Overlap -- Replacing Blocking Collective Operations by MPI-3 Nonblocking Ones -- Using Accelerated MPI File I/O.
Example 5 (cont.): MiniGhost Performance Investigation -- Using Advanced Analysis Techniques -- Automatically Checking MPI Program Correctness -- Comparing Application Traces -- Instrumenting Application Code -- Correlating MPI and Hardware Events -- Collecting and Analyzing Hardware Counter Information in ITAC -- Collecting and Analyzing Hardware Counter Information in VTune -- Summary -- References -- Chapter 6: Addressing Application Bottlenecks: Shared Memory -- Profiling Your Application -- Using VTune Amplifier XE for Hotspots Profiling -- Hotspots for the HPCG Benchmark -- Compiler-Assisted Loop/Function Profiling -- Sequential Code and Detecting Load Imbalances -- Thread Synchronization and Locking -- Dealing with Memory Locality and NUMA Effects -- Thread and Process Pinning -- Controlling OpenMP Thread Placement -- Thread Placement in Hybrid Applications -- Summary -- References -- Chapter 7: Addressing Application Bottlenecks: Microarchitecture -- Overview of a Modern Processor Pipeline -- Pipelined Execution -- Data Conflicts -- Control Conflicts -- Structural Conflicts -- Out-of-order vs. In-order Execution -- Superscalar Pipelines -- SIMD Execution -- Speculative Execution: Branch Prediction -- Memory Subsystem -- Putting It All Together: A Final Look at the Sandy Bridge Pipeline -- A Top-down Method for Categorizing the Pipeline Performance -- Intel Composer XE Usage for Microarchitecture Optimizations -- Basic Compiler Usage and Optimization -- Using Optimization and Vectorization Reports to Read the Compiler's Mind -- Optimizing for Vectorization -- The AVX Instruction Set -- Why Doesn't My Code Vectorize in the First Place? -- Data Dependences -- Data Aliasing -- Array Notations -- Vectorization Directives -- ivdep -- vector -- simd -- Understanding AVX: Intrinsic Programming -- What Are Intrinsics?.
First Steps: Loading and Storing -- Arithmetic -- Data Rearrangement -- Dealing with Disambiguation -- Dealing with Branches -- __builtin_expect -- Profile-Guided Optimization -- Pragmas for Unrolling Loops and Inlining -- unroll/nounroll -- unroll_and_jam/nounroll_and_jam -- inline, noinline, forceinline -- Specialized Routines: How to Exploit the Branch Prediction for Maximal Performance -- When Optimization Leads to Wrong Results -- Using a Standard Library Method -- Using a Manual Implementation in C -- Vectorization with Directives -- Analyzing Pipeline Performance with Intel VTune Amplifier XE -- Summary -- References -- Chapter 8: Application Design Considerations -- Abstraction and Generalization of the Platform Architecture -- Types of Abstractions -- Levels of Abstraction and Complexities -- Raw Hardware vs. Virtualized Hardware in the Cloud -- Questions about Application Design -- Designing for Performance and Scaling -- Designing for Flexibility and Performance Portability -- Data Layout -- Structured Approach to Express Parallelism -- Understanding Bounds and Projecting Bottlenecks -- Data Storage or Transfer vs. Recalculation -- Total Productivity Assessment -- Summary -- References -- Index.
Record no. UNINA-9910293150303321
Find it here: Univ. Federico II
Opac: Check availability here
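Chapter 5 of the book above recommends replacing blocking collective operations with MPI-3 nonblocking ones to overlap communication and computation. A minimal sketch in C, illustrative only and not taken from the book; compile with mpicc and run with e.g. mpirun -np 4.

    /* Start a nonblocking allreduce, do unrelated work while the
       library progresses it, then wait before using the result. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, sum;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Request req;
        MPI_Iallreduce(&rank, &sum, 1, MPI_INT, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        /* ... computation that does not need 'sum' goes here ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete before reading sum */
        if (rank == 0) printf("sum of ranks = %d\n", sum);
        MPI_Finalize();
        return 0;
    }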
Öffentliche Wörter : Analysen zum öffentlich-medialen Sprachgebrauch
Author Diekmannshenke, Hajo
Publication/distribution Berlin : Ibidem Verlag, 2013
Physical description 1 online resource (169 pages)
Other authors (persons) Niehr, Thomas
Girnth, Heiko
Michel, Sascha
Wengeler, Martin
Pollmann, Kornelia
Weidacher, Georg
Bock, Bettina
Reissen-Kosch, Jana
Klemm, Michael
Series Perspektiven Germanistischer Linguistik
ISBN 3-8382-6466-5
Format Printed material
Bibliographic level Monograph
Language of publication ger
Contents note Intro -- HAJO DIEKMANNSHENKE/THOMAS NIEHR Öffentliche Wörter -- MARTIN WENGELER Unwörter. Eine medienwirksame Kategorie zwischen linguistisch begründeter und populärer Sprachkritik -- KORNELIA POLLMANN Von aufmüpfig über Wutbürger zum Stresstest. Eine kritische Betrachtung des populären Rankings der Wörter des Jahres -- GEORG WEIDACHER Zur Verwendung des komplexen Lexems Unschuldsvermutung im politischen Diskurs Österreichs -- BETTINA BOCK Verschwundene Wörter? Begriffe des DDR-Sozialismus und des Geheimwortschatzes der Staatssicherheit nach 1989/90 -- JANA REISSEN-KOSCH Wörter und Werte - Wie die rechtsextreme Szene im Netz um Zustimmung wirbt -- MICHAEL KLEMM/SASCHA MICHEL Der Bürger hat das Wort. Politiker im Spiegel von Userkommentaren in Twitter und Facebook -- MARK DANG-ANH/JESSICA EINSPÄNNER/CAJA THIMM Kontextualisierung durch Hashtags. Die Mediatisierung des politischen Sprachgebrauchs im Internet -- Autorenverzeichnis.
Record no. UNINA-9910861053903321
Find it here: Univ. Federico II
Opac: Check availability here