High performance parallelism pearls : multicore and many-core programming approaches / James Reinders, Jim Jeffers |
Author | Reinders, James |
Edition | [First edition.] |
Publication/distribution | Waltham, Massachusetts : Morgan Kaufmann, 2015 |
Physical description | 1 online resource (549 p.) |
Classification | 005.275 |
Topical subject |
Parallel programming (Computer science) - Data processing
Coprocessors |
Genre/form | Electronic books. |
ISBN | 0-12-802199-3 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Front Cover; High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches; Copyright; Contents; Contributors; Acknowledgments; Foreword; Humongous computing needs: Science years in the making; Open standards; Keen on many-core architecture; Xeon Phi is born: Many cores, excellent vector ISA; Learn highly scalable parallel programming; Future demands grow: Programming models matter; Preface; Inspired by 61 cores: A new era in programming; Chapter 1: Introduction; Learning from successful experiences; Code modernization; Modernize with concurrent algorithms; Modernize with vectorization and data locality; Understanding power usage; ISPC and OpenCL anyone?; Intel Xeon Phi coprocessor specific; Many-core, neo-heterogeneous; No "Xeon Phi" in the title, neo-heterogeneous programming; The future of many-core; Downloads; Chapter 2: From "Correct" to "Correct & Efficient": A Hydro2D Case Study with Godunov's Scheme; Scientific computing on contemporary computers; Modern computing environments; CEA's Hydro2D; A numerical method for shock hydrodynamics; Euler's equation; Godunov's method; Where it fits; Features of modern architectures; Performance-oriented architecture; Programming tools and runtimes; Our computing environments; Paths to performance; Running Hydro2D; Hydro2D's structure; Computation scheme; Data structures; Measuring performance; Optimizations; Memory usage; Thread-level parallelism; Arithmetic efficiency and instruction-level parallelism; Data-level parallelism; Summary; The coprocessor vs. the processor; A rising tide lifts all boats; Performance strategies; Chapter 3: Better Concurrency and SIMD on HBM; The application: HIROMB-BOOS-Model; Key usage: DMI; HBM execution profile; Overview for the optimization of HBM; Data structures: Locality done right; Thread parallelism in HBM; Data parallelism: SIMD vectorization; Trivial obstacles; Premature abstraction is the root of all evil; Results; Profiling details; Scaling on processor vs. coprocessor; Contiguous attribute; Summary; References; Chapter 4: Optimizing for Reacting Navier-Stokes Equations; Getting started; Version 1.0: Baseline; Version 2.0: ThreadBox; Version 3.0: Stack memory; Version 4.0: Blocking; Version 5.0: Vectorization; Intel Xeon Phi coprocessor results; Summary; Chapter 5: Plesiochronous Phasing Barriers; What can be done to improve the code?; What more can be done to improve the code?; Hyper-Thread Phalanx; What is nonoptimal about this strategy?; Coding the Hyper-Thread Phalanx; How to determine thread binding to core and HT within core?; The Hyper-Thread Phalanx hand-partitioning technique; A lesson learned; Back to work; Data alignment; Use aligned data when possible; Redundancy can be good for you; The plesiochronous phasing barrier; Let us do something to recover this wasted time; A few "left to the reader" possibilities; Xeon host performance improvements similar to Xeon Phi |
Record no. | UNINA-9910459676203321 |
Available at: Univ. Federico II |
|
High performance parallelism pearls : multicore and many-core programming approaches / James Reinders, Jim Jeffers |
Author | Reinders, James |
Edition | [First edition.] |
Publication/distribution | Waltham, Massachusetts : Morgan Kaufmann, 2015 |
Physical description | 1 online resource (549 p.) |
Classification | 005.275 |
Topical subject |
Parallel programming (Computer science) - Data processing
Coprocessors |
ISBN | 0-12-802199-3 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Front Cover; High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches; Copyright; Contents; Contributors; Acknowledgments; Foreword; Humongous computing needs: Science years in the making; Open standards; Keen on many-core architecture; Xeon Phi is born: Many cores, excellent vector ISA; Learn highly scalable parallel programming; Future demands grow: Programming models matter; Preface; Inspired by 61 cores: A new era in programming; Chapter 1: Introduction; Learning from successful experiences; Code modernization; Modernize with concurrent algorithms; Modernize with vectorization and data locality; Understanding power usage; ISPC and OpenCL anyone?; Intel Xeon Phi coprocessor specific; Many-core, neo-heterogeneous; No "Xeon Phi" in the title, neo-heterogeneous programming; The future of many-core; Downloads; Chapter 2: From "Correct" to "Correct & Efficient": A Hydro2D Case Study with Godunov's Scheme; Scientific computing on contemporary computers; Modern computing environments; CEA's Hydro2D; A numerical method for shock hydrodynamics; Euler's equation; Godunov's method; Where it fits; Features of modern architectures; Performance-oriented architecture; Programming tools and runtimes; Our computing environments; Paths to performance; Running Hydro2D; Hydro2D's structure; Computation scheme; Data structures; Measuring performance; Optimizations; Memory usage; Thread-level parallelism; Arithmetic efficiency and instruction-level parallelism; Data-level parallelism; Summary; The coprocessor vs. the processor; A rising tide lifts all boats; Performance strategies; Chapter 3: Better Concurrency and SIMD on HBM; The application: HIROMB-BOOS-Model; Key usage: DMI; HBM execution profile; Overview for the optimization of HBM; Data structures: Locality done right; Thread parallelism in HBM; Data parallelism: SIMD vectorization; Trivial obstacles; Premature abstraction is the root of all evil; Results; Profiling details; Scaling on processor vs. coprocessor; Contiguous attribute; Summary; References; Chapter 4: Optimizing for Reacting Navier-Stokes Equations; Getting started; Version 1.0: Baseline; Version 2.0: ThreadBox; Version 3.0: Stack memory; Version 4.0: Blocking; Version 5.0: Vectorization; Intel Xeon Phi coprocessor results; Summary; Chapter 5: Plesiochronous Phasing Barriers; What can be done to improve the code?; What more can be done to improve the code?; Hyper-Thread Phalanx; What is nonoptimal about this strategy?; Coding the Hyper-Thread Phalanx; How to determine thread binding to core and HT within core?; The Hyper-Thread Phalanx hand-partitioning technique; A lesson learned; Back to work; Data alignment; Use aligned data when possible; Redundancy can be good for you; The plesiochronous phasing barrier; Let us do something to recover this wasted time; A few "left to the reader" possibilities; Xeon host performance improvements similar to Xeon Phi |
Record no. | UNINA-9910787136203321 |
Available at: Univ. Federico II |
|
High performance parallelism pearls : multicore and many-core programming approaches / James Reinders, Jim Jeffers |
Author | Reinders, James |
Edition | [First edition.] |
Publication/distribution | Waltham, Massachusetts : Morgan Kaufmann, 2015 |
Physical description | 1 online resource (549 p.) |
Classification | 005.275 |
Topical subject |
Parallel programming (Computer science) - Data processing
Coprocessors |
ISBN | 0-12-802199-3 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Front Cover; High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches; Copyright; Contents; Contributors; Acknowledgments; Foreword; Humongous computing needs: Science years in the making; Open standards; Keen on many-core architecture; Xeon Phi is born: Many cores, excellent vector ISA; Learn highly scalable parallel programming; Future demands grow: Programming models matter; Preface; Inspired by 61 cores: A new era in programming; Chapter 1: Introduction; Learning from successful experiences; Code modernization; Modernize with concurrent algorithms; Modernize with vectorization and data locality; Understanding power usage; ISPC and OpenCL anyone?; Intel Xeon Phi coprocessor specific; Many-core, neo-heterogeneous; No "Xeon Phi" in the title, neo-heterogeneous programming; The future of many-core; Downloads; Chapter 2: From "Correct" to "Correct & Efficient": A Hydro2D Case Study with Godunov's Scheme; Scientific computing on contemporary computers; Modern computing environments; CEA's Hydro2D; A numerical method for shock hydrodynamics; Euler's equation; Godunov's method; Where it fits; Features of modern architectures; Performance-oriented architecture; Programming tools and runtimes; Our computing environments; Paths to performance; Running Hydro2D; Hydro2D's structure; Computation scheme; Data structures; Measuring performance; Optimizations; Memory usage; Thread-level parallelism; Arithmetic efficiency and instruction-level parallelism; Data-level parallelism; Summary; The coprocessor vs. the processor; A rising tide lifts all boats; Performance strategies; Chapter 3: Better Concurrency and SIMD on HBM; The application: HIROMB-BOOS-Model; Key usage: DMI; HBM execution profile; Overview for the optimization of HBM; Data structures: Locality done right; Thread parallelism in HBM; Data parallelism: SIMD vectorization; Trivial obstacles; Premature abstraction is the root of all evil; Results; Profiling details; Scaling on processor vs. coprocessor; Contiguous attribute; Summary; References; Chapter 4: Optimizing for Reacting Navier-Stokes Equations; Getting started; Version 1.0: Baseline; Version 2.0: ThreadBox; Version 3.0: Stack memory; Version 4.0: Blocking; Version 5.0: Vectorization; Intel Xeon Phi coprocessor results; Summary; Chapter 5: Plesiochronous Phasing Barriers; What can be done to improve the code?; What more can be done to improve the code?; Hyper-Thread Phalanx; What is nonoptimal about this strategy?; Coding the Hyper-Thread Phalanx; How to determine thread binding to core and HT within core?; The Hyper-Thread Phalanx hand-partitioning technique; A lesson learned; Back to work; Data alignment; Use aligned data when possible; Redundancy can be good for you; The plesiochronous phasing barrier; Let us do something to recover this wasted time; A few "left to the reader" possibilities; Xeon host performance improvements similar to Xeon Phi |
Record no. | UNINA-9910812710103321 |
Available at: Univ. Federico II |
|
Intel Xeon Phi coprocessor architecture and tools : the guide for application developers / Rezaur Rahman |
Author | Rahman, Rezaur |
Edition | [1st ed. 2013.] |
Publication/distribution | Apress, 2013 |
Physical description | 1 online resource (xxi, 209 pages) : illustrations (some color) |
Classification |
004
005.1 |
Series |
The expert's voice in microprocessors
Gale eBooks |
Topical subject |
Coprocessors
Computer programming
High performance computing |
ISBN | 1-4302-5927-2 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Contents at a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Part 1: Hardware Foundation: Intel Xeon Phi Architecture; Chapter 1: Introduction to Xeon Phi Architecture; History of Intel Xeon Phi Development; Evolution from Von Neumann Architecture to Cache Subsystem Architecture; Improvements in the Core and Memory; Instruction-Level Parallelism; Instruction Pipelining; Single Instruction Multiple Data; Multithreading; Multicore and Manycore Architecture; Interconnect and Cache Improvements; System Interconnect; Intel Xeon Phi Coprocessor Chip Architecture; Applicability of the Intel Xeon Phi Coprocessor; Summary; Chapter 2: Programming Xeon Phi; Intel Xeon Phi Execution Models; Development Tools for Intel Xeon Phi Architecture; Intel Composer XE; Getting the Tools; Using the Compilers; Setting Up an Intel Xeon Phi System; Install the MPSS Stack; Install the Development Tools; Code Generation for Intel Xeon Phi Architecture; Native Execution Mode; Hello World Example; Language Extensions to Support Offload Computation on Intel Xeon Phi; Heterogeneous Computing Model and Offload Pragmas; Language Extensions and Execution Model; Terminology; Offload Function and Data Declaration Directives; declare target Directives; Syntax; C/C++; Fortran; Restrictions; Function Offload and Execution Constructs; Target Data Directive; Syntax; C/C++; Fortran; Restrictions; Target Directive; Syntax; C/C++; Fortran; Restrictions; Target Update Directive; Syntax; C/C++; Fortran; Runtime Library Routines; Offload Example; Summary; Chapter 3: Xeon Phi Vector Architecture and Instruction Set; Xeon Phi Vector Microarchitecture; The VPU Pipeline; VPU Instruction Stalls; Pairing Rule; Vector Registers; Vector Mask Registers; Extended Math Unit; Xeon Phi Vector Instruction Set Architecture; Data Types; Vector Nomenclature; Vector Instruction Syntax; Xeon Phi Vector ISA by Categories; Mask Operations; Swizzle, Shuffle, Broadcast, and Convert Instructions; Swizzle; Register Memory Swizzle; Data Broadcasts; Data Conversions; Shuffles; Shift Operation; Logical Shifts; Arithmetic Shifts; Sample Code for Swizzle and Shuffle Instructions; Arithmetic and Logic Operations; Fused Multiply-Add; Data Access Operations (Load, Store, Prefetch, and Gather/Scatter); Memory Alignment; Pack/Unpack; Non-temporal data; Streaming Stores; Scatter/Gather; Prefetch Instructions; Summary; Chapter 4: Xeon Phi Core Microarchitecture; Intel Xeon Phi Cores; Core Pipeline Stages; Cache and TLB Structure; L2 Cache Structure |
Record no. | UNISA-996198793103316 |
Available at: Univ. di Salerno |
|
Intel Xeon Phi coprocessor architecture and tools : the guide for application developers / Rezaur Rahman |
Author | Rahman, Rezaur |
Edition | [1st ed. 2013.] |
Publication/distribution | Apress, 2013 |
Physical description | 1 online resource (xxi, 209 pages) : illustrations (some color) |
Classification |
004
005.1 |
Series |
The expert's voice in microprocessors
Gale eBooks |
Topical subject |
Coprocessors
Computer programming
High performance computing |
ISBN | 1-4302-5927-2 |
Format | Printed material |
Bibliographic level | Monograph |
Language of publication | eng |
Contents note |
Contents at a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Part 1: Hardware Foundation: Intel Xeon Phi Architecture; Chapter 1: Introduction to Xeon Phi Architecture; History of Intel Xeon Phi Development; Evolution from Von Neumann Architecture to Cache Subsystem Architecture; Improvements in the Core and Memory; Instruction-Level Parallelism; Instruction Pipelining; Single Instruction Multiple Data; Multithreading; Multicore and Manycore Architecture; Interconnect and Cache Improvements; System Interconnect; Intel Xeon Phi Coprocessor Chip Architecture; Applicability of the Intel Xeon Phi Coprocessor; Summary; Chapter 2: Programming Xeon Phi; Intel Xeon Phi Execution Models; Development Tools for Intel Xeon Phi Architecture; Intel Composer XE; Getting the Tools; Using the Compilers; Setting Up an Intel Xeon Phi System; Install the MPSS Stack; Install the Development Tools; Code Generation for Intel Xeon Phi Architecture; Native Execution Mode; Hello World Example; Language Extensions to Support Offload Computation on Intel Xeon Phi; Heterogeneous Computing Model and Offload Pragmas; Language Extensions and Execution Model; Terminology; Offload Function and Data Declaration Directives; declare target Directives; Syntax; C/C++; Fortran; Restrictions; Function Offload and Execution Constructs; Target Data Directive; Syntax; C/C++; Fortran; Restrictions; Target Directive; Syntax; C/C++; Fortran; Restrictions; Target Update Directive; Syntax; C/C++; Fortran; Runtime Library Routines; Offload Example; Summary; Chapter 3: Xeon Phi Vector Architecture and Instruction Set; Xeon Phi Vector Microarchitecture; The VPU Pipeline; VPU Instruction Stalls; Pairing Rule; Vector Registers; Vector Mask Registers; Extended Math Unit; Xeon Phi Vector Instruction Set Architecture; Data Types; Vector Nomenclature; Vector Instruction Syntax; Xeon Phi Vector ISA by Categories; Mask Operations; Swizzle, Shuffle, Broadcast, and Convert Instructions; Swizzle; Register Memory Swizzle; Data Broadcasts; Data Conversions; Shuffles; Shift Operation; Logical Shifts; Arithmetic Shifts; Sample Code for Swizzle and Shuffle Instructions; Arithmetic and Logic Operations; Fused Multiply-Add; Data Access Operations (Load, Store, Prefetch, and Gather/Scatter); Memory Alignment; Pack/Unpack; Non-temporal data; Streaming Stores; Scatter/Gather; Prefetch Instructions; Summary; Chapter 4: Xeon Phi Core Microarchitecture; Intel Xeon Phi Cores; Core Pipeline Stages; Cache and TLB Structure; L2 Cache Structure |
Record no. | UNINA-9910293151803321 |
Available at: Univ. Federico II |
|