LEADER 05506nam 22006973u 450
001 9910458434103321
005 20210107184856.0
010 $a1-118-73927-2
035 $a(CKB)2550000001349255
035 $a(EBL)1776323
035 $a(SSID)ssj0001414088
035 $a(PQKBManifestationID)11777378
035 $a(PQKBTitleCode)TC0001414088
035 $a(PQKBWorkID)11431679
035 $a(PQKB)11207272
035 $a(MiAaPQ)EBC1776323
035 $a(EXLCZ)992550000001349255
100 $a20140908d2014|||| u|| |
101 0 $aeng
135 $aur|n|---|||||
181 $ctxt
182 $cc
183 $acr
200 10$aProfessional CUDA C Programming$b[electronic resource]
210 $aHoboken $cWiley$d2014
215 $a1 online resource (527 p.)
300 $aDescription based upon print version of record.
311 $a1-118-73932-9
311 $a1-322-09490-X
327 $aCover; Title Page; Copyright; Contents; Chapter 1 Heterogeneous Parallel Computing with CUDA; Parallel Computing; Sequential and Parallel Programming; Parallelism; Computer Architecture; Heterogeneous Computing; Heterogeneous Architecture; Paradigm of Heterogeneous Computing; CUDA: A Platform for Heterogeneous Computing; Hello World from GPU; Is CUDA C Programming Difficult?; Summary; Chapter 2 CUDA Programming Model; Introducing the CUDA Programming Model; CUDA Programming Structure; Managing Memory; Organizing Threads; Launching a CUDA Kernel; Writing Your Kernel; Verifying Your Kernel
327 $aHandling Errors; Compiling and Executing; Timing Your Kernel; Timing with CPU Timer; Timing with nvprof; Organizing Parallel Threads; Indexing Matrices with Blocks and Threads; Summing Matrices with a 2D Grid and 2D Blocks; Summing Matrices with a 1D Grid and 1D Blocks; Summing Matrices with a 2D Grid and 1D Blocks; Managing Devices; Using the Runtime API to Query GPU Information; Determining the Best GPU; Using nvidia-smi to Query GPU Information; Setting Devices at Runtime; Summary; Chapter 3 CUDA Execution Model; Introducing the CUDA Execution Model; GPU Architecture Overview
327 $aThe Fermi Architecture; The Kepler Architecture; Profile-Driven Optimization; Understanding the Nature of Warp Execution; Warps and Thread Blocks; Warp Divergence; Resource Partitioning; Latency Hiding; Occupancy; Synchronization; Scalability; Exposing Parallelism; Checking Active Warps with nvprof; Checking Memory Operations with nvprof; Exposing More Parallelism; Avoiding Branch Divergence; The Parallel Reduction Problem; Divergence in Parallel Reduction; Improving Divergence in Parallel Reduction; Reducing with Interleaved Pairs; Unrolling Loops; Reducing with Unrolling
327 $aReducing with Unrolled Warps; Reducing with Complete Unrolling; Reducing with Template Functions; Dynamic Parallelism; Nested Execution; Nested Hello World on the GPU; Nested Reduction; Summary; Chapter 4 Global Memory; Introducing the CUDA Memory Model; Benefits of a Memory Hierarchy; CUDA Memory Model; Memory Management; Memory Allocation and Deallocation; Memory Transfer; Pinned Memory; Zero-Copy Memory; Unified Virtual Addressing; Unified Memory; Memory Access Patterns; Aligned and Coalesced Access; Global Memory Reads; Global Memory Writes; Array of Structures versus Structure of Arrays
327 $aPerformance Tuning; What Bandwidth Can a Kernel Achieve?; Memory Bandwidth; Matrix Transpose Problem; Matrix Addition with Unified Memory; Summary; Chapter 5 Shared Memory and Constant Memory; Introducing CUDA Shared Memory; Shared Memory; Shared Memory Allocation; Shared Memory Banks and Access Mode; Configuring the Amount of Shared Memory; Synchronization; Checking the Data Layout of Shared Memory; Square Shared Memory; Rectangular Shared Memory; Reducing Global Memory Access; Parallel Reduction with Shared Memory; Parallel Reduction with Unrolling
327 $aParallel Reduction with Dynamic Shared Memory
330 $aBreak into the powerful world of parallel GPU programming with this down-to-earth, practical guide. Designed for professionals across multiple industrial sectors, Professional CUDA C Programming presents CUDA -- a parallel computing platform and programming model designed to ease the development of GPU programming -- fundamentals in an easy-to-follow format, and teaches readers how to think in parallel and implement parallel algorithms on GPUs. Each chapter covers a specific topic, and includes workable examples that demonstrate the development process, allowing readers to explore both the "
606 $aComputer architecture
606 $aMultiprocessors
606 $aParallel processing (Electronic computers)
606 $aParallel programming (Computer science)
606 $aEngineering & Applied Sciences$2HILCC
606 $aComputer Science$2HILCC
608 $aElectronic books.
615 4$aComputer architecture.
615 4$aMultiprocessors.
615 4$aParallel processing (Electronic computers).
615 4$aParallel programming (Computer science).
615 7$aEngineering & Applied Sciences
615 7$aComputer Science
676 $a004.35
676 $a004/.35
700 $aCheng$bJohn$0946775
701 $aGrossman$bMax$0946776
701 $aMcKercher$bTy$0946777
801 0$bAU-PeEL
801 1$bAU-PeEL
801 2$bAU-PeEL
906 $aBOOK
912 $a9910458434103321
996 $aProfessional CUDA C Programming$92139008
997 $aUNINA