    • Programming Massively Parallel Processors (English Edition, Original 3rd Edition) / Classic Original Edition Series
      • Authors: David B. Kirk (USA), Wen-mei W. Hwu | Editor: Qu Yi
      • Publisher: China Machine Press
      • ISBN: 9787111668367
      • Publication date: 2021/01/01
      • Pages: 550
    • Price: 55.6
  • Synopsis

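        This book is essential reading in the field of parallel programming; Turing Award winner David Patterson called it "a gift from God." It distills the two authors' many years of teaching and research experience and is used as a textbook at leading universities such as the University of Illinois at Urbana-Champaign (UIUC) and the Massachusetts Institute of Technology (MIT).
        The book is concise, intuitive, and practical. It emphasizes computational thinking and parallel programming skills, and it optimizes program performance step by step over three stages until it arrives at an efficient parallel program. Beyond in-depth coverage of parallel patterns, performance, CUDA dynamic parallelism, and related techniques, it uses a wide range of application case studies to illustrate how parallel programs are developed. The book also comes with the free Illinois-NVIDIA GPU Teaching Kit, along with lecture slides, lab assignments, project guides, and other materials.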
  • About the Authors

  • Table of Contents

    CHAPTER 1 Introduction
      1.1  Heterogeneous Parallel Computing
      1.2  Architecture of a Modern GPU
      1.3  Why More Speed or Parallelism
      1.4  Speeding Up Real Applications
      1.5  Challenges in Parallel Programming
      1.6  Parallel Programming Languages and Models
      1.7  Overarching Goals
      1.8  Organization of the Book
    CHAPTER 2 Data Parallel Computing
      2.1  Data Parallelism
      2.2  CUDA C Program Structure
      2.3  A Vector Addition Kernel
      2.4  Device Global Memory and Data Transfer
      2.5  Kernel Functions and Threading
      2.6  Kernel Launch
      2.7  Summary
        Function Declarations
        Kernel Launch
        Built-in (Predefined) Variables
        Run-time API
      2.8  Exercises
    CHAPTER 3 Scalable Parallel Execution
      3.1  CUDA Thread Organization
      3.2  Mapping Threads to Multidimensional Data
      3.3  Image Blur: A More Complex Kernel
      3.4  Synchronization and Transparent Scalability
      3.5  Resource Assignment
      3.6  Querying Device Properties
      3.7  Thread Scheduling and Latency Tolerance
      3.8  Summary
      3.9  Exercises
    CHAPTER 4 Memory and Data Locality
      4.1  Importance of Memory Access Efficiency
      4.2  Matrix Multiplication
      4.3  CUDA Memory Types
      4.4  Tiling for Reduced Memory Traffic
      4.5  A Tiled Matrix Multiplication Kernel
      4.6  Boundary Checks
      4.7  Memory as a Limiting Factor to Parallelism
      4.8  Summary
      4.9  Exercises
    CHAPTER 17 Parallel Programming and Computational Thinking
      17.1  Goals of Parallel Computing
      17.2  Problem Decomposition
      17.3  Algorithm Selection
      17.4  Computational Thinking
      17.5  Single Program, Multiple Data, Shared Memory and Locality
      17.6  Strategies for Computational Thinking
      17.7  A Hypothetical Example: Sodium Map of the Brain
      17.8  Summary
      17.9  Exercises
    CHAPTER 18 Programming a Heterogeneous Computing Cluster
      18.1  Background
      18.2  A Running Example
      18.3  Message Passing Interface Basics
      18.4  Message Passing Interface Point-to-Point Communication
      18.5  Overlapping Computation and Communication
      18.7  CUDA-Aware Message Passing Interface
      18.8  Summary
      18.9  Exercises
    CHAPTER 19 Parallel Programming with OpenACC
      19.1  The OpenACC Execution Model
      19.2  OpenACC Directive Format
      19.3  OpenACC by Example
        The OpenACC Kernels Directive
        The OpenACC Parallel Directive
        Comparison of Kernels and Parallel Directives
        OpenACC Data Directives
        OpenACC Loop Optimizations
        OpenACC Routine Directive
        Asynchronous Computation and Data
      19.4  Comparing OpenACC and CUDA
      19.5  Interoperability with CUDA and Libraries
        Calling CUDA or Libraries with OpenACC Arrays
        Using CUDA Pointers in OpenACC
        Calling CUDA Device Kernels from OpenACC
      19.6  The Future of OpenACC
      19.7  Exercises
    CHAPTER 20 More on CUDA and Graphics Processing Unit
      20.1  Model of Host/Device Interaction
      20.2  Kernel Execution Control
      20.3  Memory Bandwidth and Compute Throughput
      20.4  Programming Environment
      20.5  Future Outlook
    CHAPTER 21 Conclusion and Outlook
      21.1  Goals Revisited
      21.2  Future Outlook

    Appendix A: An Introduction to OpenCL
    Appendix B: THRUST: A Productivity-Oriented Library for CUDA
    Appendix C: CUDA Fortran
    Appendix D: An Introduction to C++ AMP