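- Programming Massively Parallel Processors (English edition, original 3rd edition) / Classic Original Editions Library
- - Authors: David B. Kirk (USA), Wen-mei W. Hwu | Editor: Qu Yi
  - Publisher: China Machine Press
  - ISBN: 9787111668367
  - Publication date: 2021/01/01
  - Pages: 550
- Price: 55.6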

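Content Overview
    This book is essential reading in the field of parallel programming, and Turing Award laureate David Patterson has called it a "godsend". It draws on the two authors' many years of teaching and research experience and is used as a textbook at the University of Illinois at Urbana-Champaign (UIUC), the Massachusetts Institute of Technology (MIT), and other leading universities.
    The book is concise, intuitive, and practical. It emphasizes computational thinking and parallel programming skills, and it uses a three-stage, step-by-step teaching approach to optimize program performance progressively until an efficient parallel program is achieved. It covers parallel patterns, performance, CUDA dynamic parallelism, and other techniques in depth, and it illustrates the development of parallel programs with a rich set of application case studies. The book also comes with the free companion Illinois-NVIDIA GPU Teaching Kit, along with lecture slides, lab assignments, project guidelines, and other materials.
    Compared with the previous edition, the third edition has been thoroughly revised. The specific updates are:
    · Three new chapters on parallel patterns, covering histogram computation, merge sort, and graph search.
    · A new chapter on a deep learning application case study.
    · A new chapter on the evolution of advanced CUDA features, which also introduces new libraries such as CuDNN.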
About the Authors
Table of Contents
Preface
Acknowledgements
CHAPTER 1 Introduction
  1.1  Heterogeneous Parallel Computing
  1.2  Architecture of a Modern GPU
  1.3  Why More Speed or Parallelism
  1.4  Speeding Up Real Applications
  1.5  Challenges in Parallel Programming
  1.6  Parallel Programming Languages and Models
  1.7  Overarching Goals
  1.8  Organization of the Book
    References
CHAPTER 2 Data Parallel Computing
  2.1  Data Parallelism
  2.2  CUDA C Program Structure
  2.3  A Vector Addition Kernel
  2.4  Device Global Memory and Data Transfer
  2.5  Kernel Functions and Threading
  2.6  Kernel Launch
  2.7  Summary
    Function Declarations
    Kernel Launch
    Built-in (Predefined) Variables
    Run-time API
  2.8  Exercises
    References
CHAPTER 3 Scalable Parallel Execution
  3.1  CUDA Thread Organization
  3.2  Mapping Threads to Multidimensional Data
  3.3  Image Blur: A More Complex Kernel
  3.4  Synchronization and Transparent Scalability
  3.5  Resource Assignment
  3.6  Querying Device Properties
  3.7  Thread Scheduling and Latency Tolerance
  3.8  Summary
  3.9  Exercises
CHAPTER 4 Memory and Data Locality
  4.1  Importance of Memory Access Efficiency
  4.2  Matrix Multiplication
  4.3  CUDA Memory Types
  4.4  Tiling for Reduced Memory Traffic
  4.5  A Tiled Matrix Multiplication Kernel
  4.6  Boundary Checks
  4.7  Memory as a Limiting Factor to Parallelism
  4.8  Summary
  4.9  Exercises
……
CHAPTER 17 Parallel Programming and Computational Thinking
  17.1  Goals of Parallel Computing
  17.2  Problem Decomposition
  17.3  Algorithm Selection
  17.4  Computational Thinking
  17.5  Single Program, Multiple Data, Shared Memory and Locality
  17.6  Strategies for Computational Thinking
  17.7  A Hypothetical Example: Sodium Map of the Brain
  17.8  Summary
  17.9  Exercises
    References
CHAPTER 18 Programming a Heterogeneous Computing Cluster
  18.1  Background
  18.2  A Running Example
  18.3  Message Passing Interface Basics
  18.4  Message Passing Interface Point-to-Point Communication
  18.5  Overlapping Computation and Communication
  18.7  CUDA-Aware Message Passing Interface
  18.8  Summary
  18.9  Exercises
    Reference
CHAPTER 19 Parallel Programming with OpenACC
  19.1  The OpenACC Execution Model
  19.2  OpenACC Directive Format
  19.3  OpenACC by Example
    The OpenACC Kernels Directive
    The OpenACC Parallel Directive
    Comparison of Kernels and Parallel Directives
    OpenACC Data Directives
    OpenACC Loop Optimizations
    OpenACC Routine Directive
    Asynchronous Computation and Data
  19.4  Comparing OpenACC and CUDA
    Portability
    Performance
    Simplicity
  19.5  Interoperability with CUDA and Libraries
    Calling CUDA or Libraries with OpenACC Arrays
    Using CUDA Pointers in OpenACC
    Calling CUDA Device Kernels from OpenACC
  19.6  The Future of OpenACC
  19.7  Exercises
CHAPTER 20 More on CUDA and Graphics Processing Unit Computing
  20.1  Model of Host/Device Interaction
  20.2  Kernel Execution Control
  20.3  Memory Bandwidth and Compute Throughput
  20.4  Programming Environment
  20.5  Future Outlook
    References
CHAPTER 21 Conclusion and Outlook
  21.1  Goals Revisited
  21.2  Future Outlook

Appendix A: An Introduction to OpenCL
Appendix B: THRUST: a Productivity-Oriented Library for CUDA
Appendix C: CUDA Fortran
Appendix D: An Introduction to C++ AMP
Index
