Welcome to Xinhua Bookstore Australia

    • Programming Massively Parallel Processors (English Edition, 4th Edition) / Classic Original Books series
      • Authors: Wen-mei W. Hwu (USA), David B. Kirk, Izzat El Hajj (Lebanon); Managing Editor: Qu Yi
      • Publisher: China Machine Press
      • ISBN: 9787111774716
      • Publication date: 2025/03/01
      • Pages: 551
    • Price: 51.6
  • Synopsis

        This book is concise, intuitive, and practical, emphasizing computational thinking and parallel programming techniques. It is organized into four parts. Part I introduces the fundamental concepts of heterogeneous parallel computing, including data parallelism, GPU architecture, CUDA programming, and program performance optimization. Part II covers parallel patterns, including convolution, stencil, parallel histogram, reduction, prefix sum, and merge. Part III covers advanced patterns and applications, including sorting, sparse matrix computation, graph traversal, deep learning, iterative MRI reconstruction, electrostatic potential maps, and computational thinking. Part IV covers advanced programming practices, including programming heterogeneous computing clusters and CUDA dynamic parallelism. The book is suitable both for students in computer-related programs at universities and for practitioners in the field of parallel computing.
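Part I builds up the CUDA programming model starting from a vector addition kernel (Section 2.3 in the table of contents below). As a flavor of that material, here is a minimal sketch of such a kernel and its host-side launch; the names and launch configuration are illustrative and not taken from the book:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Each thread computes one element of C = A + B.
__global__ void vecAddKernel(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard threads that fall past the end of the array
        C[i] = A[i] + B[i];
}

int main() {
    const int n = 1024;
    size_t size = n * sizeof(float);
    float hA[n], hB[n], hC[n];
    for (int i = 0; i < n; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    // Allocate device global memory and copy the inputs over.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, size); cudaMalloc(&dB, size); cudaMalloc(&dC, size);
    cudaMemcpy(dA, hA, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, size, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    vecAddKernel<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);
    cudaMemcpy(hC, dC, size, cudaMemcpyDeviceToHost);

    printf("C[10] = %.1f\n", hC[10]);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

The boundary check `if (i < n)` is needed because the grid is rounded up to a whole number of blocks and may contain more threads than data elements, a point the book's early chapters develop in detail.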
  • About the Authors

  • Table of Contents

    Foreword
    Preface 
    Acknowledgments
      CHAPTER 1  Introduction
        1.1  Heterogeneous parallel computing
        1.2  Why more speed or parallelism
        1.3  Speeding up real applications
        1.4  Challenges in parallel programming
        1.5  Related parallel programming interfaces
        1.6  Overarching goals
        1.7  Organization of the book
        References
    Part I  Fundamental Concepts
      CHAPTER 2  Heterogeneous data parallel computing  With special contribution from David Luebke
        2.1  Data parallelism
        2.2  CUDA C program structure
        2.3  A vector addition kernel
        2.4  Device global memory and data transfer
        2.5  Kernel functions and threading
        2.6  Calling kernel functions
        2.7  Compilation
        2.8  Summary
        Exercises
        References
      CHAPTER 3  Multidimensional grids and data
        3.1  Multidimensional grid organization
        3.2  Mapping threads to multidimensional data
        3.3  Image blur: a more complex kernel
        3.4  Matrix multiplication
        3.5  Summary
        Exercises
      CHAPTER 4  Compute architecture and scheduling
        4.1  Architecture of a modern GPU
        4.2  Block scheduling
        4.3  Synchronization and transparent scalability
        4.4  Warps and SIMD hardware
        4.5  Control divergence
        4.6  Warp scheduling and latency tolerance
        4.7  Resource partitioning and occupancy
        4.8  Querying device properties
        4.9  Summary
        Exercises
        References
      CHAPTER 5  Memory architecture and data locality
        5.1  Importance of memory access efficiency
        5.2  CUDA memory types
        5.3  Tiling for reduced memory traffic
        5.4  A tiled matrix multiplication kernel
        5.5  Boundary checks
        5.6  Impact of memory usage on occupancy
        5.7  Summary
        Exercises 
      CHAPTER 6  Performance considerations
        6.1  Memory coalescing
        6.2  Hiding memory latency
        6.3  Thread coarsening
        6.4  A checklist of optimizations
        6.5  Knowing your computation’s bottleneck
        6.6  Summary
        Exercises 
        References
    Part II  Parallel Patterns
      CHAPTER 7  Convolution
      An introduction to constant memory and caching
        7.1  Background
        7.2  Parallel convolution: a basic algorithm
        7.3  Constant memory and caching
        7.4  Tiled convolution with halo cells
        7.5  Tiled convolution using caches for halo cells
        7.6  Summary
        Exercises 
    Part III  Advanced Patterns and Applications
    Part IV  Advanced Practices
    Appendix A: Numerical considerations
    Index