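Synopsis
This book is concise, intuitive, and practical, with an emphasis on computational thinking and parallel programming skills. It is organized into four parts. Part I introduces the fundamental concepts of heterogeneous parallel computing, including data parallelism, GPU architecture, CUDA programming, and methods for optimizing program performance. Part II covers parallel patterns, including convolution, stencil, parallel histogram, reduction, prefix sum, and merge. Part III covers advanced patterns and applications, including sorting, sparse matrix computation, graph traversal, deep learning, iterative magnetic resonance imaging (MRI) reconstruction, electrostatic potential maps, and computational thinking. Part IV covers advanced programming practices, including programming heterogeneous computing clusters and CUDA dynamic parallelism. The book is suitable for students in computer science and related programs at universities and colleges, and also serves as a reference for practitioners in parallel computing.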
Contents
Foreword
Preface
Acknowledgments
CHAPTER 1 Introduction
1.1 Heterogeneous parallel computing
1.2 Why more speed or parallelism
1.3 Speeding up real applications
1.4 Challenges in parallel programming
1.5 Related parallel programming interfaces
1.6 Overarching goals
1.7 Organization of the book
References
Part I Fundamental Concepts
CHAPTER 2 Heterogeneous data parallel computing (with special contribution from David Luebke)
2.1 Data parallelism
2.2 CUDA C program structure
2.3 A vector addition kernel
2.4 Device global memory and data transfer
2.5 Kernel functions and threading
2.6 Calling kernel functions
2.7 Compilation
2.8 Summary
Exercises
References
CHAPTER 3 Multidimensional grids and data
3.1 Multidimensional grid organization
3.2 Mapping threads to multidimensional data
3.3 Image blur: a more complex kernel
3.4 Matrix multiplication
3.5 Summary
Exercises
CHAPTER 4 Compute architecture and scheduling
4.1 Architecture of a modern GPU
4.2 Block scheduling
4.3 Synchronization and transparent scalability
4.4 Warps and SIMD hardware
4.5 Control divergence
4.6 Warp scheduling and latency tolerance
4.7 Resource partitioning and occupancy
4.8 Querying device properties
4.9 Summary
Exercises
References
CHAPTER 5 Memory architecture and data locality
5.1 Importance of memory access efficiency
5.2 CUDA memory types
5.3 Tiling for reduced memory traffic
5.4 A tiled matrix multiplication kernel
5.5 Boundary checks
5.6 Impact of memory usage on occupancy
5.7 Summary
Exercises
CHAPTER 6 Performance considerations
6.1 Memory coalescing
6.2 Hiding memory latency
6.3 Thread coarsening
6.4 A checklist of optimizations
6.5 Knowing your computation’s bottleneck
6.6 Summary
Exercises
References
Part II Parallel Patterns
CHAPTER 7 Convolution: An introduction to constant memory and caching
7.1 Background
7.2 Parallel convolution: a basic algorithm
7.3 Constant memory and caching
7.4 Tiled convolution with halo cells
7.5 Tiled convolution using caches for halo cells
7.6 Summary
Exercises
Part III Advanced Patterns and Applications
Part IV Advanced Practices
Appendix A: Numerical considerations
Index