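Content Overview
This book is essential reading in the field of parallel programming and has been praised by Turing Award winner David Patterson as a "godsend". It draws on the two authors' many years of teaching and research experience and is used as a textbook at the University of Illinois at Urbana-Champaign (UIUC), the Massachusetts Institute of Technology (MIT), and other leading universities.
The book is concise, intuitive, and practical, with an emphasis on computational thinking and parallel programming skills. It teaches in three progressive stages, optimizing program performance step by step until an efficient parallel program is reached. It covers parallel patterns, performance, CUDA dynamic parallelism, and other techniques in depth, and uses a rich set of application case studies to illustrate how parallel programs are developed. The book also comes with the free companion Illinois-NVIDIA GPU Teaching Kit, along with lecture slides, lab assignments, project guidelines, and other materials.
Compared with the previous edition, the third edition has been thoroughly revised. The specific updates are:
·Three new chapters on parallel patterns, covering histogram computation, merge sort, and graph search.
·One new chapter on a deep learning application case study.
·One new chapter on the evolution of advanced CUDA features, which also introduces cuDNN and other new libraries.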
About the Authors
Table of Contents
Preface
Acknowledgements
CHAPTER 1 Introduction
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism
1.4 Speeding Up Real Applications
1.5 Challenges in Parallel Programming
1.6 Parallel Programming Languages and Models
1.7 Overarching Goals
1.8 Organization of the Book
References
CHAPTER 2 Data Parallel Computing
2.1 Data Parallelism
2.2 CUDA C Program Structure
2.3 A Vector Addition Kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Kernel Launch
2.7 Summary
Function Declarations
Kernel Launch
Built-in (Predefined) Variables
Run-time API
2.8 Exercises
References
CHAPTER 3 Scalable Parallel Execution
3.1 CUDA Thread Organization
3.2 Mapping Threads to Multidimensional Data
3.3 Image Blur: A More Complex Kernel
3.4 Synchronization and Transparent Scalability
3.5 Resource Assignment
3.6 Querying Device Properties
3.7 Thread Scheduling and Latency Tolerance
3.8 Summary
3.9 Exercises
CHAPTER 4 Memory and Data Locality
4.1 Importance of Memory Access Efficiency
4.2 Matrix Multiplication
4.3 CUDA Memory Types
4.4 Tiling for Reduced Memory Traffic
4.5 A Tiled Matrix Multiplication Kernel
4.6 Boundary Checks
4.7 Memory as a Limiting Factor to Parallelism
4.8 Summary
4.9 Exercises
……
CHAPTER 17 Parallel Programming and Computational Thinking
17.1 Goals of Parallel Computing
17.2 Problem Decomposition
17.3 Algorithm Selection
17.4 Computational Thinking
17.5 Single Program, Multiple Data, Shared Memory and Locality
17.6 Strategies for Computational Thinking
17.7 A Hypothetical Example: Sodium Map of the Brain
17.8 Summary
17.9 Exercises
References
CHAPTER 18 Programming a Heterogeneous Computing Cluster
18.1 Background
18.2 A Running Example
18.3 Message Passing Interface Basics
18.4 Message Passing Interface Point-to-Point Communication
18.5 Overlapping Computation and Communication
18.7 CUDA-Aware Message Passing Interface
18.8 Summary
18.9 Exercises
Reference
CHAPTER 19 Parallel Programming with OpenACC
19.1 The OpenACC Execution Model
19.2 OpenACC Directive Format
19.3 OpenACC by Example
The OpenACC Kernels Directive
The OpenACC Parallel Directive
Comparison of Kernels and Parallel Directives
OpenACC Data Directives
OpenACC Loop Optimizations
OpenACC Routine Directive
Asynchronous Computation and Data
19.4 Comparing OpenACC and CUDA
Portability
Performance
Simplicity
19.5 Interoperability with CUDA and Libraries
Calling CUDA or Libraries with OpenACC Arrays
Using CUDA Pointers in OpenACC
Calling CUDA Device Kernels from OpenACC
19.6 The Future of OpenACC
19.7 Exercises
CHAPTER 20 More on CUDA and Graphics Processing Unit Computing
20.1 Model of Host/Device Interaction
20.2 Kernel Execution Control
20.3 Memory Bandwidth and Compute Throughput
20.4 Programming Environment
20.5 Future Outlook
References
CHAPTER 21 Conclusion and Outlook
21.1 Goals Revisited
21.2 Future Outlook
Appendix A: An Introduction to OpenCL
Appendix B: THRUST: a Productivity-oriented Library for CUDA
Appendix C: CUDA Fortran
Appendix D: An Introduction to C++ AMP
Index