Key Features
- Explains principles and strategies for learning parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications.
- Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more.
- Shows how OpenCL maps to an example target architecture and explains some of the tradeoffs involved in mapping to various architectures.
- Addresses a range of fundamental programming techniques, with multiple examples and case studies that demonstrate OpenCL extensions for a variety of hardware platforms.
Description
Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Because OpenCL is designed to work on multiple platforms and has wide industry support, it will help you program more effectively for a heterogeneous future.
Written by leaders in the parallel computing and OpenCL communities, this book will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. The authors explore memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. Intended to support a parallel programming course, Heterogeneous Computing with OpenCL includes detailed examples throughout, plus additional online exercises and other supporting materials.
Readership
Software engineers, programmers, hardware engineers, and students (including advanced students)
Contents
Table of Contents, First Edition
- Introduction to Parallel Programming
- Introduction to OpenCL
- OpenCL Device Architectures
- Basic OpenCL Examples
- Understanding OpenCL’s Concurrency and Execution Model
- Dissecting a CPU/GPU OpenCL Implementation
- OpenCL Case Study: Convolution
- OpenCL Case Study: Video Processing
- OpenCL Case Study: Histogram
- OpenCL Case Study: Mixed Particle Simulation
- OpenCL Extensions
- OpenCL Profiling and Debugging
- WebCL
Table of Contents, Second Edition
- Introduction to Parallel Programming
- Introduction to OpenCL
- OpenCL Device Architectures
- Basic OpenCL Examples
- Understanding OpenCL’s Concurrency and Execution Model
- Dissecting a CPU/GPU OpenCL Implementation
- Data Management
- OpenCL Case Study: Convolution
- OpenCL Case Study: Histogram
- OpenCL Case Study: Mixed Particle Simulation
- OpenCL Extensions
- Foreign Lands: Plugging OpenCL In
- OpenCL Profiling and Debugging
- Performance Optimization of an Image Analysis Application
Figures from the first edition
Here are all the figures from the book in .jpg format, provided for the convenience of instructors using this book in a course, who may want to incorporate them into lecture slides.
- Chapter 1
- Chapter 2
- Chapter 3
- Chapter 4
- Chapter 5
- Chapter 6
- Chapter 7
- Chapter 8
- Chapter 9
- Chapter 10
- Chapter 11
- Chapter 12
- Chapter 13
Example code associated with chapters from the first edition
The example code associated with the chapters is free to use; however, Morgan Kaufmann requests that a note along the lines of the following accompany any reuse of the code:
‘Code derives from example in “Heterogeneous Computing with OpenCL” published 2011 by Morgan Kaufmann’
- Chapter 2 – Vector add
- Chapter 4 – Convolution and rotation
- Chapter 7 – Advanced convolution
- Chapter 9 – Histogram
- Chapter 10 – Particle simulation
Bonus example: Radix sort
As an added bonus, we provide a radix sort implementation that sorts more than 500 million keys per second on an AMD Radeon HD 6970 GPU.
RadixSort description | RadixSort implementation
Author Information
By Benedict R. Gaster, OpenCL Architect, AMD; Lee Howes, Member of Technical Staff, AMD; David R. Kaeli, Department of Computer and Electrical Engineering, Northeastern University, Boston; Perhaad Mistry, Northeastern University, Boston; and Dana Schaa, Northeastern University, Boston
Errata to the first edition
If you notice any errors in the book, please e-mail errata at heterogeneouscomputingwithopencl dot org or errata at heterogeneouscompute dot org.
- P71: bufferC should be write-only, not read-only.
- P72/73: the argument names should be bufferA, not d_A, and likewise for the other buffers.
- P81: For consistency with the rest of the chapter, the code here should pass CL_TRUE to the data movement operations, not CL_FALSE as listed. As written, the code is still likely valid, assuming an in-order queue is used and given the blocking read-back on page 83.
- P89: the address computation on the left should be 1 + 2*10 to match the diagram.
- P95: “This parameter makes the read buffer asynchronous” should read “synchronous”, as the surrounding text explains.
- P97: the last entry in Figure 5.3, sourceEvents, should be kernelEvent. Also, in this section it is not strictly wrong, but it is unnecessary (and would likely not be intended in real code) to use CL_TRUE for the read and write operations, given that events are used throughout. This is a lack of clarity rather than an error: “sourceEvents” is intended to be the implicit vector containing the source events for the enqueued item.
- P149 (Chapter 6): the PACT reference is incorrectly formatted; there should be only three references here, not four.