Heterogeneous Computing with OpenCL

By Benedict R. Gaster, Lee Howes, David R. Kaeli, Perhaad Mistry & Dana Schaa
400 pages
Trim Size 7 1/2 X 9 1/4 in
Copyright 2011-2012 

Now available in Chinese.

Second edition available for pre-order. Covers OpenCL 1.2 features as well as further details on profiling and debugging.

Key Features

  • Explains principles and strategies to learn parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications.
  • Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more.
  • Shows how OpenCL maps to an example target architecture and explains some of the tradeoffs associated with mapping to various architectures.
  • Addresses a range of fundamental programming techniques, with multiple examples and case studies that demonstrate OpenCL extensions for a variety of hardware platforms.

Description

Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous future.

Written by leaders in the parallel computing and OpenCL communities, this book will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. The authors explore memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. Intended to support a parallel programming course, Heterogeneous Computing with OpenCL includes detailed examples throughout, plus additional online exercises and other supporting materials.

Readership

Software engineers, programmers, hardware engineers, students / advanced students

 

Contents

Table of Contents first edition

  1. Introduction to Parallel Programming
  2. Introduction to OpenCL
  3. OpenCL Device Architectures
  4. Basic OpenCL Examples
  5. Understanding OpenCL’s Concurrency and Execution Model
  6. Dissecting a CPU/GPU OpenCL Implementation
  7. OpenCL Case Study: Convolution
  8. OpenCL Case Study: Video Processing
  9. OpenCL Case Study: Histogram
  10. OpenCL Case Study: Mixed Particle Simulation
  11. OpenCL Extensions
  12. OpenCL Profiling and Debugging
  13. WebCL

 

Table of Contents second edition

  1. Introduction to Parallel Programming
  2. Introduction to OpenCL
  3. OpenCL Device Architectures
  4. Basic OpenCL Examples
  5. Understanding OpenCL’s Concurrency and Execution Model
  6. Dissecting a CPU/GPU OpenCL Implementation
  7. Data management
  8. OpenCL Case Study: Convolution
  9. OpenCL Case Study: Histogram
  10. OpenCL Case Study: Mixed Particle Simulation
  11. OpenCL Extensions
  12. Foreign lands: Plugging OpenCL In
  13. OpenCL Profiling and Debugging
  14. Performance optimization of an image analysis application

Figures from the first edition

Here are all the figures from the book in .jpg format, provided for the convenience of instructors using this book in a course, who may want to incorporate them into lecture slides.

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Chapter 11

Chapter 12

Chapter 13

Example code associated with chapters from the first edition

The example code associated with the chapters is free to use; however, Morgan Kaufmann requests that a note along the lines of the following be provided with any reuse of the code:

‘Code derives from example in “Heterogeneous Computing with OpenCL” published 2011 by Morgan Kaufmann’

Chapter 2 – Vector add

Chapter 4 – Convolution and rotation

Chapter 7 – Advanced convolution

Chapter 9 – Histogram

Chapter 10 – Particle simulation


Bonus example: Radix sort

As an added bonus, we have a radix sort implementation that achieves over 500 MKeys/second on an HD6970 GPU.

RadixSort description | RadixSort implementation

 

Author Information

By Benedict R. Gaster, OpenCL Architect, AMD; Lee Howes, Member of Technical Staff, AMD; David R. Kaeli, Department of Computer and Electrical Engineering, Northeastern University, Boston; Perhaad Mistry, Northeastern University, Boston; and Dana Schaa, Northeastern University, Boston

 

Errata to the first edition

If you notice any errors in the book, please e-mail errata at heterogeneouscomputingwithopencl org or errata at heterogeneouscompute org.

  • P71 bufferC should be write only, not read only
  • P72/73 argument names should be bufferA, not d_A, and likewise for the others.
  • P81 For consistency with the rest of the chapter, the code here should apply CL_TRUE to the data movement operations, not CL_FALSE as listed. In this case the code is likely valid assuming an in-order queue is used and given the blocking read back on page 83.
  • P89 address computation on the left should be 1 + 2*10 to match the diagram.
  • P95 “This parameter makes the read buffer asynchronous”, should read “synchronous” as the text details.
  • P97 last entry on Figure 5.3 sourceEvents should be kernelEvent. Also in this section, while it is not exactly *wrong*, it is not necessary (and would likely not be intended in real code) to use CL_TRUE for the read and write operations, given that events are included throughout. This is a lack of clarity rather than an error. “sourceEvents” is intended to be the implicit vector containing the source events for the enqueued item.
  • P149 Chapter 6: the PACT reference is incorrectly formatted. There should only be 3 references here, not 4.
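
The corrected P89 expression, 1 + 2*10, has the shape of a row-major address computation: column + row * width. The diagram is not reproduced on this page, so the following is only a minimal sketch in C under the assumption that it shows a 2D array 10 elements wide, with the element of interest at column 1 of row 2. The function name linear_index and the roles assigned to each number are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (col, row) in a row-major 2D array that is
   `width` elements wide. The row-major layout and the parameter roles
   are assumptions made for illustration, not taken from the book. */
static size_t linear_index(size_t col, size_t row, size_t width)
{
    assert(col < width); /* the column must fall inside one row */
    return col + row * width;
}
```

Under those assumptions, linear_index(1, 2, 10) evaluates 1 + 2*10 and yields offset 21.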