HiPEAC 2012

Heterogeneous Programming – A Tutorial


Benedict R. Gaster (benedict.gaster@amd.com)

Advanced Micro Devices, One AMD Place, PO Box 3453, MS: 325, CA 94088


Lee Howes (lee.howes@amd.com)

Advanced Micro Devices, One AMD Place, PO Box 3453, MS: 345, CA 94088


Simon McIntosh-Smith (simonm@compsci.bristol.ac.uk)

University of Bristol, Merchant Venturers Building, Woodland Road, Clifton, Bristol, BS8 1UB, UK


We apologize for the time it has taken to get this content together. However, we now have the slides legally approved for distribution.



Heterogeneous computing is mainstream: it can be found in almost all modern media devices and in the clusters of the supercomputing world. These heterogeneous systems are capable of both task- and data-parallel execution. But how can we program these machines?

For task parallelism, popular task-parallel runtimes have emerged, e.g. Intel’s Threading Building Blocks and Microsoft’s Concurrency Runtime, while for data-parallel computation, programming models such as OpenCL have emerged that target heterogeneous devices including CPUs, GPUs and other processors. Moving beyond the node, e.g. out to the cluster, message-passing APIs (such as MPI) provide the needed abstractions for concurrent execution across a network of nodes. Alternatives are emerging, in particular Partitioned Global Address Space (PGAS) languages, e.g. Chapel and UPC, which have demonstrated abstractions that bridge the gap between programming a single node and programming many nodes.
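To make the distinction between the two styles concrete, here is a minimal sketch in standard C++ only. It is not taken from the tutorial material and uses none of the runtimes named above; the names `scale_all`, `compute_stats` and `Stats` are hypothetical, chosen purely for illustration. The first function has the shape of a data-parallel "map" (one independent operation per element), while the second expresses two unrelated pieces of work as tasks whose results are collected through futures.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Data-parallel shape: the same operation is applied to every element and no
// iteration depends on another, so a data-parallel runtime would be free to
// distribute the index space across many processing elements. This sketch
// simply runs the loop sequentially.
std::vector<float> scale_all(const std::vector<float>& in, float factor) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * factor;
    }
    return out;
}

// Hypothetical result type for the task-parallel example.
struct Stats {
    float sum;
    float max;
};

// Task-parallel shape: two different computations are submitted as tasks.
// With the default launch policy the C++ runtime decides whether each task
// runs on another thread or is deferred until get() is called.
Stats compute_stats(const std::vector<float>& data) {
    std::future<float> sum_task = std::async([&data] {
        return std::accumulate(data.begin(), data.end(), 0.0f);
    });
    std::future<float> max_task = std::async([&data] {
        // Assumption for this sketch: an empty input reports a maximum of 0.
        return data.empty() ? 0.0f
                            : *std::max_element(data.begin(), data.end());
    });
    // get() blocks until the corresponding task has produced its value.
    return Stats{sum_task.get(), max_task.get()};
}
```

Calling `scale_all(v, 2.0f)` returns a new vector with every element doubled, and `compute_stats(v)` returns the sum and maximum of `v`. A real task-parallel or data-parallel runtime adds scheduling, data movement and device selection on top of these basic shapes.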

In this tutorial, we will introduce programming heterogeneous systems using OpenCL, task-parallel runtimes, and PGAS. This will be a “programmer’s session” in which we cover the ideas behind these languages, show how different architectures affect their design and requirements, and show how these ideas are translated into source code. We will do this through a series of progressively more challenging examples, enabling experienced programmers to become productive heterogeneous system developers.

This tutorial is intended to go beyond our OpenCL tutorial from last year and to focus more on the practical aspects of programming modern heterogeneous systems.

Tutorial Information:

Title: OpenCL: An Introduction to Heterogeneous Programming
Content Level: 50% intro, 30% intermediate, 20% advanced
Presentation Length: Full Day
Previously Presented: No

Format: Lectures


This full day tutorial will consist of the following modules:

  • Heterogeneous computing in a modern age
  • Task-parallel runtimes overview
  • Data-parallel runtimes overview
  • Exploring beyond the node: message passing and PGAS overview
  • A hardware overview, comparing and contrasting different architectural styles and how they affect heterogeneous programming languages
  • Heterogeneous computing, 60 mins
    • Architecture design space
    • Concurrent programming models
  • Task-parallel runtimes overview, 30 mins
    • Tasks, Futures and asynchronous programming
  • Break
  • Task-parallel runtimes (continued), 30 mins
    • Microsoft’s Concurrency Runtime Overview
    • An example: Mixed particle simulation
  • Data-parallel runtimes overview, 60 mins
    • Parallel for (map), data-movement, async programming
    • OpenCL 1.2 Overview
    • An example: Extending the mixed particle simulation for data-parallel offload
  • Lunch
  • Moving beyond the node, 90 mins
    • Message Passing and Partitioned Global Address Space programming
    • MPI Overview
    • An example: Molecular Docking (using OpenCL and MPI)
  • Break
  • Chapel – A look at a PGAS model, 90 mins
    • Chapel overview
    • An example: Extending the mixed particle simulation to the cluster