The design concept of multicore processors is dramatically changing the capabilities of computer resources and thus has a deep impact on writing efficient application codes in science and engineering. Trading single-core performance for parallelism is the basic trend, which started about five years ago at the desktop CPU level with the emergence of the first dual-core processors. An extreme implementation of this concept is already used in special-purpose accelerators such as Graphics Processing Units (GPUs).
This compact course provides an introduction to writing efficient code for compute nodes based on modern multicore processors and GPUs. Topics covered are single-core optimization, multithreaded programming with OpenMP, data locality issues arising from ccNUMA architectures, and GPU coding with CUDA. Performance modeling is employed as a key guiding principle: the computational requirements of an application or compute kernel are compared with the capabilities of the target computer architecture in order to derive purposive optimization and parallelization techniques. Hands-on sessions will use simple compute kernels on the latest multicore AMD and Intel CPUs and NVIDIA GPUs.
Every student is required to hand in a short report on the tutorial exercises. Exercises marked with (M) are mandatory; from the others, pick two. For each exercise in your report, comment on your solution strategy and/or performance results. The format is free as long as you submit in Word, OpenOffice, or PDF format.
Topic 1: Basics of computer architecture and sequential optimization
Topic 2: Basics of parallelization and parallel programming with OpenMP
Topic 3: Programming GPUs for scientific computing