This course teaches performance engineering approaches at the compute-node level. “Performance engineering” as we define it is more than employing tools to identify hotspots and bottlenecks: it is about developing a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed. Once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of optimizations can often be predicted. We introduce a “holistic” node-level performance engineering strategy centered on the roofline performance model and apply it to different algorithms from computational science. We also show that simple, easy-to-use tools take us a long way towards deep insight into the interaction between software and hardware.
- Welcome – Intro – Computer architecture
- Tools: topology, affinity, clock speed
- Microbenchmarking for architectural exploration
- The Roofline performance model: basics
- Tools: hardware performance counters
- Optimal use of parallel resources: SIMD, ccNUMA, (SMT)
- Performance engineering with patterns
- Roofline case study: Jacobi smoother
- Roofline case study: sparse matrix-vector multiplication
- Case study: MiniMD
- Optional: The ECM performance model
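
As a small taste of the roofline model that runs through the agenda above, the sketch below computes the basic roofline bound: attainable performance is the minimum of the peak performance and the product of computational intensity and memory bandwidth. The machine numbers (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) and the function name `roofline` are hypothetical and chosen only for illustration; they are not taken from the course material. The kernel is a double-precision vector triad `a[i] = b[i] + s * c[i]`, assumed here to move 32 bytes per iteration (two loads, one store, and one write-allocate transfer).

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Basic roofline bound in GFlop/s: min(P_peak, I * b_S)."""
    return min(peak_gflops, intensity_flop_per_byte * bandwidth_gbs)


# Hypothetical machine parameters, for illustration only.
PEAK_GFLOPS = 1000.0    # assumed peak performance in GFlop/s
BANDWIDTH_GBS = 100.0   # assumed memory bandwidth in GByte/s

# Vector triad a[i] = b[i] + s * c[i] in double precision:
# 2 flops per iteration; 32 bytes of traffic assuming write-allocate
# (load b, load c, write-allocate load of a, store a, 8 bytes each).
TRIAD_INTENSITY = 2.0 / 32.0  # flop/byte

if __name__ == "__main__":
    bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, TRIAD_INTENSITY)
    print(f"Triad roofline bound: {bound} GFlop/s")  # 6.25 with the values above
```

With these assumed numbers the triad is memory bound: its bound sits far below the peak, so optimizations that do not reduce data traffic cannot be expected to help much. A kernel with a high enough intensity would instead hit the `peak_gflops` ceiling.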