This day introduces node-level performance modeling and optimization.

  • A quick survey of modern processor architecture: Units, pipelines, SIMD, SMT, caches, memory
  • Microarchitectural exploration via benchmarks
  • Basic performance modeling: Roofline
  • Model-guided optimization, using a Jacobi smoother as an example
  • Impact of SIMD and SMT

  • Lab: Performance issues on the node
      • Measuring divide latency
      • The STREAM benchmarks as a starting point for Roofline modeling
      • Modeling and improving the performance of a stencil smoother
      • Performance vs. scalability vs. saturation
      • Optional: Indirect memory access
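
For orientation, the Roofline model named in the agenda bounds attainable performance by the smaller of the peak compute rate and the product of computational intensity and memory bandwidth. The sketch below is only an illustration, not course material: the machine numbers are hypothetical, and the Jacobi traffic figures assume a 2D 5-point stencil in double precision with neighbour reuse from cache and a write-allocate transfer on the store.

```python
def roofline(p_peak, bandwidth, intensity):
    """Attainable performance in GFlop/s: min(P_peak, I * b_S).

    p_peak    -- peak compute rate in GFlop/s
    bandwidth -- sustained memory bandwidth b_S in GB/s (e.g. from STREAM)
    intensity -- computational intensity I in flop/byte
    """
    return min(p_peak, intensity * bandwidth)


# Hypothetical machine parameters (assumptions, not measurements).
P_PEAK = 100.0  # GFlop/s
B_S = 50.0      # GB/s

# Assumed 2D 5-point Jacobi update in double precision:
# 3 adds + 1 multiply per lattice update, and 24 bytes of memory traffic
# (8 load + 8 store + 8 write-allocate) when neighbours are reused from cache.
FLOPS_PER_UPDATE = 4.0
BYTES_PER_UPDATE = 24.0

intensity = FLOPS_PER_UPDATE / BYTES_PER_UPDATE
limit = roofline(P_PEAK, B_S, intensity)
bound = "memory" if intensity * B_S < P_PEAK else "compute"

print(f"intensity = {intensity:.3f} flop/byte")
print(f"Roofline limit = {limit:.2f} GFlop/s ({bound}-bound)")
```

Replacing `B_S` with a bandwidth measured by a STREAM-like kernel on the target node turns the sketch into a first-order prediction for the stencil smoother.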