Exercise: A 2D Jacobi smoother
The folder J2D contains subfolders whose names end in "-jacobi". They contain an OpenMP-parallel 2D stencil solver (taken from the RWTH Aachen examples collection) in C and Fortran 90.
Compile the code using the provided makefile. To supply the input parameters, redirect the input file to the program's standard input:
$ ./jacobi.exe < input
The program prints performance in MFlop/s.
- Looking at the code, calculate the conversion factor between MFlop/s and MLUP/s (million lattice site updates per second).
- Parallelize the code with OpenMP.
- Perform a roofline analysis, using the maximum memory bandwidth of 42 GB/s. What is the expected performance on a full Emmy socket (10 cores)? Does the measured performance come anywhere near that? Use the standard problem size of 4000x4000.
- Think about simple code optimizations. What is the best socket-level performance you can get?
- Does the performance of your code scale from 1 to 2 sockets? If it does not, fix it.