Hands-On: likwid-topology and likwid-pin
In this hands-on exercise you will compile and run a main memory bandwidth benchmark. You will learn how to explore node properties and topology with likwid-topology and how to use likwid-pin to explicitly control thread affinity.
Finally, you will learn how to determine the maximum sustained memory bandwidth for one socket and for a complete node.
Time to finish: around 15 minutes.
You can find the benchmark code in the BWBENCH folder of the teaching account.
Get the source from the teaching account:
$ cp -a ~ghager/BWBENCH ~
Load Intel compiler and LIKWID modules:
$ module load intel likwid
Explore node topology:
$ likwid-topology -g
Answer the following questions:
- How many cores are available in one socket and in the whole node?
- Is SMT enabled?
- What is the aggregate size of the last-level cache per socket in MB?
- How many ccNUMA memory domains are there?
- What is the total installed memory capacity?
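Some of these answers can be read from the summary near the top of the likwid-topology output. The following is a minimal sketch, assuming the output contains lines of the form "Sockets:", "Cores per socket:" and "Threads per core:" (check this against the LIKWID version on your system). The helper names topo_summary and smt_enabled are hypothetical and only serve as illustration.

```shell
# Hypothetical helper: keep only the key counts from likwid-topology output
# read on stdin. Assumes summary lines such as "Cores per socket:  12".
topo_summary() {
    grep -E '^(Sockets|Cores per socket|Threads per core):'
}

# Hypothetical helper: print "yes" if more than one hardware thread per core
# is reported (SMT enabled), otherwise "no".
smt_enabled() {
    awk -F: '/^Threads per core:/ { if ($2 + 0 > 1) print "yes"; else print "no" }'
}

# Usage on the node:
#   likwid-topology | topo_summary
#   likwid-topology | smt_enabled
```

Cache sizes, ccNUMA domains and memory capacity still have to be read from the corresponding sections of the full output.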
Compile a threaded OpenMP binary with optimizing flags:
$ icc -Ofast -xHost -std=c99 -qopenmp -o bwBench-ICC bwBench.c
Execute with 12 threads without explicit pinning:
$ env OMP_NUM_THREADS=12 ./bwBench-ICC
Perform multiple (about 10) runs.
- Do the results fluctuate?
- By how much?
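To avoid retyping the command, the repeated runs can be scripted. This is a minimal sketch; repeat_run is a hypothetical helper name, and the benchmark command is the one given above.

```shell
# Hypothetical helper: run a command N times in a row.
# Usage: repeat_run N command [args...]
repeat_run() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        echo "# run $i of $n"   # marker line to separate the runs
        "$@"
        i=$((i + 1))
    done
}

# On the node (10 unpinned runs with 12 threads):
#   repeat_run 10 env OMP_NUM_THREADS=12 ./bwBench-ICC
```

Compare the reported bandwidth values across the runs to judge the fluctuation.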
Run again with 12 threads, this time explicitly pinned to 12 physical cores of socket 0:
$ likwid-pin -c S0:0-11 ./bwBench-ICC
- Is the result different?
- If yes: why is it different?
- Can you recover the previous result?
Benchmark the memory bandwidth scaling within one socket (in steps of one thread):
- What is the maximum memory bandwidth in GB/s?
- Which benchmark case reaches the highest bandwidth?
- At which core count can you saturate the main memory bandwidth?
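The scaling run can be scripted along the following lines. This is a sketch under assumptions: scaling_cmds is a hypothetical helper, the socket is assumed to have 12 physical cores (take the real number from likwid-topology), and OMP_NUM_THREADS is set explicitly so that the thread count always matches the number of pinned cores. The helper only prints the commands, so you can inspect them before running anything.

```shell
# Hypothetical helper: print one pinned benchmark command per core count,
# from 1 up to MAXCORES physical cores of socket 0.
# Usage: scaling_cmds MAXCORES [binary]
scaling_cmds() {
    max=$1
    bin=${2:-./bwBench-ICC}
    n=1
    while [ "$n" -le "$max" ]; do
        if [ "$n" -eq 1 ]; then
            cores="S0:0"              # single core: no range needed
        else
            cores="S0:0-$((n - 1))"   # logical core IDs 0..n-1 of socket 0
        fi
        echo "env OMP_NUM_THREADS=$n likwid-pin -c $cores $bin"
        n=$((n + 1))
    done
}

# Inspect the commands:  scaling_cmds 12
# Execute them:          scaling_cmds 12 | sh
```

Record the bandwidth reported for each core count; the point at which it stops increasing marks the saturation of the memory bandwidth.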
Measure the maximum memory bandwidth using all cores in the node (single measurement).
- What is the maximum bandwidth in GB/s?
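For the full-node measurement, the pin expression has to cover the physical cores of all sockets. The following is a sketch that assumes a node with two sockets of 12 physical cores each (verify with likwid-topology) and relies on likwid-pin's "@" syntax for chaining several domain expressions. The helper name node_expr is hypothetical.

```shell
# Hypothetical helper: build a likwid-pin expression covering logical core
# IDs 0..CORES-1 on each of SOCKETS sockets, e.g. "S0:0-11@S1:0-11".
# Usage: node_expr SOCKETS CORES   (CORES should be at least 2)
node_expr() {
    sockets=$1
    last=$(( $2 - 1 ))
    s=0
    out=""
    while [ "$s" -lt "$sockets" ]; do
        # Append "@" as separator only if a previous socket is already listed.
        out="${out}${out:+@}S${s}:0-${last}"
        s=$((s + 1))
    done
    echo "$out"
}

# On the node (assumed: 2 sockets x 12 cores = 24 threads):
#   env OMP_NUM_THREADS=24 likwid-pin -c "$(node_expr 2 12)" ./bwBench-ICC
```

Compare the result with the single-socket maximum to see how the bandwidth scales across sockets.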