Using likwid-topology and likwid-pin

The likwid commands become available by loading the "likwid" module (module load likwid). For the beginner, the most important commands are likwid-topology and likwid-pin.


likwid-topology


This command gives an overview of the node topology, i.e., which resources are available (hardware threads, cores, NUMA domains, caches, ...) and how they are organized. Example output from one SuperMIC node (SuperMIC is a cluster at LRZ Garching):

-------------------------------------------------------------
CPU type: Intel Core IvyBridge EP processor
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets: 2
Cores per socket: 8
Threads per core: 2
-------------------------------------------------------------
HWThread Thread Core Socket
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
4 0 4 0
5 0 5 0
6 0 6 0
7 0 7 0
8 0 0 1
9 0 1 1
10 0 2 1
11 0 3 1
12 0 4 1
13 0 5 1
14 0 6 1
15 0 7 1
16 1 0 0
17 1 1 0
18 1 2 0
19 1 3 0
20 1 4 0
21 1 5 0
22 1 6 0
23 1 7 0
24 1 0 1
25 1 1 1
26 1 2 1
27 1 3 1
28 1 4 1
29 1 5 1
30 1 6 1
31 1 7 1
-------------------------------------------------------------
Socket 0: ( 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 )
Socket 1: ( 8 24 9 25 10 26 11 27 12 28 13 29 14 30 15 31 )
-------------------------------------------------------------
*************************************************************
Cache Topology
*************************************************************
Level: 1
Size: 32 kB
Cache groups: ( 0 16 ) ( 1 17 ) ( 2 18 ) ( 3 19 ) ( 4 20 ) ( 5 21 ) ( 6 22 ) ( 7 23 ) ( 8 24 ) ( 9 25 ) ( 10 26 ) ( 11 27 ) ( 12 28 ) ( 13 29 ) ( 14 30 ) ( 15 31 )
-------------------------------------------------------------
Level: 2
Size: 256 kB
Cache groups: ( 0 16 ) ( 1 17 ) ( 2 18 ) ( 3 19 ) ( 4 20 ) ( 5 21 ) ( 6 22 ) ( 7 23 ) ( 8 24 ) ( 9 25 ) ( 10 26 ) ( 11 27 ) ( 12 28 ) ( 13 29 ) ( 14 30 ) ( 15 31 )
-------------------------------------------------------------
Level: 3
Size: 20 MB
Cache groups: ( 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 ) ( 8 24 9 25 10 26 11 27 12 28 13 29 14 30 15 31 )
-------------------------------------------------------------
*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 2
-------------------------------------------------------------
Domain 0:
Processors: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
Relative distance to nodes: 10 11
Memory: 27526.5 MB free of total 32738.2 MB
-------------------------------------------------------------
Domain 1:
Processors: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
Relative distance to nodes: 11 10
Memory: 29355.3 MB free of total 32768 MB
-------------------------------------------------------------
*************************************************************
Graphical:
*************************************************************
Socket 0:
+---------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 0 16 | | 1 17 | | 2 18 | | 3 19 | | 4 20 | | 5 21 | | 6 22 | | 7 23 | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 32kB | | 32kB | | 32kB | | 32kB | | 32kB | | 32kB | | 32kB | | 32kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 256kB | | 256kB | | 256kB | | 256kB | | 256kB | | 256kB | | 256kB | | 256kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-----------------------------------------------------------------------------+ |
| | 20MB | |
| +-----------------------------------------------------------------------------+ |
+---------------------------------------------------------------------------------+
Socket 1:
+-----------------------------------------------------------------------------------------+
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| | 8 24 | | 9 25 | | 10 26 | | 11 27 | | 12 28 | | 13 29 | | 14 30 | | 15 31 | |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| | 32kB | | 32kB | | 32kB | | 32kB | | 32kB | | 32kB | | 32kB | | 32kB | |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| | 256kB | | 256kB | | 256kB | | 256kB | | 256kB | | 256kB | | 256kB | | 256kB | |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| +-------------------------------------------------------------------------------------+ |
| | 20MB | |
| +-------------------------------------------------------------------------------------+ |
+-----------------------------------------------------------------------------------------+

Most of this output is self-explanatory. The most useful part is the ASCII art at the bottom (generated by the -g option): it shows how the cores and SMT threads are numbered ("physical numbering"). For example, hardware threads 0 and 16 are the two SMT threads of the first core on socket 0 and share its L1 and L2 caches. You can get more detailed cache information with the -c option.
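To make the physical numbering concrete, here is a minimal Python sketch that groups hardware thread IDs by (socket, core) using a few rows copied from the "HWThread Thread Core Socket" table above. The function name smt_siblings and the shortened table are illustrative assumptions; this is not part of LIKWID.

```python
from collections import defaultdict

# Excerpt of the "HWThread Thread Core Socket" table printed above
# (only cores 0 and 1 of each socket, to keep the sketch short).
TABLE = """\
0 0 0 0
1 0 1 0
8 0 0 1
9 0 1 1
16 1 0 0
17 1 1 0
24 1 0 1
25 1 1 1
"""


def smt_siblings(table):
    """Group hardware thread IDs by (socket, core)."""
    groups = defaultdict(list)
    for line in table.splitlines():
        hwthread, _smt, core, socket = map(int, line.split())
        groups[(socket, core)].append(hwthread)
    return dict(groups)


siblings = smt_siblings(TABLE)
for (socket, core), ids in sorted(siblings.items()):
    print(f"socket {socket}, core {core}: hardware threads {ids}")
```

Each group lists the hardware threads that share one physical core, which matches the pairs shown in the boxes of the ASCII art.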


likwid-pin


This tool lets you pin the threads of a multithreaded application to cores in the node. The core numbering can be physical (as printed by likwid-topology) or logical (grouped by hardware units such as sockets or cores). Some examples follow.

As usual, you can get a short help message with:

$ likwid-pin -h

The simplest way to use the tool is the following (this is for an OpenMP code):

$ likwid-pin -c 0-3 ./myApp parameters

This pins the threads of the application in turn (as they are created) to cores 0 to 3. These core IDs are the physical IDs printed by likwid-topology. The tool does not care where those cores actually are, e.g., whether IDs 0 and 1 refer to separate physical cores, to two logical cores on the same physical core, or even to two cores on different sockets. We do not have to set OMP_NUM_THREADS explicitly because likwid-pin sets it automatically (inferring the desired number of threads from the pin mask).
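The idea of inferring the thread count from the pin mask can be illustrated with a small Python sketch. The helper expand_core_list is hypothetical and only mimics the list syntax (ranges such as 0-3, assumed here to be combinable with commas as in 0,2,4-6); it is not how likwid-pin is implemented.

```python
def expand_core_list(spec):
    """Expand a list such as "0-3" or "0,2,4-6" into individual IDs."""
    ids = []
    for part in spec.split(","):
        if "-" in part:
            first, last = map(int, part.split("-"))
            ids.extend(range(first, last + 1))
        else:
            ids.append(int(part))
    return ids


cores = expand_core_list("0-3")
# The number of entries in the list is the number of threads to run.
print(cores, "->", len(cores), "threads")
```

Threads are then assigned to these IDs in the order in which they are created.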

Logical numbering uses topological features of the machine to identify core IDs. For example, to pin 16 threads onto the 2x8=16 physical cores of a dual-socket SuperMIC node (regardless of how they are actually numbered), we write:

$ likwid-pin -c N:0-15 ./myApp parameters

To pin to the eight physical cores in the second socket:

$ likwid-pin -c S1:0-7 ./myApp parameters

This is the preferred method of pinning because as soon as a prefix (such as "N" or "S0") is used, the tool knows that you want to pin to consecutive physical cores within this topological entity (whole node for N, socket 0 for S0, etc.).
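As a rough illustration of how a logical expression could map to physical IDs on the SuperMIC node shown above, consider the following Python sketch. It rebuilds the thread table from the likwid-topology output and assumes that, within a domain, physical cores (SMT thread 0) are listed before the second SMT threads, and that sockets are listed in ascending order within N. The functions supermic_threads and resolve are hypothetical helpers, not LIKWID code.

```python
def supermic_threads():
    """Rebuild the (hwthread, smt, core, socket) rows of the table above."""
    rows = []
    for hw in range(32):
        smt = hw // 16            # IDs 16-31 are the second SMT threads
        socket = (hw % 16) // 8   # IDs 8-15 and 24-31 are on socket 1
        core = hw % 8             # core index within the socket
        rows.append((hw, smt, core, socket))
    return rows


def resolve(expr, rows):
    """Map a logical expression such as "S1:0-7" to physical IDs."""
    domain, _, index_range = expr.partition(":")
    if domain == "N":
        members = rows
    elif domain.startswith("S"):
        socket = int(domain[1:])
        members = [r for r in rows if r[3] == socket]
    else:
        raise ValueError(f"unsupported domain in this sketch: {domain}")
    # Assumption: physical cores first, i.e. order by SMT index,
    # then by socket, then by core.
    ordered = sorted(members, key=lambda r: (r[1], r[3], r[2]))
    first, _, last = index_range.partition("-")
    last = last or first
    return [ordered[i][0] for i in range(int(first), int(last) + 1)]


rows = supermic_threads()
print("S1:0-7  ->", resolve("S1:0-7", rows))
print("N:0-15  ->", resolve("N:0-15", rows))
```

Under these assumptions, S1:0-7 selects physical IDs 8 to 15, i.e. one hardware thread on each of the eight cores of the second socket.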

Caveat: Pinning the threads has some overhead. Each thread is pinned right after it gets created; in OpenMP, this happens when the first parallel region is encountered. If you require accurate timing for a parallel region, make sure that it is not the first one in the program; otherwise the measurement may include the pinning overhead.


Further information can be found on the LIKWID website: http://tiny.cc/LIKWID



Last modified: Tuesday, 26 May 2020, 8:05 PM