1. Login to the HPC machines at RRZE
Course logins for the HPC systems at RRZE are provided during the first exercise. You will get a password for your course accounts.
You will perform all of the benchmark work on the IvyBridge-EP Cluster "Emmy". Detailed information about login, file systems etc. can be found on the website:
Log into the cluster front-end machine of Emmy using
ssh -Y -p 8196 XXXXXX@grid.rrze.uni-erlangen.de
(replace "XXXXXX" with your account name). Please do not perform memory-intensive test runs on the frontends or dialog servers, as this disturbs other users. Moreover, many users are active on the frontends, so you will not get sensible performance data there anyway.
Example code for the hands-on exercises can be found in the directory ~s43a0000. For example, to copy the directory DIV to your home directory, type:
cp -a ~s43a0000/DIV ~
The descriptions of the exercises contain the names of the respective folders.
2. Compilers
On the Emmy cluster we use the Intel compiler suite. The Intel compilers usually deliver higher performance than GCC, and we are quite familiar with their characteristics. To access the Intel compilers you first have to set up your environment correctly. Execute once per shell:
module load intel64
(you can also specify a version number; this will be required from time to time). This sets up PATH and the other variables that you need to work with the Intel compilers. The compilers are called ifort (Fortran77/90), icc (C) and icpc (C++).
2.1 Recommended compiler switches
The Intel compilers have loads of command line options. We recommend using -O3 -xHost -fno-alias. The option -help will give you a complete list. The standard options (-c, -g, -o etc.) are identical to GCC. If you want a report on what the compiler did in the optimization stage you can use -opt_report3, but don't expect too much readable information.
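As a minimal sketch of how the recommended switches fit into a compile line: the source file bench.c and the executable name bench are placeholders, not part of the course material. The command is printed first and only executed if icc and the source file are actually present.

```shell
# Assumption: bench.c stands for your own C source file; bench is the executable.
CFLAGS="-O3 -xHost -fno-alias"
COMPILE_CMD="icc $CFLAGS -o bench bench.c"

# Show the compile line so it can be checked
echo "$COMPILE_CMD"

# Run it only where the Intel compiler and the source file exist
if command -v icc >/dev/null 2>&1 && [ -f bench.c ]; then
    $COMPILE_CMD
fi
```

For Fortran or C++ code the same switches would be passed to ifort or icpc instead.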
3. Batch processing
Short test runs can be started directly on the Emmy frontends. However, for producing reliable benchmark results it is preferable to submit the jobs to the batch queue. The batch system accepts requests for resources (e.g., "6 nodes for 24 hours") and queues them according to some priority scheme. A job gets run, i.e., a previously specified shell script gets executed, when the resources are available and the batch system has chosen the job to be started. Some nodes of the Emmy cluster will be reserved for your exclusive use during tutorial hours. During the rest of the time, no CPUs are reserved, but you have a high priority.
Apart from running a batch script (see below) and interactive testing on the frontends, you can submit an interactive batch job, which gives you, e.g., a shell on a compute node for some time. You can do this by typing:
qsub -l nodes=1:ppn=40,walltime=02:00:00 -I
This command will allocate a complete node (40 CPUs) for 2 hours. You should always request complete nodes so that you can do your benchmarks on a quiet machine. There will be no need to request more than one node for this course.
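For non-interactive runs, the same resource request can be placed in a job script. The following is only a sketch under assumptions: the file name job.sh, the job name bench_job and the executable ./bench are placeholders, and the script simply repeats the resource request and module setup described in this document.

```shell
# Write a small job script; the quoted 'EOF' keeps $PBS_O_WORKDIR unexpanded
cat > job.sh <<'EOF'
#!/bin/bash -l
#PBS -l nodes=1:ppn=40,walltime=01:00:00
#PBS -N bench_job

# Change to the directory from which the job was submitted
cd "$PBS_O_WORKDIR"

# Set up the compiler environment, then run the (placeholder) benchmark
module load intel64
./bench
EOF

echo "wrote job.sh"
```

The script would then be submitted with "qsub job.sh"; without the -I option the batch system runs it once the requested node is available.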
3.1 Clock frequency settings
If you want to get accurate timings in terms of processor cycles, you have to know the exact clock speed of the CPU. The Emmy processors have a nominal clock speed of 2.2 GHz, but "Turbo Mode" is enabled by default. This means that the CPU can "overclock" to some degree, depending on the number of active cores and the temperature. The highest possible clock speed is 3.0 GHz. In order to set the clock frequency to a specific (fixed) value you can specify a parameter at job submit time:
qsub -I -l nodes=1:ppn=40:f2.2,walltime=01:00:00 ...
In this example, the clock speed for all cores in this job would be set to 2.2 GHz. You can select from the following options: f2.2, f2.1, f2.0, f1.9, f1.8, f1.7, f1.6, f1.5, f1.4, f1.3, f1.2. It is also possible to use the likwid-setFrequencies tool, as shown in the lecture.
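With a fixed clock frequency, a measured runtime converts to processor cycles by multiplying it with the frequency. A small sketch of this arithmetic follows; the helper name to_cycles and the 0.5 s runtime are made up for illustration.

```shell
# cycles = runtime [s] * clock frequency [Hz]
# Assumption: the job was submitted with the f2.2 property (fixed 2.2 GHz).
to_cycles() {
    # $1: measured runtime in seconds, $2: clock frequency in GHz
    awk -v t="$1" -v f="$2" 'BEGIN { printf "%.0f\n", t * f * 1e9 }'
}

# Hypothetical measurement: 0.5 s at 2.2 GHz
to_cycles 0.5 2.2   # prints 1100000000
```

Dividing the result by the number of loop iterations gives cycles per iteration, which only makes sense if the frequency did not change during the run.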
4. LIKWID Tools
Use the command
module load likwid
to set up your shell environment for using the LIKWID tools. If you want to use likwid-perfctr, you have to submit your job in a special way so as not to interfere with system monitoring:
qsub -I -l nodes=1:ppn=40:likwid,walltime=01:00:00 ...
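Inside such a job, a likwid-perfctr measurement might look like the following sketch. It assumes that "module load likwid" has been executed, that ./bench is a placeholder for your own executable, and that the MEM performance group is available on the node (likwid-perfctr -a lists the groups that actually are).

```shell
# Assumptions: ./bench is a placeholder; the MEM group exists on this machine.
# -C S0:0-9 pins to and measures on the first ten hardware threads of socket 0
# -g MEM    selects the performance group to measure
LIKWID_CMD="likwid-perfctr -C S0:0-9 -g MEM ./bench"

# Show the command so it can be checked
echo "$LIKWID_CMD"

# Run it only where likwid-perfctr and the executable exist
if command -v likwid-perfctr >/dev/null 2>&1 && [ -x ./bench ]; then
    $LIKWID_CMD
fi
```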
Note: The Emmy cluster uses "Turbo Mode" with its processors. This means that, depending on the number of cores used and environmental conditions, a CPU chip can run at clock speeds higher than its "nominal" speed of 2.2 GHz (up to 3.0 GHz). This can make benchmarking difficult, and sometimes it is good to have a fixed frequency. You can achieve this by adding a property to the resource specification on the qsub command line:
qsub -l nodes=1:ppn=40:f2.2,walltime=01:00:00 ...
This will set the frequency to a fixed 2.2 GHz for all cores on the node(s) of the job. The allowed settings include the properties f2.2 down to f1.2 listed under "Clock frequency settings" above, so you can also run at a frequency that is lower than the nominal clock speed. "turbo" is the default.
likwid-setFrequencies does not work on Emmy for technical reasons.