Working with the Emmy cluster at RRZE
1. Login to the HPC machines at RRZE
Course logins for the HPC systems at RRZE are provided during the first exercise. You will get a password for your course accounts.
You will perform all of the benchmark work on the IvyBridge-EP Cluster "Emmy". Detailed information about login, file systems etc. can be found on the website:
ssh -p 8196 hpckXX@grid.rrze.uni-erlangen.de --> emmy1
ssh -p 8197 hpckXX@grid.rrze.uni-erlangen.de --> emmy2
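The port-to-frontend mapping shown above can be wrapped in a small shell helper so that the port numbers do not have to be remembered. This is only a sketch: the function name emmy_ssh_cmd is made up for illustration and is not an RRZE tool, and hpckXX stands for your own course account.

```shell
# Print the ssh command for a given Emmy frontend (emmy1 or emmy2).
# emmy_ssh_cmd is a hypothetical helper name, not an RRZE tool.
emmy_ssh_cmd() {
    frontend=$1
    account=$2
    case "$frontend" in
        emmy1) port=8196 ;;
        emmy2) port=8197 ;;
        *) echo "unknown frontend: $frontend" >&2; return 1 ;;
    esac
    echo "ssh -p $port $account@grid.rrze.uni-erlangen.de"
}
```

Calling emmy_ssh_cmd emmy2 hpckXX prints the command line for the second frontend; any other frontend name is rejected with an error.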
Please do not perform memory-intensive test runs on the frontends, as this will disturb other users. Moreover, many users are active on the frontends, so you will not get meaningful performance data there anyway.
2. Compilers
On the Emmy cluster we use the Intel compiler suite. The Intel compilers usually deliver higher performance than GCC, and we are quite familiar with their characteristics. To access the Intel compilers you first have to set up your environment correctly. Execute once per shell:
module load intel64
(you can also specify a version number; this will be required from time to time). This sets up PATH and the other environment variables that you need to work with the Intel compilers. The compilers are called ifort (Fortran 77/90), icc (C) and icpc (C++).
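To confirm that the module was loaded in the current shell, you can check whether the three compiler commands are found on your PATH. The following is a minimal sketch; check_tools is a hypothetical helper name, not part of the module system.

```shell
# Report which of the given commands are on PATH.
# Returns non-zero if at least one of them is missing.
# check_tools is a hypothetical helper name.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# Typical use after "module load intel64":
#   check_tools ifort icc icpc
```

If any compiler is reported as missing, the module has most likely not been loaded in this shell yet.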
2.1 Recommended compiler switches
The Intel compilers have a large number of command line options. We recommend using -O3 -xHost -fno-alias. The option -help gives you a complete list. The standard options (-c, -g, -o etc.) are identical to those of GCC. If you want a report on what the compiler did in the optimization stage you can use -opt_report3, but do not expect the output to be very readable.
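As an illustration of the recommended switches, a compile line for a single C source file might be assembled as follows. This is a sketch only: the file name bench.c and the helper name compile_cmd are placeholders, not part of the course material.

```shell
# Recommended optimization flags from the text above.
CFLAGS="-O3 -xHost -fno-alias"

# Print the compile command for a C source file (hypothetical helper).
# The executable is named after the source file without its .c suffix.
compile_cmd() {
    src=$1
    exe=${src%.c}
    echo "icc $CFLAGS -o $exe $src"
}
```

For bench.c this prints icc -O3 -xHost -fno-alias -o bench bench.c. For Fortran or C++ sources the same flags would be passed to ifort or icpc instead.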
3. Batch processing
Short test runs can be started directly on the Emmy frontends. However, to produce reliable benchmark results it is preferable to submit jobs to the batch queue. The batch system accepts requests for resources (e.g., "6 nodes for 24 hours") and queues them according to a priority scheme. A job is run, i.e., a previously specified shell script is executed, once the resources are available and the batch system has selected the job to start.
Some nodes of the Emmy cluster will be reserved for your exclusive use during tutorial hours. At all other times no CPUs are reserved, but your jobs have a high priority.
Apart from running a batch script (see below) and testing interactively on the frontends, you can submit an interactive batch job, which gives you, e.g., a shell on a compute node for some time. You can do this by typing:
$ qsub -l nodes=1:ppn=40,walltime=02:00:00 -I
This command will allocate a complete node (40 hardware threads) for 2 hours. You should always request complete nodes so that you can do your benchmarks on a quiet machine. Unless you do message passing parallelization there will be no need to request more than one node.
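Once the interactive shell on the compute node has started, it can be worth checking that the operating system really reports the expected 40 hardware threads. The sketch below uses getconf for this; hw_threads and check_node are hypothetical helper names.

```shell
# Print the number of hardware threads the operating system reports as online.
hw_threads() {
    getconf _NPROCESSORS_ONLN
}

# Compare against the expected count (40 on a full Emmy node).
# check_node is a hypothetical helper name.
check_node() {
    expected=$1
    actual=$(hw_threads)
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $actual hardware threads"
    else
        echo "warning: expected $expected hardware threads, found $actual"
        return 1
    fi
}

# Typical use inside the interactive job:
#   check_node 40
```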
If you want to run longer benchmarks or parameter studies you have to submit a batch script:
# the script runs in $HOME, so
# change to correct directory (i.e. the directory
# from which the job was submitted)
cd $PBS_O_WORKDIR
# start executable (./a.out is a placeholder for your own binary)
./a.out
You can submit the batch script using something like the following:
$ qsub -l nodes=1:ppn=40,walltime=05:00:00 script.csh
This will again request one complete node, this time for 5 hours (the allowed maximum is 24 hours). After job submission, qsub will print the job's ID number. When the job starts, the script script.csh is executed on the allocated node.
After the job has finished, its stdout and stderr output will be saved in the directory from which you submitted it. The names of these files are usually composed of the job name and the job ID, but they can be changed with the -o and -e options to qsub (see the manpage).
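Resource requests and output file names can also be written into the job script itself, assuming the batch system honours #PBS directive lines as Torque/PBS-style systems normally do. The following bash variant is a sketch under that assumption; the job name mybench, the output file names, and the executable ./a.out are placeholders.

```shell
#!/bin/bash -l
#PBS -l nodes=1:ppn=40,walltime=05:00:00
#PBS -N mybench
#PBS -o mybench.out
#PBS -e mybench.err

# PBS_O_WORKDIR is set by the batch system to the submission directory;
# fall back to the current directory when run outside a batch job.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Placeholder executable name; replace with your own binary.
EXE=${EXE:-./a.out}
if [ -x "$EXE" ]; then
    "$EXE"
else
    echo "executable $EXE not found" >&2
fi
```

With the directives in place, the script can be submitted without repeating the -l option on the qsub command line. Options given on the command line normally take precedence over those in the script.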
You can watch and control your jobs using the qstat and qdel commands, respectively.
- qstat will show you all your jobs, whether running (status "R") or queued (status "Q").
- qdel takes one or more job IDs (just the numbers) as arguments and allows you to remove a job from the queue, even when it's already running.
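Since qdel expects just the number, a tiny helper can strip any server suffix from the ID that qsub prints. This is a sketch: the example ID 123456.eadm, including its suffix, is an assumed format for illustration, and job_number is a hypothetical helper name.

```shell
# Strip everything after the first dot from a job ID such as "123456.eadm".
# The ".eadm" suffix is an assumed example; only the numeric part is kept.
job_number() {
    echo "${1%%.*}"
}

# Typical use:
#   jobid=$(qsub -l nodes=1:ppn=40,walltime=05:00:00 script.csh)
#   qdel "$(job_number "$jobid")"
```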
4. Clock frequency settings
If you want to get accurate timings in terms of processor cycles, you have to know the exact clock speed of the CPU. The Emmy processors have a nominal clock speed of 2.2 GHz, but "Turbo Mode" is enabled by default. This means that the CPU can "overclock" to some degree, depending on the number of active cores and the temperature. The highest possible clock speed is 3.0 GHz. In order to set the clock frequency to a specific (fixed) value you can specify a parameter at job submit time:
$ qsub -I -l nodes=1:ppn=40:f2.2,walltime=01:00:00 ...
In this example, the clock speed for all cores in this job would be set to 2.2 GHz. You can select from the following options: f2.2, f2.1, f2.0, f1.9, f1.8, f1.7, f1.6, f1.5, f1.4, f1.3, f1.2
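For parameter studies over several clock speeds, the resource string can be generated and checked against the list of valid options. The helper below is a sketch; node_request is a hypothetical name and simply mirrors the options listed above.

```shell
# Build the -l resource string for one full Emmy node at a fixed clock frequency.
# node_request is a hypothetical helper; valid frequencies are those listed above.
node_request() {
    freq=$1
    walltime=$2
    case "$freq" in
        2.2|2.1|2.0|1.9|1.8|1.7|1.6|1.5|1.4|1.3|1.2) ;;
        *) echo "unsupported frequency: $freq" >&2; return 1 ;;
    esac
    echo "nodes=1:ppn=40:f$freq,walltime=$walltime"
}

# Typical use:
#   qsub -I -l "$(node_request 1.8 01:00:00)"
```

Rejecting unsupported values before submission avoids queuing a job with a node property that does not exist.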
The likwid tool suite has the command likwid-setFrequencies, which also allows you to set the clock speed from within the job (no special submit options required):
$ likwid-setFrequencies -f 2.2
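With the clock fixed, a measured wall-clock time can be converted into processor cycles by multiplying it by the clock frequency. The sketch below does this per loop iteration using awk; cycles_per_iter is a hypothetical helper name, and the result is only meaningful if the clock really stayed at the given frequency during the run.

```shell
# Convert a wall-clock time into CPU cycles per loop iteration.
# Arguments: runtime in seconds, number of iterations, clock frequency in GHz.
cycles_per_iter() {
    awk -v t="$1" -v n="$2" -v f="$3" \
        'BEGIN { printf "%.2f\n", t * f * 1e9 / n }'
}

# Typical use: 1.5 s for 10^9 iterations at 2.2 GHz
#   cycles_per_iter 1.5 1000000000 2.2
```

For the example in the comment, 1.5 s times 2.2 * 10^9 cycles/s divided by 10^9 iterations gives 3.30 cycles per iteration.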
Last modified: Tuesday, 7 April 2015, 9:24 PM