MPI+OpenMP: he-hy - Hello Hybrid! - compiling, starting, pinning

Prepare for these Exercises:

cd ~/HY-VSC/he-hy     #   change into your he-hy directory

Contents:           #   README file (very long --> follow the Moodle course) - do NOT run it as a script... - the .sh extension is mainly for syntax colors

job_*.sh                          #   job-scripts to run the provided tools, 2 x job_*

*.[c|f90]                          #   various codes (hello world & tests) - NO need to look into these!

vsc3/vsc3_slurm.out_*     #   vsc3 output files - sorted and with comments & debugging info

IN THE ONLINE COURSE this exercise shall be done in two parts:

    first exercise        =   1. + 2. + 4.          (skipping 3. for now)

    second exercise   =   5. + 3. + 6. + 7.   (after the talk on hardware and pinning)

    ! you have to check all hardware partitions you would like to use separately !

1. FIRST THINGS FIRST - PART 1:   find out about a (new) cluster - login node

    module (avail, load, list, unload); compiler (name & --version)

    Always check compiler names and versions, e.g. with: mpiicc --version !!!
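These first checks can be sketched as a few guarded shell commands -- `module` and the Intel compilers may not exist on the machine running this, so every call is guarded:

```shell
# Sketch: first look at a login node's toolchain.
# `module` and the Intel compilers may not exist here -- all calls are guarded.
module avail 2>/dev/null | head -n 20        # which software stacks are installed?
module list  2>/dev/null                     # which modules are loaded right now?
found=0
for c in mpiicc mpiifort icc ifort cc; do    # Intel wrappers, compilers, system cc
    if command -v "$c" >/dev/null 2>&1; then
        "$c" --version 2>/dev/null | head -n 1
        found=$((found+1))
    fi
done
echo "compilers found: $found"
```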

2. FIRST THINGS FIRST - PART 2:   find out about a (new) cluster - batch jobs

    job environment, job scripts (clean) & batch system (SLURM); test compiler and MPI version,  job_te-ve_[c|f].sh,  te-ve*

SLURM (vsc3):

sbatch job*.sh                                      #   submit

sq                                                          #   check your jobs in the queue

scancel JOB_ID                                     #   cancel

output will be written to: slurm-*.out     #   output
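A minimal job script for testing the environment might look like the sketch below; all #SBATCH values are placeholders to adapt, and mpiicc is only available on the cluster itself, so here we only syntax-check the script:

```shell
# Sketch of a minimal SLURM job script; all #SBATCH values are placeholders,
# and mpiicc exists only on the cluster -- so we only syntax-check locally.
cat > job_test.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=te-ve
#SBATCH --nodes=1
#SBATCH --time=00:05:00
module list            # record the loaded modules in the slurm-*.out file
mpiicc --version       # verify compiler & MPI version inside the job
EOF
bash -n job_test.sh && echo "job script syntax OK"
# on the cluster:  sbatch job_test.sh   ->   output in slurm-<JOB_ID>.out
```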

3. --> skip this point in the first exercise (and come back later...)
3. MPI-pure MPI:      compile and run the MPI "Hello world!" program (pinning)


    ? Why is the output (most of the time) unsorted ? ==> here (he-mpi) you can use: ... | sort -n

    ? Can you rely on the defaults for pinning ? ==> Always take care of correct pinning yourself !

4. MPI+OpenMP: :TODO: how to compile and start an application,
                      how to do conditional compilation

    job_co-co_[c|f].sh,  co-co.[c|f90]

    Recap with Intel compiler & Intel MPI (→ see also slides #-#):

C:          export OMP_NUM_THREADS=#
  with MPI: mpiicc  -DUSE_MPI  -qopenmp  co-co.c            →   mpirun -n # ./<exe>
  no MPI:   icc  -qopenmp  co-co.c                          →   ./<exe>

Fortran:    export OMP_NUM_THREADS=#
  with MPI: mpiifort  -fpp  -DUSE_MPI  -qopenmp  co-co.f90  →   mpirun -n # ./<exe>
  no MPI:   ifort  -fpp  -qopenmp  co-co.f90                →   ./<exe>


    → Compile and run (4 possibilities):  co-co.[c|f90] = demo for conditional compilation.

    → Do it by hand - compile and run it directly on the login node.

    → Have a look into the code:  co-co.[c|f90]  to see how it works.

    → It's also available as a script:  job_co-co_[c|f].sh

STOP HERE ------- THIS IS THE END OF THE first exercise ------- STOP HERE

5. MPI+OpenMP: :TODO: get to know the hardware - needed for pinning
                                                                              (→ see also slides #-#)


    → Find out about the hardware of compute nodes:

    → Write and Submit:

    → Describe the compute nodes... (core numbering?)

    → solution =

    → solution.out =  vsc3_slurm.out_check-hw_solution


    → Find out about the hardware of the vsc3plus nodes:

    → Write a job script and submit it with the sbvsc3plus alias to run on the VSC3+ nodes.

    → Describe the vsc3plus nodes... (what are these, how many sockets, cores, hyperthreads, core numbering?)

    → no solution available
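The hardware questions above can be answered with standard Linux tools; run them inside a job on the partition you care about (the sketch below simply describes whatever machine executes it):

```shell
# Sketch: describe the node this runs on (standard Linux tools only).
lscpu 2>/dev/null | grep -E 'Socket|Core|Thread|NUMA node|Model name'
ncpus=$(grep -c ^processor /proc/cpuinfo)   # logical CPUs = what the batch system pins to
echo "logical CPUs: $ncpus"
numactl --hardware 2>/dev/null || true      # core numbering per NUMA node (if installed)
```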

3. --> now go back and do 3. --> MPI-pure MPI (pinning)

6. MPI+OpenMP: :TODO: compile and run the Hybrid "Hello world!" program,  he-hy.[c|f90],  help_fortran_find_core_id.c


    → Run he-hy on a compute node, i.e.:  sbatch

    → Find out what's the default pinning with mpirun !

    → Look into: 

    → Do NOT YET do the pinning exercise, see below 7.

    ? Why is the output (most of the time) unsorted ? ==> here (he-hy) you can use: ... | sort -n

    ? Can you rely on the defaults for pinning ? ==> Always take care of correct pinning yourself !
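To see the pinning a process actually got (independent of the core_id that he-hy.* prints), you can query the kernel's affinity mask directly (Linux only):

```shell
# Sketch: inspect the CPU affinity of the current process (Linux only).
taskset -cp $$ 2>/dev/null || true                        # readable core list (if taskset is installed)
allowed=$(awk '/Cpus_allowed_list/{print $2}' /proc/self/status)
echo "this shell may run on cores: $allowed"              # e.g. 0-7
```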

7. MPI+OpenMP: :TODO: how to do pinning

    job_he-hy_[exercise|solution].sh,  he-hy.[c|f90]

      TODO (see below for info):

    → Do the pinning exercise in:

    → one possible solution =

PINNING: (→ see also slides #-#)

Pinning depends on:

               batch system   [SLURM*]         \
               MPI library    [Intel*]          |   interaction between these !
               startup        [mpirun*|srun]   /

Always check your pinning !

          → (he-hy.[c|f90] prints core_id)

          → print core_id in your application (see he-hy.*)

          → turn on debugging info & verbose output in job

          → monitor your job → login to nodes: top [1 q]

Intel → PINNING is done via environment variables (valid for Intel tools only!):

pure MPI:          I_MPI_PIN_PROCESSOR_LIST=<proclist>   (for other possibilities see the Web)

MPI+OpenMP:        I_MPI_PIN_DOMAIN=omp                  (domain size = number of logical cores = OMP_NUM_THREADS)

                   I_MPI_PIN_DOMAIN=[m_1,...,m_n]        (hexadecimal bit masks, the brackets [] are included!)

OpenMP:            KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]

                   modifier                               type (required)
                   -------------------------------------  -----------------------------
                   granularity=fine|thread|core|tile      compact
                   proclist={<proc-list>}                 balanced
                   [no]respect (an OS affinity mask)      scatter
                   [no]verbose                            explicit (no permute,offset)
                   [no]warnings                           disabled (no permute,offset)
                                                          none     (no permute,offset)

                   default: noverbose,respect,granularity=core,none[,0,0]

Debug:               KMP_AFFINITY=verbose

Example:           1 MPI process per socket, 8 cores per socket, 2 sockets per node:

                   export OMP_NUM_THREADS=8
                   export KMP_AFFINITY=scatter
                   export I_MPI_PIN_DOMAIN=socket
                   mpirun -ppn 2 -np # ./<exe>

                   see also: slurm.out_he-hy_mpirun_test-1ps
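The example above, written out as the relevant part of a job script (the node geometry of 2 sockets x 8 cores is the one assumed in the example; the mpirun line is commented out because it only works on the cluster):

```shell
# Sketch: hybrid pinning for 1 MPI process per socket on a 2 x 8-core node.
export OMP_NUM_THREADS=8             # 8 threads fill one 8-core socket
export I_MPI_PIN_DOMAIN=socket       # each MPI process owns one socket
export KMP_AFFINITY=verbose,scatter  # spread threads; verbose reports the pinning
# mpirun -ppn 2 -np 2 ./he-hy        # cluster only -- commented out here
env | grep -E '^(OMP_NUM_THREADS|I_MPI_PIN_DOMAIN|KMP_AFFINITY)='
```

Always verify the result afterwards, e.g. via the verbose output or the core_id printed by he-hy.*.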

Last modified: Sunday, 14 June 2020, 3:30 PM