MPI+OpenMP: he-hy - Hello Hybrid! - compiling, starting, pinning


Prepare for these Exercises:

cp -a ~a2c06aa/he-hy .              #   copy the he-hy (Hello Hybrid!) directory

cd he-hy                                     #   change into your he-hy directory


Contents:

README_he-hy.sh                     #   README file - do NOT run as script... - .sh mainly for colors

job_*.sh                                      #   job-scripts to run the provided tools, 2 x job_*_exercise.sh

*.[c|f90]                                       #   various codes (hello world & tests) - NO need to look into these!



1. FIRST THINGS FIRST - PART 1:   find out about a (new) cluster - login node

    module (avail, load, list, unload); compiler (name & --version)

    Always check compiler names and versions, e.g. with: mpiicc --version !!!
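
    For example, a first session on the login node could look like the sketch below
    (the module names are only an illustration - check 'module avail' on the actual cluster):

         module avail                     #   list all available modules
         module load intel intel-mpi      #   load compiler + MPI (names are an assumption!)
         module list                      #   verify what is loaded
         mpiicc   --version               #   Intel MPI wrapper for the C compiler
         mpiifort --version               #   Intel MPI wrapper for the Fortran compiler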


2. FIRST THINGS FIRST - PART 2:   find out about a (new) cluster - batch jobs

    job environment, job scripts (clean) & batch system (SLURM); test compiler and MPI version

    job_env.sh,  job_te-ve_[c|f],  te-ve*


                      SLURM (ivymuc):


                         sbatch job_*.sh                                        #   submit

                         squeue --clusters=ivymuc                         #   check

                         scancel --clusters=ivymuc JOB_ID           #   cancel

                         output will be written to: slurm-*.out        #   output
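
                      For orientation, a minimal job script could look like this sketch
                      (node count, time limit and module names are assumptions - the provided
                      job_*.sh files show the real settings):

                         #!/bin/bash
                         #SBATCH --clusters=ivymuc        #   same cluster as used with squeue/scancel above
                         #SBATCH --nodes=1                #   one compute node is enough for the tests
                         #SBATCH --time=00:05:00          #   short test run
                         module load intel intel-mpi      #   module names are an assumption - see 'module avail'
                         mpiicc --version                 #   record the compiler version in slurm-*.out
                         mpirun -n 4 hostname             #   check that MPI startup works on the node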


3. MPI-pure MPI:      compile and run the MPI "Hello world!" program (pinning)

    job_he-mpi_[default|ordered].sh,  he-mpi.[c|f90],  help_fortran_find_core_id.c

    ? Why is the output (most of the time) unsorted ? ==> here (he-hy) you can use: ... | sort -n

    ? Can you rely on the defaults for pinning ? ==> Always take care of correct pinning yourself !
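
    For reference, compiling and running he-mpi by hand could look like the sketch below
    (the provided job_he-mpi_*.sh scripts contain the intended commands; executable names are arbitrary):

         mpiicc   he-mpi.c  -o he-mpi                                  #   C version
         mpiicc   -c help_fortran_find_core_id.c                       #   C helper (presumably supplies core_id to Fortran)
         mpiifort he-mpi.f90  help_fortran_find_core_id.o  -o he-mpi   #   Fortran version
         mpirun -n 4 ./he-mpi | sort -n                                #   sort the otherwise unordered output by rank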


4. MPI+OpenMP: :TODO: how to compile and start an application
                                           how to do conditional compilation

    job_co-co_[c|f].sh,  co-co.[c|f90]

    Recap with Intel compiler & Intel MPI (→ see also slides 172-176):

                        compiler         ? USE_MPI    ? _OPENMP    START APPLICATION

    C:                                                              export OMP_NUM_THREADS=#
         with MPI       mpiicc           -DUSE_MPI    -qopenmp      mpirun -n # ./<exe>
         no MPI         icc                           -qopenmp      ./<exe>

    Fortran:                                                        export OMP_NUM_THREADS=#
         with MPI       mpiifort -fpp    -DUSE_MPI    -qopenmp      mpirun -n # ./<exe>
         no MPI         ifort    -fpp                 -qopenmp      ./<exe>


      TODO:

    → Compile and Run co-co.[c|f90] (4 possibilities) = demo for conditional compilation (see the sketch after this list).

    → Have a look into the code:  co-co.[c|f90]  to see how it works.

    → It's also available as a script (needs editing):  job_co-co_[c|f].sh
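
    A sketch of the four possibilities for the C version (executable names and the
    process/thread counts are arbitrary; the Fortran version additionally needs -fpp, see the recap above):

         mpiicc  -DUSE_MPI  -qopenmp  co-co.c  -o co-co_hyb      #   MPI + OpenMP (hybrid)
         mpiicc  -DUSE_MPI            co-co.c  -o co-co_mpi      #   pure MPI
         icc                -qopenmp  co-co.c  -o co-co_omp      #   pure OpenMP
         icc                          co-co.c  -o co-co_ser      #   neither (serial)

         export OMP_NUM_THREADS=4
         mpirun -n 2 ./co-co_hyb                                 #   start the hybrid version
         ./co-co_omp                                             #   start the OpenMP-only version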


5. MPI+OpenMP: :TODO: get to know the hardware - needed for pinning
                                                                              (→ see also slides 177-179)

      TODO:

    → Find out about the hardware of compute nodes:

    → Write and Submit: job_check-hw_exercise.sh (candidate commands are sketched below)

    → Describe the compute nodes... (core numbering?)

    → solution = job_check-hw_solution.sh

    → solution.out =  ivymuc_slurm.out_check-hw_solution
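
    Candidate commands for job_check-hw_exercise.sh (a sketch only - the provided solution may use different tools):

         lscpu                    #   sockets, cores per socket, threads per core
         numactl --hardware       #   NUMA domains and which core ids belong to them
         cat /proc/cpuinfo        #   per-logical-core details: processor, physical id, core id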


6. MPI+OpenMP: :TODO: compile and run the Hybrid "Hello world!" program

    job_he-hy_exercise.sh,  he-hy.[c|f90],  help_fortran_find_core_id.c

      TODO:

    → Run he-hy on a compute node, i.e.:  sbatch job_he-hy_exercise.sh

    → Find out what the default pinning with mpirun is (see the sketch below)!

    → Look into: job_he-hy_exercise.sh 

    → Do NOT YET do the pinning exercise - see 7. below.


    ? Why is the output (most of the time) unsorted ? ==> here (he-hy) you can use: ... | sort -n

    ? Can you rely on the defaults for pinning ? ==> Always take care of correct pinning yourself !
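
    One way to make the default pinning visible (a sketch: it uses the debug settings from the
    PINNING section below and assumes the executable is called ./he-hy and that 2 ranks x 4 threads fit on one node):

         export OMP_NUM_THREADS=4
         export I_MPI_DEBUG=4            #   Intel MPI reports its rank-to-core mapping
         export KMP_AFFINITY=verbose     #   each OpenMP runtime reports where its threads run
         mpirun -n 2 ./he-hy | sort -n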


7. MPI+OpenMP: :TODO: how to do pinning

    job_he-hy_[exercise|solution].sh,  he-hy.[c|f90]


      TODO (see below for info):

    → Do the pinning exercise in:  job_he-hy_exercise.sh

    → one possible solution = job_he-hy_solution.sh


PINNING: (→ see also slides 201-215)


Pinning depends on:

               batch system   [SLURM*]        \
               MPI library    [Intel*]         |   interaction between these !
               startup        [mpirun*|srun]  /

Always check your pinning !

          → job_he-hy...sh (he-hy.[c|f90] prints core_id)

          → print core_id in your application (see he-hy.*)

          → turn on debugging info & verbose output in job

          → monitor your job → login to nodes: top [1 q]


Intel → PINNING is done via environment variables (valid for Intel-only!):


pure MPI:            I_MPI_PIN_PROCESSOR_LIST=<proclist>   (for other possibilities see the Intel MPI documentation)


MPI+OpenMP:   I_MPI_PIN_DOMAIN (3 options) + KMP_AFFINITY

                            I_MPI_PIN_DOMAIN=core|socket|numa|node|cache|...

                            I_MPI_PIN_DOMAIN=omp|<n>|auto[:compact|scatter|platform]
                                                               omp:  domain size (number of logical cores) = OMP_NUM_THREADS

                            I_MPI_PIN_DOMAIN=[m_1,...,m_n]   comma-separated hexadecimal bit masks (one per domain) - the brackets [] are part of the syntax!
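
                            For illustration, the three forms could be set like this (values are examples only):

                            export I_MPI_PIN_DOMAIN=socket                 #   one domain per socket
                            export I_MPI_PIN_DOMAIN=omp                    #   domain size = OMP_NUM_THREADS
                            export I_MPI_PIN_DOMAIN=[0x00FF,0xFF00]        #   explicit masks: cores 0-7 / cores 8-15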


OpenMP:            KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]

                                    modifier                                  type (required)

                                    granularity=fine|thread|core|tile         compact
                                    proclist={<proc-list>}                    balanced
                                    [no]respect (an OS affinity mask)         scatter
                                    [no]verbose                               explicit  (no permute,offset)
                                    [no]warnings                              disabled  (no permute,offset)
                                                                              none      (no permute,offset)

                                    default: noverbose,respect,granularity=core,none[,0,0]
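
                                    For illustration (example settings only):

                                    export KMP_AFFINITY=granularity=core,compact            #   pack threads onto neighbouring cores
                                    export KMP_AFFINITY=granularity=core,scatter            #   spread threads across the domain
                                    export KMP_AFFINITY=verbose,granularity=core,compact    #   same as compact, plus a binding report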


Debug:               KMP_AFFINITY=verbose
                           I_MPI_DEBUG=4


Example:         1 MPI process per socket, 8 cores per socket, 2 sockets per node:

                         export OMP_NUM_THREADS=8

                         export KMP_AFFINITY=scatter

                         export I_MPI_PIN_DOMAIN=socket

                         mpirun -ppn 2 -np # ./<exe>

                                                           see:  job_he-hy_test-1ps.sh + slurm.out_he-hy_mpirun_test-1ps
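
                         e.g. with 2 such nodes (the node count is only an illustration), # becomes 4:

                         mpirun -ppn 2 -np 4 ./<exe>            #   4 ranks x 8 threads = 32 cores on 2 nodes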

