MPI+OpenMP: he-hy - Hello Hybrid! - compiling, starting, pinning

Prepare for these Exercises:

cp -a ~a2c06aa/he-hy .              #   copy the he-hy (Hello Hybrid!) directory

cd he-hy                                     #   change into your he-hy directory

Contents:                     #   README file - do NOT run it as a script... - .sh mainly for colors

job_*.sh                                      #   job-scripts to run the provided tools, 2 x job_*

*.[c|f90]                                       #   various codes (hello world & tests) - NO need to look into these!

1. FIRST THINGS FIRST - PART 1:   find out about a (new) cluster - login node

    module (avail, load, list, unload); compiler (name & --version)

    Always check compiler names and versions, e.g. with: mpiicc --version !!!

2. FIRST THINGS FIRST - PART 2:   find out about a (new) cluster - batch jobs

    job environment, job scripts (clean) & batch system (SLURM); test compiler and MPI version,  job_te-ve_[c|f],  te-ve*

                      SLURM (ivymuc):

                         sbatch job_*.sh                                        #   submit

                         squeue --clusters=ivymuc                         #   check

                         scancel --clusters=ivymuc JOB_ID           #   cancel

                         output will be written to: slurm-*.out        #   output
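
                      A minimal job script could look like the sketch below. This is an illustration only: the partition, time limit and module names are assumptions and must be adapted to the actual cluster (check with module avail).

```shell
#!/bin/bash
# Hypothetical minimal SLURM job script -- adapt cluster/module names,
# then submit with:  sbatch job.sh
#SBATCH --clusters=ivymuc        # cluster name, as used with squeue above
#SBATCH --nodes=1                # one compute node
#SBATCH --time=00:05:00          # short test run
module load intel intelmpi       # assumed module names -- check `module avail`
mpirun -n 4 ./a.out              # start 4 MPI processes
```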

3. pure MPI:      compile and run the MPI "Hello world!" program (pinning)


    ? Why is the output (most of the time) unsorted ? ==> here (he-hy) you can use: ... | sort -n

    ? Can you rely on the defaults for pinning ? ==> Always take care of correct pinning yourself !
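
    All ranks write to stdout concurrently, so the line order reflects OS scheduling, not rank order. If each line begins with the rank number, piping through sort -n restores a deterministic order. A small simulation of such output:

```shell
# Simulated unsorted "Hello world!" output, one line per MPI rank;
# sort -n orders the lines numerically by the leading rank number.
printf '2 Hello world!\n0 Hello world!\n3 Hello world!\n1 Hello world!\n' | sort -n
# -> lines appear in rank order 0, 1, 2, 3
```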

4. MPI+OpenMP: :TODO: how to compile and start an application
                                           how to do conditional compilation

    job_co-co_[c|f].sh,  co-co.[c|f90]

    Recap with Intel compiler & Intel MPI (→ see also slides 172-176):

C:        export OMP_NUM_THREADS=#
          with MPI:   mpiicc    -DUSE_MPI  -qopenmp        start:  mpirun -n # ./<exe>
          no MPI:     icc       -qopenmp                   start:  ./<exe>

Fortran:  export OMP_NUM_THREADS=#
          with MPI:   mpiifort  -fpp  -DUSE_MPI  -qopenmp  start:  mpirun -n # ./<exe>
          no MPI:     ifort     -fpp  -qopenmp             start:  ./<exe>


    → Compile and Run (4 possibilities) co-co.[c|f90] = Demo for conditional compilation.

    → Have a look into the code:  co-co.[c|f90]  to see how it works.

    → It's also available as a script (needs editing):  job_co-co_[c|f].sh

5. MPI+OpenMP: :TODO: get to know the hardware - needed for pinning
                                                                              (→ see also slides 177-179)


    → Find out about the hardware of compute nodes:

    → Write and Submit:

    → Describe the compute nodes... (core numbering?)

    → solution =

    → solution.out =  ivymuc_slurm.out_check-hw_solution
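
    To answer these questions, the standard Linux tools below report the node topology (run them on a compute node, i.e. from inside a batch job; numactl may not be installed everywhere):

```shell
# Inspect the hardware of the node you are running on:
nproc                                   # number of logical cores
lscpu | grep -E 'Socket|Core|Thread'    # sockets, cores/socket, threads/core
grep -c ^processor /proc/cpuinfo        # logical cores, via /proc
# numactl --hardware                    # NUMA layout (if numactl is available)
```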

6. MPI+OpenMP: :TODO: compile and run the Hybrid "Hello world!" program,  he-hy.[c|f90],  help_fortran_find_core_id.c


    → Run he-hy on a compute node, i.e. submit it with:  sbatch

    → Find out what the default pinning with mpirun is!

    → Look into: 

    → Do NOT YET do the pinning exercise - see 7. below.

    ? Why is the output (most of the time) unsorted ? ==> here (he-hy) you can use: ... | sort -n

    ? Can you rely on the defaults for pinning ? ==> Always take care of correct pinning yourself !

7. MPI+OpenMP: :TODO: how to do pinning

    job_he-hy_[exercise|solution].sh,  he-hy.[c|f90]

      TODO (see below for info):

    → Do the pinning exercise in:

    → one possible solution =

PINNING: (→ see also slides 201-215)

Pinning depends on:

               batch system  [SLURM*]         \
               MPI library   [Intel*]          |   interaction between these !
               startup       [mpirun*|srun]   /

Always check your pinning !

          → (he-hy.[c|f90] prints core_id)

          → print core_id in your application (see he-hy.*)

          → turn on debugging info & verbose output in job

          → monitor your job → login to nodes: top [1 q]

Intel → PINNING is done via environment variables (valid for Intel only!):

pure MPI:            I_MPI_PIN_PROCESSOR_LIST=<proclist>   (for other possibilities see the Web)



MPI+OpenMP:                 I_MPI_PIN_DOMAIN=omp                domain size (number of logical cores) = OMP_NUM_THREADS

                            I_MPI_PIN_DOMAIN=[m_1,...,m_n]      hexadecimal bit masks, brackets [] included!
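
                            In each mask, bit i selects logical core i. A quick way to construct the masks, here for two hypothetical domains of 4 cores each:

```shell
# Bit i of a mask selects logical core i; one mask per pinning domain.
printf '0x%x\n' $(( (1 << 4) - 1 ))           # cores 0-3 -> 0xf
printf '0x%x\n' $(( ((1 << 4) - 1) << 4 ))    # cores 4-7 -> 0xf0
# export I_MPI_PIN_DOMAIN=[f,f0]              # 2 domains of 4 cores each
```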

OpenMP:            KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]

                            modifier                               type (required)

                            granularity=fine|thread|core|tile      compact
                            proclist={<proc-list>}                 balanced
                            [no]respect (an OS affinity mask)      scatter
                            [no]verbose                            explicit  (no permute,offset)
                            [no]warnings                           disabled  (no permute,offset)
                                                                   none      (no permute,offset)

                            default: noverbose,respect,granularity=core,none[,0,0]

Debug:               KMP_AFFINITY=verbose

Example:         1 MPI process per socket, 8 cores per socket, 2 sockets per node:

                         export OMP_NUM_THREADS=8

                         export KMP_AFFINITY=scatter

                         export I_MPI_PIN_DOMAIN=socket

                         mpirun -ppn 2 -np # ./<exe>

                                                           see also:  slurm.out_he-hy_mpirun_test-1ps

Last modified: Sunday, 27 January 2019, 3:35 PM