Code and parallelize (using OpenMP) the standard vector triad benchmark from the lecture. To do this, compile and link with the -openmp switch and use the "fused" parallel for/do directives in the following way:

Fortran:

   do r=1,NITER
   !$OMP PARALLEL DO
     do i=1,N
       a(i) = b(i) + c(i) * d(i)
     enddo
   !$OMP END PARALLEL DO
   enddo

C:

   for(r=0; r
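As a rough illustration of the fused directive in C, the following sketch mirrors the Fortran loop nest above. It is not the skeleton code from the course: the function names `wallclock`, `triad`, and `triad_mflops`, the initial array values, and the idea of doubling NITER until a minimum runtime is reached are all assumptions made for this example. The performance figure counts two floating-point operations (one add, one multiply) per loop iteration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds; uses the C11 timer when built without OpenMP. */
double wallclock(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
#endif
}

/* Runs niter sweeps of a(i) = b(i) + c(i) * d(i); returns elapsed seconds. */
double triad(double *a, const double *b, const double *c, const double *d,
             long n, long niter)
{
    assert(n > 0 && niter > 0);
    double start = wallclock();
    for (long r = 0; r < niter; ++r) {
        /* "fused" directive: opens a parallel region and splits the i loop */
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            a[i] = b[i] + c[i] * d[i];
        /* Never true for the data used here; intended to discourage the
           compiler from removing the repeated sweeps (assumed guard). */
        if (a[n / 2] < 0.0)
            puts("unexpected negative value");
    }
    return wallclock() - start;
}

/* Doubles NITER until one timed run lasts at least min_seconds, then
   returns MFlop/s (2 flops per element). Returns -1.0 if allocation fails. */
double triad_mflops(long n, double min_seconds)
{
    assert(n > 0 && min_seconds > 0.0);
    double *a = malloc((size_t)n * sizeof *a);
    double *b = malloc((size_t)n * sizeof *b);
    double *c = malloc((size_t)n * sizeof *c);
    double *d = malloc((size_t)n * sizeof *d);
    if (!a || !b || !c || !d) {
        free(a); free(b); free(c); free(d);
        return -1.0;
    }

    /* Initialize in parallel with the same loop structure as the triad. */
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; d[i] = 3.0;
    }

    long niter = 1;
    double t = triad(a, b, c, d, n, niter);
    while (t < min_seconds) {
        niter *= 2;
        t = triad(a, b, c, d, n, niter);
    }

    free(a); free(b); free(c); free(d);
    return 2.0 * (double)n * (double)niter / t / 1.0e6;
}
```

A driver would call, for example, `triad_mflops(n, 0.2)` from `main` for each array length of interest and print the result. The code must be built with the compiler's OpenMP switch (`-fopenmp` for GCC); without it the pragmas are ignored and the loops run sequentially.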

To choose the number of threads, set the OMP_NUM_THREADS environment variable to the desired value before starting your executable, e.g.:

$ env OMP_NUM_THREADS=10 ./a.out

1. Perform benchmark runs with the sequential triad code on one Emmy core. Make sure that the actual benchmark loop is repeated often enough (i.e., set NITER appropriately) to get proper measurements (you may consult the skeleton code in the SCAN/ folder). Draw a performance graph (or make a table) for N = 10^1...10^… (use log scaling on the x axis) and make sure the result is similar to what you expect.

2. Now fix the clock speed to 2.2 GHz. Can you interpret the observable changes to the data?

3. Repeat the experiment for 1...20 threads. Use likwid-pin to control the placement of threads. Example:

$ module load likwid