Assignment 6: More OpenMP


  1. OpenMP fun. What is the problem here?

    double x, y, tmp;
    int i;

    #pragma omp parallel shared(x) private(tmp)
    {
        #pragma omp for reduction(+:x) nowait
        for (i = 1; i < 100; i++) {
            tmp = work1(i);
            x = x + tmp;
        }

        y = x;
    } /* end parallel */


  2. Eliminating recurrence. Parallelize the second loop in the following code using OpenMP:

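    #define N 10000000

    int main()
    {
        int i;
        /* static: three arrays of about 40 MB each would overflow a typical stack */
        static int b[N], c[N], d[N];

        for (i = 0; i < N; i++)
            b[i] = c[i] = d[i] = 0;

        // parallelize this loop
        for (i = 1; i < N; i++) {
            b[i] = 1 + i;
            c[i] = b[i-1] + i;
            d[i] = c[i-1] + i;
        }
    }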

    If you are unsure about the correctness of your code after parallelization, run it and compare the results with those of the serial version.
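    One way to check a parallel version is to compare its arrays against a serial reference run. The sketch below is only a testing aid, not a solution: reference_loop and first_mismatch are hypothetical helper names, and the loop body is copied unchanged from the exercise. A smaller array length than N is usually enough for such a check.

```c
#include <stddef.h>

/* Serial reference: the loop from the exercise, unchanged,
   applied to caller-provided arrays of length n. */
void reference_loop(int *b, int *c, int *d, int n)
{
    for (int i = 1; i < n; i++) {
        b[i] = 1 + i;
        c[i] = b[i-1] + i;
        d[i] = c[i-1] + i;
    }
}

/* Returns the first index at which x and y differ, or -1 if all
   n elements are equal. */
long first_mismatch(const int *x, const int *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (x[i] != y[i])
            return (long)i;
    return -1;
}
```

    Run reference_loop on one zero-initialized set of arrays, run the parallel loop on another, and call first_mismatch for each of b, c and d. Repeating the comparison with different thread counts and schedules makes a hidden race more likely to show up.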


  3. Eliminating recurrence, reloaded. Parallelize the loop in the following piece of code using OpenMP:

    const double up = 1.00001;
    double Sn = 1.0;
    double opt[N+1];
    int n;

    for (n = 0; n <= N; ++n) {
        opt[n] = Sn;
        Sn *= up;
    }

    The parallelized code should work independently of the OpenMP schedule used. Avoid, as far as possible, expensive operations that might harm serial performance. To solve this problem you might want to use the firstprivate and lastprivate OpenMP clauses. The former acts like private, with the important difference that each private copy is initialized with the value the original variable had before the construct. The latter copies the listed variables' values from the sequentially last loop iteration back to the original variables when the parallel loop ends.
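    The semantics of the two clauses can be illustrated on an unrelated toy loop. This is a minimal sketch, not a solution to the problem above: last_square_plus is a hypothetical function name, and n >= 1 is assumed so that a last iteration exists.

```c
/* Hypothetical helper: returns offset + (n-1)*(n-1), assuming n >= 1. */
int last_square_plus(int offset, int n)
{
    int last = -1;

    /* firstprivate(offset): every thread gets its own copy of offset,
       initialized with the value passed in.
       lastprivate(last): after the loop, last holds the value assigned
       in the sequentially last iteration (i == n-1), regardless of
       which thread executed it. */
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; ++i) {
        last = offset + i * i;
    }

    return last;
}
```

    For example, last_square_plus(10, 5) yields 26 for any thread count or schedule, and also when the code is compiled without OpenMP support, because the pragma is then ignored.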


  4. π by the Monte Carlo method. The quarter circle in the first quadrant with its center at (0,0) and radius 1 has an area of π/4. Consider uniformly distributed random points in the unit square [0, 1] × [0, 1]. The probability that such a point lies inside the quarter circle is π/4, so given enough samples we can calculate π using this so-called “Monte Carlo” method. Write a parallel OpenMP program that performs this task. Use the rand_r() function to get separate random number sequences for all threads. Make sure that adding more threads does not degrade the statistics. What is the best relative accuracy that you can achieve with ten "Emmy" cores in one second of walltime?
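    As a starting point, the estimator itself can be sketched serially. The following is a minimal sketch under stated assumptions: estimate_pi_serial is a hypothetical name, the seed value is arbitrary, and the OpenMP parallelization (distinct per-thread seeds, combining the hit counts) is deliberately left out. The statistical error of such an estimate is expected to shrink roughly like 1/sqrt(samples).

```c
#define _POSIX_C_SOURCE 200809L  /* exposes rand_r() under strict -std= modes */
#include <stdlib.h>

/* Hypothetical helper: serial Monte Carlo estimate of pi from
   `samples` random points in the unit square. */
double estimate_pi_serial(unsigned int seed, long samples)
{
    long hits = 0;

    for (long k = 0; k < samples; ++k) {
        /* rand_r() keeps its state in the caller's seed variable, so
           separate seed variables give separate sequences. */
        double x = (double)rand_r(&seed) / RAND_MAX;
        double y = (double)rand_r(&seed) / RAND_MAX;
        if (x * x + y * y <= 1.0)
            ++hits;
    }

    /* hits/samples approximates pi/4 */
    return 4.0 * (double)hits / (double)samples;
}
```

    The relative accuracy asked for in the problem would then be |estimate - π| / π. The same seed always reproduces the same estimate, which is why threads that share a seed add no new statistics.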