Some typical exam questions

The following are typical questions you may encounter in the exam.

  • Name three technical or architectural ways to increase the performance of a CPU chip.
  • Name the two principal ways to design a shared-memory parallel computer. What are the advantages and disadvantages of each option? What is the dominant design today?
  • Someone says "my 3D Jacobi stencil solver achieves a performance of 12 GFlop/s on CPU X, which is the highest performance reported to date on this machine". What is your comment on that?
  • Someone says "My shared-memory parallel code shows a speedup of 9.5 on 10 cores of processor X, so it is truly scalable". What is your comment on that?
  • An OpenMP-parallel code you have written does not scale across the cores of a multicore chip (such as the Ivy Bridge CPU in the "Emmy" cluster). Name possible reasons for that.
  • Sketch the typical performance behavior vs. problem size on one modern CPU core for the loop A(:)=A(:)*B(:). What is the expected in-memory saturated performance on a CPU chip with a memory bandwidth of 48 GB/s? What is the typical scaling behavior across threads when the data is in memory?
  • Consider the following code:

    for(k=0; k<N; ++k) {
      for(j=0; j<N; ++j) {
        for(i=0; i<N; ++i) {
          y[i][j][k] = 0.25*(x[i-1][j][k] + x[i-2][j][k] + x[i+1][j-1][k+1] +
                                  x[i][j+1][k-1]) + b[i][j][k];
        }
      }
    }

    What simple code optimization would you suggest? Parallelize the code with OpenMP. What is the highest performance you can expect on a CPU with 64 GB/s of main memory bandwidth, 10 cores, and 25 MB of outer-level cache if the problem size is N=512? Will you get this performance with the (parallelized) code as shown? What do you have to do to get it?
  • Comment on the statement "The performance of my sequential code is totally limited by main memory bandwidth". Can this be true? What experiment would you suggest to refute this claim?
  • You have written a code that is able to run on 1000s of CPU cores, and you want to present its scalability graphically. How would you do that?
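
Several of the questions above ask for an expected in-memory performance at a given memory bandwidth. A minimal sketch of the usual back-of-the-envelope estimate is shown below: the bandwidth-bound limit is the memory bandwidth divided by the data traffic per flop. The function name bandwidth_limit_gflops and its parameters are hypothetical and chosen for illustration only; the byte and flop counts per loop iteration are assumptions you have to derive yourself for each loop.

```c
#include <assert.h>

/*
 * Upper performance limit (in GFlop/s) of a loop whose data comes from
 * main memory, assuming memory bandwidth is the only bottleneck:
 *
 *     P = bandwidth * flops_per_iter / bytes_per_iter
 *
 * bandwidth_gbs  : saturated memory bandwidth of the chip in GB/s
 * bytes_per_iter : assumed memory traffic per loop iteration in bytes
 * flops_per_iter : floating-point operations per loop iteration
 *
 * Hypothetical helper for illustration; it does not model caches,
 * write-allocate transfers, or in-core limits. Those must already be
 * reflected in the byte count passed in by the caller.
 */
double bandwidth_limit_gflops(double bandwidth_gbs,
                              double bytes_per_iter,
                              double flops_per_iter)
{
    assert(bandwidth_gbs > 0.0);
    assert(bytes_per_iter > 0.0);
    assert(flops_per_iter > 0.0);
    return bandwidth_gbs * flops_per_iter / bytes_per_iter;
}
```

As a usage example for the loop A(:)=A(:)*B(:), assume double precision and 24 bytes of memory traffic per iteration (load A, load B, store A) for one flop. Under these assumptions, bandwidth_limit_gflops(48.0, 24.0, 1.0) evaluates to 2.0, i.e., 2 GFlop/s at 48 GB/s. A different data type or additional traffic changes the result accordingly.
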
Last modified: Thursday, 14 April 2016, 11:16 AM