## Exercise: Measuring the divide throughput

We want to numerically integrate the function

f(x) = 4/(1+x²)

from 0 to 1. The result should be an approximation to π, of course. You may use a very simple rectangular integration scheme that works by summing up areas of rectangles centered around x_i with a width of Δx and a height of f(x_i):

```c
int SLICES = 100000000;
double delta_x = ....;

for (int i = 0; i < SLICES; i++) {
  x = (i + 0.5) * delta_x;
  sum += (4.0 / (1.0 + x * x));
}
Pi = sum * delta_x;
```

You can find (not quite complete) example programs in C and Fortran in the DIV folder. Make sure that your code actually computes an approximation to π, and report runtime and performance in MIterations/s as obtained on one core of the cluster.

- Assuming that the (non-pipelined) divide operation dominates the runtime of the code (and everything else is hidden behind the divides), can you estimate the latency of a divide in CPU cycles?
- Try compiling without the additional option **-no-vec** (which disables SIMD vectorization). How does that change your result? What is your conclusion from this?

Last modified: Sunday, 22 March 2015, 10:25 PM