Preparing benchmark codes for analysis
Benchmarking is an important way to learn how your code interacts with the hardware. For proper and reliable benchmarking, consider the following guidelines:
- If possible, prepare your benchmark code in both a serial and a parallel version. Profiling with simple tools is much easier when the code is sequential.
- Choose input data sets that are representative of the actual production workload. A frequent mistake is to benchmark with too small a working set. This leads to false conclusions because the hardware bottleneck can change with problem size: a working set that fits into the caches, for example, does not stress main memory bandwidth the way a production-sized one does. Set up benchmarking runs so that each run takes no more than one or two minutes. For an iterative algorithm this can often be achieved by stopping after a few iterations (but enough of them that the setup phase takes an insignificant fraction of the runtime).
- Prepare benchmarking runs with an increasing number of threads or processes on one socket, i.e., we want to study how your code scales as the number of cores increases until the socket is fully occupied.
- If you use Intel or GNU compilers, compile and link the serial code with the option "-pg" and run it to get a function profile. When the program exits normally, the profile is written to the file "gmon.out" in the current working directory and can be displayed in human-readable form with the "gprof" command:
$ gprof a.out
Replace "a.out" with the name of your binary. The gprof tool automatically reads the profile data from gmon.out.
Last modified: Monday, 11 June 2018, 9:36 AM