Wednesday, July 2, 2008

Parallel Computing with MPI - Part VII: Measuring Parallel Performance

The most obvious benefit of using parallel computing is the reduction in execution time of your code. For example, if your code takes two hours to run on a single processor, it would (theoretically) take one hour to run on two processors, 30 minutes to run on four processors, and 2/N hours to run on N processors. However, there are two delicate issues to address here.

First, you must distinguish between CPU time and wall clock time.

CPU vs Clock Time

CPU time is the time required by the central processing unit to process instructions. It does not include input/output waiting times (for example, when the code is waiting for the user to input some data and the user is taking a break).

On the other hand, wall clock time is the time taken to run a code as measured by a regular clock. This includes all kinds of input/output operations and any processor idle time. Also, if you are running 10 applications alongside your code, the wall clock time for your code will obviously be longer than if it ran by itself.

CPU time is usually used to benchmark your code, but in practice it is the wall clock time that counts: if you start your code in the morning and it finishes by early evening, then that is when it really finished executing, not earlier as the CPU time would most likely report.

I personally use both methods to time my parallel codes. Ideally, if your code is running on a dedicated parallel machine, the wall clock time will be reasonably close to the CPU time.
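To make the difference concrete, here is a minimal sketch in Python (not MPI code) that times the same call with both clocks. The helper names time_call and compute_then_wait are made up for this illustration, and the sleep merely stands in for waiting on input/output. In an actual MPI program, the wall clock role would typically be played by MPI_Wtime.

```python
import time


def time_call(func, *args):
    """Return (result, cpu_seconds, wall_seconds) for one call to func."""
    wall_start = time.perf_counter()  # wall clock: includes sleeping and waiting
    cpu_start = time.process_time()   # CPU time consumed by this process only
    result = func(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, cpu_seconds, wall_seconds


def compute_then_wait(n, pause):
    """Hypothetical workload: some arithmetic followed by an idle period."""
    total = sum(i * i for i in range(n))
    time.sleep(pause)  # stands in for waiting on input/output
    return total


if __name__ == "__main__":
    _, cpu, wall = time_call(compute_then_wait, 200_000, 0.5)
    print(f"CPU time:  {cpu:.3f} s")
    print(f"wall time: {wall:.3f} s")
```

The idle half second shows up in the wall clock time but not in the CPU time, which is exactly the gap described above.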


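The other issue I would like to address is the actual reduction in running time, usually measured by the speedup. The speedup is defined as

$$S(N) = \frac{T_1}{T_N} \qquad \text{(Eq. 1)}$$

where T_1 is the run time on a single processor and T_N is the run time on N processors.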

Theoretically, your speedup should be equal to N, i.e. if your code runs in T seconds on one processor, then it takes T/N seconds to run on N processors. Plotted against N, the ideal speedup is therefore a straight line at a 45-degree angle (Fig. 1).
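As a rough illustration, the speedup can be tabulated from measured run times, assuming the usual definition S(N) = T_1 / T_N (single-processor time divided by the time on N processors). The timings below are invented for the example, not measurements, and the function names are my own.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(N) = T_1 / T_N for one pair of run times."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


# Hypothetical wall clock times in seconds, keyed by processor count N.
timings = {1: 7200.0, 2: 3700.0, 4: 1950.0, 8: 1100.0}


def speedup_table(timings):
    """Map each processor count N to its speedup relative to N = 1."""
    t1 = timings[1]
    return {n: speedup(t1, tn) for n, tn in sorted(timings.items())}


if __name__ == "__main__":
    for n, s in speedup_table(timings).items():
        print(f"N = {n:2d}   measured S(N) = {s:5.2f}   ideal = {n}")
```

Printing the measured value next to the ideal one makes any drift away from the 45-degree line easy to spot.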

The speedup curve may also give information on the scalability of your code. A code is scalable, in the parallel sense, when the speedup does not drift away from the ideal curve.

In practice, the speedup is not always ideal or even linear. The deterioration in scalability has several causes that are beyond the scope of this article. A rule of thumb is to keep the load on each processor balanced. The more processors you want to use, the larger your problem data structures should be. You will learn this with practice.
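One simple way to picture load balance is to split a number of work items across processors as evenly as possible and compare the heaviest load to the average. This is only a sketch of one common partitioning scheme; block_counts and imbalance are names chosen for this example, and real codes may balance work quite differently.

```python
def block_counts(n_items, n_procs):
    """Split n_items across n_procs so that counts differ by at most one."""
    base, extra = divmod(n_items, n_procs)
    # The first `extra` ranks each take one additional item.
    return [base + 1 if rank < extra else base for rank in range(n_procs)]


def imbalance(counts):
    """Ratio of the heaviest load to the average load (1.0 means balanced)."""
    return max(counts) / (sum(counts) / len(counts))


if __name__ == "__main__":
    for n_items, n_procs in [(1000, 8), (10, 8)]:
        counts = block_counts(n_items, n_procs)
        print(n_items, n_procs, counts, f"imbalance = {imbalance(counts):.2f}")
```

With 1000 items on 8 processors the loads are identical, whereas 10 items on 8 processors leaves some processors with twice the work of others, which is one reason small problems tend to scale poorly.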

Sometimes, you will see super-linear speedup, i.e. S(N) > N. This usually happens because the parallel code is sometimes more efficient than its sequential counterpart, for example when each processor's smaller share of the data fits better in cache.


The efficiency of a parallel code is a measure of how much of the time the processors are actually being used to execute the program. This may be written as

$$E(N) = \frac{S(N)}{N} \qquad \text{(Eq. 2)}$$

where S(N) is the speedup on N processors.

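When inserting Eq. 1 into Eq. 2, one arrives at

$$E(N) = \frac{T_1}{N \, T_N}$$

where T_1 is the run time on a single processor and T_N is the run time on N processors.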

For instance, if the efficiency is 80%, then the processors spend 80% of the time doing the actual computation and are idle for the remaining 20%.
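A quick numerical check of the 80% case, assuming the standard definition of efficiency as speedup divided by the number of processors, i.e. T_1 / (N * T_N). The run times are made up so that the numbers come out round.

```python
def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E(N) = S(N) / N = T_1 / (N * T_N)."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return t_serial / (n_procs * t_parallel)


if __name__ == "__main__":
    # Hypothetical run: 100 s on one processor, 31.25 s on four.
    t1, t4, n = 100.0, 31.25, 4
    print(f"speedup    S({n}) = {t1 / t4:.2f}")
    print(f"efficiency E({n}) = {efficiency(t1, t4, n):.0%}")
```

Here the speedup is 3.2 on four processors, so the efficiency is 3.2 / 4 = 80%.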

In the next post, I will discuss the essential ingredients needed in the message passing paradigm. You will see that only a few functions are required to write a parallel code using message passing.

Cite as:
Saad, T. "Parallel Computing with MPI - Part VII: Measuring Parallel Performance". Weblog entry from Please Make A Note.
