First, you must distinguish between CPU time and wall clock time.
CPU vs Clock Time
CPU time is what is usually used to benchmark your code, but in reality it is the wall clock time that really counts: if you start your code in the morning and it finishes by early evening, then that is when it really finished executing, not earlier as would most likely be reported by the CPU time.
I personally use both methods to time my parallel codes. Ideally, if your code is running on a dedicated parallel machine, the wall clock time will be reasonably close to the CPU time.
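To make the distinction concrete, here is a small sketch in Python (my own illustration, not from any particular timing library) that measures both quantities for the same stretch of work. The `busy_work` function and the half-second sleep are made up purely to show how idle time inflates the wall clock but not the CPU time.

```python
import time

def busy_work(n):
    # CPU-bound loop: keeps the processor busy doing arithmetic
    total = 0
    for i in range(n):
        total += i * i
    return total

cpu_start = time.process_time()   # CPU time consumed by this process
wall_start = time.perf_counter()  # wall clock time

busy_work(1_000_000)
time.sleep(0.5)  # idle time: counted by the wall clock, not by the CPU timer

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print(f"CPU time:  {cpu_elapsed:.3f} s")
print(f"Wall time: {wall_elapsed:.3f} s")
```

On a loaded or time-shared machine the gap between the two numbers grows, which is exactly why reporting only CPU time can be misleading.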
The other issue I would like to address is the actual reduction in running time, usually measured by the speedup. The speedup is defined as

    S(N) = T(1) / T(N)

where T(1) is the running time on one processor and T(N) is the running time on N processors.
Theoretically, your speedup should be equal to N, i.e. if your code runs in T seconds on one processor, then it takes T/N seconds to run on N processors. An ideal speedup would therefore be a straight line at a 45-degree angle when plotted against N.
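The speedup computation can be sketched in a few lines. The timing numbers below are made up for illustration; in practice you would substitute your own wall clock measurements.

```python
# Hypothetical measured running times (seconds) for the same problem
# on 1, 2, 4, and 8 processors
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

t1 = timings[1]
for n, tn in sorted(timings.items()):
    speedup = t1 / tn  # S(N) = T(1) / T(N)
    print(f"N={n}: S(N) = {speedup:.2f}  (ideal: {n})")
```

Comparing each S(N) to the ideal value N shows immediately how far the code is from the 45-degree line.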
In practice, the speedup is not always ideal or even linear. The deterioration in scalability has several causes that are beyond the scope of this article. A rule of thumb is to keep the load on each processor balanced: the more processors you want to use, the larger your problem data structures should be. You will learn this by practice.
Sometimes, you will see super-linear speedup, i.e. S(N) > N. This usually happens because the parallel code is sometimes more efficient than its sequential counterpart.
The efficiency of parallel code is a measure of how much of the time the processors are actually being used to execute the program. This may be written as

    E(N) = S(N) / N = T(1) / (N * T(N))
For instance, if the efficiency is 80%, then the processors are being used 80% of the time to do the actual computation and are idle 20% of the time.
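A minimal sketch of the efficiency calculation, using made-up timings (100 seconds on one processor, 25 seconds on five):

```python
def efficiency(t1, tn, n):
    # E(N) = S(N) / N = T(1) / (N * T(N))
    return (t1 / tn) / n

e = efficiency(100.0, 25.0, 5)
print(f"Efficiency: {e:.0%}")
```

Here S(5) = 100/25 = 4, so E(5) = 4/5 = 80%: the five processors are doing useful work 80% of the time on average.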
In the next post, I will discuss the essential ingredients needed in the message passing paradigm. You will see that only a few functions are required to write a parallel code using message passing.