Saturday, June 28, 2008

Parallel Computing with MPI - Part VI: Parallel Computing in a Nutshell

Finally, we reach the most interesting part of this series. In this post, I will present, in a nutshell, what you need to know in order to get started with parallel computing. I will base this discussion on an article that I wrote in 2005 on the CFD-Wiki. You can find it here. Of course, a lot has changed since then, but the basics and semantics are exactly the same. Any improvements in parallel computing will come at the level of implementation, hardware, and additional functionality. The foundation has been laid out nicely and will remain so for a while.

This article will be a very long one, although I hate to write long articles on a blog. So if you would like me to split it, just drop me a comment or an email.

What is Parallel Computing?

Parallel computing may be defined as the simultaneous use of several processors to execute a computer program. Inter-processor communication is required at some point in the code; therefore, the code has to be explicitly instructed to run in parallel.

For example, a code that computes the sum of the first N integers is very easy to program in parallel. Each processor is responsible for computing a fraction of the global sum. At the end of the execution, however, all processors have to communicate their local sums to, for example, a single processor that adds them all up to obtain the global sum.
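The sum example can be sketched in a few lines of Python. This is only a serial simulation of the idea, not MPI code: the names `local_range`, `local_sum`, and `global_sum` are made up for illustration, `rank` stands for a processor's index, and `size` for the number of processors. In a real parallel run, each processor would compute its own local sum at the same time and then send it to one processor.

```python
def local_range(rank, size, n):
    """Return the half-open range [start, stop) of the integers 1..n owned by `rank`."""
    base, extra = divmod(n, size)
    # The first `extra` ranks each take one additional integer.
    start = rank * base + min(rank, extra) + 1
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_sum(rank, size, n):
    """Sum only the integers that belong to this rank."""
    start, stop = local_range(rank, size, n)
    return sum(range(start, stop))


def global_sum(n, size):
    """Collect every local sum on one 'processor' and add them up."""
    # Assumption: a plain loop stands in for `size` processors working at once.
    partial_sums = [local_sum(rank, size, n) for rank in range(size)]
    return sum(partial_sums)


if __name__ == "__main__":
    print(global_sum(100, 4))  # 5050, the same as 100 * 101 // 2
```

Note that the work is split so that no integer is counted twice and none is skipped, even when N is not divisible by the number of processors.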

Types of Parallel Computers

There are two types of parallel architectures:
  1. Shared memory multiprocessor
  2. Distributed memory multicomputer
In a shared memory multiprocessor, all processors share the same "memory addresses". Imagine that these processors are all on the same motherboard. Multi-core CPUs (dual cores, quad cores) and multi-socket motherboards, such as server motherboards (or the computers we have in our offices at UTSI), are examples of shared memory multiprocessor machines. In this setup, all processors have direct access to any of the variables declared in a code. This configuration is very attractive from a programming point of view: there is no need to send data from one computer to another. However, the main disadvantage lies in possible conflicts in accessing or altering data that resides in memory (imagine processor one is computing the temperature at a point while processor two tries to access that same value for its own computation; in this case there is a conflict, and the value cannot be accessed safely). This is usually remedied by inserting function calls that make sure memory addresses are accessed without conflict. Of course, this overhead outweighs the benefits of sharing memory addresses, and therefore this configuration is no longer popular, since conflict management becomes a tedious and ambiguous process.
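To make the conflict-management idea concrete, here is a minimal sketch in Python. It rests on several assumptions: Python threads stand in for processors, a lock is used as one possible form of the "function calls" mentioned above, and the function name `accumulate` is invented for this example. All threads update the same variable, and the lock makes sure only one of them touches it at a time.

```python
import threading


def accumulate(num_threads=4, increments=10000):
    """Let several threads update one shared variable without conflicts."""
    total = 0                 # shared: every thread sees the same variable
    lock = threading.Lock()   # guards access to `total`

    def worker():
        nonlocal total
        for _ in range(increments):
            # Acquire the lock before touching the shared value, so two
            # threads never read and write it at the same moment.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()              # wait until every thread has finished
    return total


if __name__ == "__main__":
    print(accumulate())
```

Every access to the shared value now goes through the lock, which is exactly the kind of bookkeeping that becomes tedious as a code grows.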

In a distributed memory configuration, each processor has its own memory addresses, and data has to be sent from one processor to another. Computers are usually connected via a network switch, and data flows over the network from one processor to another. Note, however, that multi-core CPUs as well as multi-socket motherboards CAN use this configuration. In this case, although the memory modules are the same for all processors, each processor reserves a portion of this memory just as if it existed independently of the others. Also, when multi-core systems are used, data is sent much faster since it does not have to go through a network. The main advantage of this configuration is robustness. For example, new machines can always be added to the parallel computer. Office computers can be used as part of the parallel computer when they are idle (and if the GRAs do not work overnight!).

So How Does it Work?

The history of parallel computing extends over a long period of time and may be traced to the early 1960s (check this link for a concise historical perspective). However, the most important milestone took place in 1992, when a bunch of scientists (most likely computer guys) were assembled to create a standard for message passing; their job was to formulate a universal framework for exchanging data in a parallel computer. So they retreated for a while and came up with what is today called the Message Passing Interface, or MPI. It defines a set of functions and rules on how data is to flow from one processor to another within a parallel computer. Of course, they did not have to do the actual implementation, as this is left to third-party vendors or the open source community.

So how does parallel computing work? The notion is very simple. A computer code runs on different processors. At one point or another, data has to be exchanged between processors. Therefore, a function call has to be included at that point to instruct the code to send or receive data. Once the executable reaches that line of code, it allocates some memory on the CPU buffer, copies the data that is to be sent, and sends this data to the receiving computer. On the receiving end, the code is waiting for some data to arrive. Of course, it knows what it will be receiving, so it allocates enough space in the CPU buffer and waits for this data. Once the data arrives, it copies it to the designated address in memory, and that's all!

Of course, there are many other ways that this happens, and different implementations do things differently, but the scenario presented above is one viable method. I still have to include some references though.
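The send/receive scenario described above can be mimicked with a small Python sketch. This is not MPI and not how any particular implementation works: two threads play the role of the sending and receiving processors, a `queue.Queue` stands in for the network, and the function name `exchange` is invented for this example.

```python
import queue
import threading


def exchange(data):
    """Pass `data` from a 'sender' to a 'receiver' through a channel."""
    channel = queue.Queue()         # assumption: stands in for the network
    # The receiver knows how much data to expect, so it reserves the
    # destination storage in advance.
    received = [None] * len(data)

    def sender():
        buffer = list(data)         # copy the data into a send buffer
        channel.put(buffer)         # hand the buffer over to the "network"

    def receiver():
        buffer = channel.get()      # blocks here until the data arrives
        received[:] = buffer        # copy into the designated destination

    recv_thread = threading.Thread(target=receiver)
    send_thread = threading.Thread(target=sender)
    recv_thread.start()             # the receiver is already waiting
    send_thread.start()
    send_thread.join()
    recv_thread.join()
    return received


if __name__ == "__main__":
    print(exchange([300.0, 301.5, 299.8]))
```

The receiver ends up with its own copy of the values, so changing the sender's data afterwards does not affect what was received, just as with two processors that have separate memories.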

Cite as:
Saad, T. "Parallel Computing with MPI - Part VI: Parallel Computing in a Nutshell". Weblog entry from Please Make A Note.


  1. Great stuff, very well written.
    You have made it almost understandable to me, unlike my professor.

  2. would you like to give me more explanation about parallel computing, please sent your explanation to thanx before Mr.

  3. Hi. I'm in great interest to setup a cluster of computer for running Fluent 6.3.26 in parallel and I found this link: but it's broken. could you please fix the broken link? Thanks!

  4. How to make Parallel Computing

  5. I was looking for this information, thanks for put in this easy way, I mean in a easy way to understand it jajaja, well until the next time.

  6. so in this way you make this, hmmm is totally different to the way that I learned in the college, but the final result is the same, like the life, different way, the same destination.