Dear Readers,
In the last article, we looked at the so-called "embarrassingly parallel" operations, which can easily take advantage of multicore systems. In this article, let us look at one more way of getting performance improvements on multicore systems: OpenMP.
The OpenMP specification was originally defined in 1997 by industry vendors such as Sun and Intel, and became popular on Symmetric Multiprocessing (SMP) systems. A typical SMP system is a multiprocessor computer in which two or more identical processors are connected to a single shared memory, and all processors run the same OS instance.
Surprisingly, today's multicore systems are similar to the SMP architecture. Instead of multiple processors, we have multiple cores; all cores access the common shared memory and run the same OS instance. That is why a solution like OpenMP, which dates from the SMP era, is suddenly finding renewed interest on today's multicore systems.
The OpenMP specification is defined for the C, C++ and Fortran languages. It consists of three parts: compiler directives, a runtime library and environment variables. The code is instrumented with directives and compiled with an OpenMP-aware compiler, then linked with the runtime library to generate the executable. A set of environment variables controls the execution at run time.
An OpenMP program works like this: the program starts as a single master thread and runs serially until it reaches a region marked with a parallel directive. At that point the runtime forks a team of threads that execute the region concurrently and join back into the master thread at the end of the region (the fork-join model).
Since the process of creating, starting and joining threads is done automatically, programmers are relieved of these complexities. The model also allows variables to be locked or shared between the threads, and supports fairly advanced features.
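To make the fork-join model concrete, here is a minimal sketch (not from the original article) that uses one compiler directive and two calls from the OpenMP runtime library, omp_get_thread_num() and omp_get_num_threads():

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Serial part: only the master thread runs here. */
    printf("Before the parallel region\n");

    /* Fork: the runtime creates a team of threads for this block. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();      /* this thread's ID within the team */
        int total = omp_get_num_threads();  /* number of threads in the team    */
        printf("Hello from thread %d of %d\n", id, total);
    } /* Join: the threads synchronize here and only the master continues. */

    printf("After the parallel region\n");
    return 0;
}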
Here is an example code from Wikipedia:
int main(int argc, char **argv) {
const int N = 100000;
int i, a[N];
#pragma omp parallel for /* compiler directive */
for (i = 0; i < N; i++)
a[i] = 2 * i;
return 0;
}
As a first step, the code is compiled with an OpenMP-enabled compiler. An environment variable, OMP_NUM_THREADS, is set to the desired number of threads, and the program is executed. Suppose OMP_NUM_THREADS is set to 4. The code starts normally, but when it reaches the for loop it creates 4 threads, and each thread fills 100000/4 = 25000 different entries of the array. This speeds up the processing, as the four threads work in parallel on different cores.
As we keep increasing the value of OMP_NUM_THREADS, one can see a decrease in execution time and an improvement in performance, until system bus bottlenecks start showing up.
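As a rough sketch of how one could observe this (the loop body is just the earlier example; omp_get_wtime() and omp_get_max_threads() are part of the OpenMP runtime library):

#include <stdio.h>
#include <omp.h>

#define N 100000

int main(void) {
    static int a[N];                 /* static: keeps the large array off the stack */
    double start = omp_get_wtime();  /* wall-clock time before the loop */

    #pragma omp parallel for         /* iterations are split among the threads */
    for (int i = 0; i < N; i++)
        a[i] = 2 * i;

    double elapsed = omp_get_wtime() - start;
    printf("%d threads (max), loop took %f seconds\n",
           omp_get_max_threads(), elapsed);
    return 0;
}

Compiled with an OpenMP-aware compiler (for example, gcc -fopenmp on the GNU tool chain) and run as OMP_NUM_THREADS=2 ./a.out, OMP_NUM_THREADS=4 ./a.out and so on, the reported times can be compared across thread counts. For a loop this small, thread start-up overhead may dominate, so larger workloads show the scaling more clearly.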
The advantages of OpenMP include: parallelism can be added incrementally with directives, without restructuring the serial code; the same source still compiles and runs as an ordinary serial program when OpenMP is disabled; and the specification is portable across compilers and platforms.
The main disadvantage of OpenMP is that it needs specific tool chains (compilers and runtime). Not all compiler tool chains support OpenMP; popular ones that do include the Sun Studio tool chain and GCC 4.3.1.
OpenMP can give a big performance improvement on multicore systems, as each thread can run on a separate core.
Then why is OpenMP not so well known in the mainstream? It is because OpenMP gives big performance gains mainly to mathematical and scientific computing workloads, such as large matrix multiplications. For a typical desktop or server application, OpenMP may not be of great help unless the application logic contains such code.