2010-3-12 16:08
Dear Readers,

In the last article, we looked at the so-called “embarrassingly parallel operations”, which can easily take advantage of multicore systems. In this article, let us see one more way of getting performance improvements on multicore systems: OpenMP.

The OpenMP specification was originally defined in 1997 by industry vendors such as Sun and Intel. It was popular on Symmetric Multiprocessing (SMP) systems. A typical SMP system is a multiprocessor computer in which two or more identical processors are connected to a single shared memory and all processors run the same OS instance. Interestingly, today's multicore systems are similar to the SMP architecture: instead of multiple processors, we have multiple cores. All cores access the common shared memory and run the same OS instance. That is why a solution like OpenMP, which comes from the SMP era, is suddenly finding renewed interest on the multicore systems of today.

The OpenMP specification is defined for the C, C++ and Fortran languages. It consists of three parts: compiler directives, a runtime library and environment variables. The code is annotated with directives and compiled with an OpenMP-capable compiler. The code is then linked with the runtime library to generate the executable. Some environment variables control how the code executes at run time.

An OpenMP program works like this:

- The program starts as a single thread, called the master thread. The master thread executes sequentially, like any other normal program, until the first parallel region construct is encountered.
- The master thread then creates a team of parallel threads.
- The statements in the parallel region construct are then executed in parallel by the various threads created.
- When the individual threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.

Since the creation, starting and joining of threads is done automatically, programmers are relieved of these complexities. The model also allows variables to be shared between the threads and protected with locks, and it supports fairly advanced features.

Here is an example code from Wikipedia:

    int main(int argc, char **argv)
    {
        const int N = 100000;
        int i, a[N];

        /* Compiler directive */
        #pragma omp parallel for
        for (i = 0; i < N; i++)
            a[i] = 2 * i;

        return 0;
    }

As a first step, the code is compiled with an OpenMP-enabled compiler. An environment variable such as OMP_NUM_THREADS is set to the number of threads, and the program is executed. Suppose OMP_NUM_THREADS is set to 4. The code starts normally, but when it reaches the for loop it creates a team of 4 threads, and each thread typically fills in 100000/4 = 25000 different array entries. This speeds up the processing, as the four threads work in parallel on different cores. As we keep increasing the value of OMP_NUM_THREADS, we can expect the run time to decrease and performance to improve, until system bus bottlenecks start showing up.

The advantages of OpenMP include:

- The learning curve is low, as it builds on existing languages through #pragma directives.
- It hides thread semantics.
- One can parallelize the code incrementally and see the effects. This “change and see” approach gives confidence to programmers.
- It supports a good set of platforms (C/C++/Fortran on Linux/Windows).
- It supports both coarse-grained and fine-grained parallelism.

The main disadvantage of OpenMP is that it needs specific tool chains (compiler and runtime). Not all compiler tool chains support OpenMP. Popular ones that do include the Sun Studio tool chain and GNU GCC 4.3.1.
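To make the fork-join behaviour described above a little more concrete, here is a minimal sketch in C. The function name record_owners and the owner array are made up for this illustration and do not come from the Wikipedia example. It records which thread ran each loop iteration, and it falls back to plain serial code when the compiler has no OpenMP support, which is one way to see the incremental "change and see" approach in practice.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) so the file still
   builds with a compiler that ignores the OpenMP pragmas. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: stores in owner[i] the id of the thread that
   executed iteration i, and returns the size of the thread team. */
int record_owners(int n, int *owner)
{
    int team_size = 1;

    /* Fork: the master thread creates a team of threads here. */
    #pragma omp parallel
    {
        /* Exactly one thread records the team size, avoiding a race. */
        #pragma omp single
        team_size = omp_get_num_threads();

        /* The loop iterations are divided among the threads. Each
           iteration writes a different element, so no lock is needed. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            owner[i] = omp_get_thread_num();
    }   /* Join: the threads synchronize, only the master continues. */

    return team_size;
}
```

Assuming GCC, a file containing this function and a small main that calls it could be built with `gcc -fopenmp` and run with OMP_NUM_THREADS=4 set in the environment. With four threads, the owner array would be expected to contain the ids 0 to 3, usually in contiguous chunks, although the exact split depends on the runtime's scheduling.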
OpenMP can deliver a big performance improvement on multicore systems, as each thread can run on a separate core. Then why is OpenMP not so well known in the mainstream? It is because OpenMP gives big performance gains mainly to mathematical and scientific computing workloads, such as large matrix multiplications. For a desktop or server application, OpenMP may not be of great help unless the application logic contains such code.
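As an illustration of the kind of code that benefits, here is a rough sketch of a matrix multiplication parallelized with a single directive. The function name matmul and the row-major layout are assumptions made for this example, not something prescribed by OpenMP. The rows of the result are independent of each other, which is why the outer loop can be split across threads without any locking.

```c
/* Hypothetical helper: computes c = a * b for n x n matrices stored
   in row-major order (element (i, j) lives at index i * n + j). */
void matmul(int n, const double *a, const double *b, double *c)
{
    /* Each thread works on its own share of the rows of c. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            /* Declared inside the loop body, so private to each thread. */
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

For small matrices the cost of creating and joining the threads is likely to outweigh the gain, so a speedup should only be expected when n is large.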