I couldn't tell whether you're working on Windows or Linux.
On Windows, I took an old serial program of mine (matrix multiply) and used OpenMP to parallelize it. I created a new CodeAnalyst project and used the thread profiling configuration to launch and collect a profile for the program. On a dual-core Turion, the thread chart shows two threads, with one thread scheduled to core 0 and the other scheduled to core 1.
Whether you're on Windows or Linux, the overall execution time of a properly parallelised program should be shorter than that of the single-threaded version. (A lot hinges on that word "properly"!) In the case of the matrix multiplication program, the single-threaded program runs in 17 seconds and the dual-thread (OpenMP) program runs in 9.7 seconds.
Here's another experiment to try on either Windows or Linux. On Windows, I configured for Time-Based Profiling. I collected profile data for both the single-threaded and dual-threaded (OpenMP) versions of the program. I used the "Separate CPUs" option in the view configuration dialog box (click "Manage" to get there) in order to separate the timer samples by CPU. For the single-threaded program, I got the following timer samples on core 0 and core 1:
Module              Core 0  Core 1
amdk8.sys            11038    5640
matrix_interchange    5476   11043
Windows rescheduled the matrix multiply program between core 0 and core 1, which produced the uneven distribution of samples between the two cores. On Windows, amdk8.sys is the idle loop, so you can see that each core was idle part of the time.
For the dual-thread program, I got the following distribution of timer samples between core 0 (column 1) and core 1 (column 2):
Module      Core 0  Core 1
matrix_omp    9199    9256
amdk8.sys      307     182
This is a pretty even split between the two cores since there were two threads that kept both cores busy. Further, the idle loop (amdk8.sys) didn't get very many timer samples at all!
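To put numbers on "idle part of the time", here is a small sketch that turns the sample counts above into an idle share per core. The function name idle_share is made up for illustration, and it only counts the two modules listed in each table, so treat the result as a rough figure rather than a full accounting of the profile.

```c
/* Fraction of a core's timer samples that landed in the idle loop.
   Assumption: only the idle module (amdk8.sys) and the program's own
   module are counted; every other module in the profile is ignored. */
double idle_share(unsigned long idle_samples, unsigned long program_samples)
{
    unsigned long total = idle_samples + program_samples;

    /* Avoid dividing by zero if no samples were collected. */
    if (total == 0) {
        return 0.0;
    }
    return (double)idle_samples / (double)total;
}
```

With the counts above, idle_share(11038, 5476) and idle_share(5640, 11043) come out to roughly 0.67 and 0.34 for the single-threaded run, while idle_share(307, 9199) and idle_share(182, 9256) come out to roughly 0.03 and 0.02 for the OpenMP run.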
I hope this helps you to troubleshoot your program.
Thanks for your insight.
I'm using Windows. I parallelised a for loop in my application with OpenMP on a four-core PC and ran CodeAnalyst with the thread profiling configuration. The report shows four threads assigned to a single core. How could that be possible? Is there something that I'm supposed to do in the code in order to assign the threads to different cores?
Moreover, I want to benchmark my findings on a two-core PC and a four-core PC. Unfortunately, I've only got a single four-core PC. Can I somehow disable two cores on my PC, take the readings, and then enable all four cores and take the readings again? I'm using an AMD Opteron four-core PC.
Hi again --
I can't profess to be an OpenMP expert as I used a fairly straightforward pragma:
#pragma omp parallel for private (j,k)
to parallelize the execution of the outer loop of a simple matrix multiplication program with three nested loops. So, I thought there might be some project properties that need to be set. In my project, under C/C++ Code Generation, the Runtime Library property is set to Multi-threaded DLL (/MD). Under C/C++ Language, the OpenMP Support property is set to Yes (/openmp). I have Common Language Runtime support turned off, so the compiler should be generating native code. I'm using Visual Studio 2005.
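For reference, here is a minimal sketch of how that pragma might sit on a three-loop matrix multiply. It is not my actual program: the function name multiply_omp and the flat row-major array layout are assumptions made for illustration.

```c
/* Multiply two n x n row-major matrices: c = a * b.
   multiply_omp is a hypothetical name used only for this sketch.
   The iterations of the outer i loop are divided among the OpenMP
   threads; j and k are listed as private so that each thread gets
   its own copies of the inner loop counters. If OpenMP support is
   not enabled in the compiler, the pragma is ignored and the loops
   simply run serially. */
void multiply_omp(const double *a, const double *b, double *c, int n)
{
    int i, j, k;

    #pragma omp parallel for private(j, k)
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            /* Declared inside the parallel loop, so each thread has its own. */
            double sum = 0.0;
            for (k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

The loop index i is a signed int and is made private automatically by the parallel for construct. If j and k were left shared, the threads would overwrite each other's inner loop counters, which is the usual way a loop like this goes wrong.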
I saw your other post regarding number of cores and will reply.
Hope this helps -- pj
I'll look into that.