
Large OpenMP/bandwidth performance regression in newer Open64 releases.

Discussion created by sbike on Nov 23, 2009
Latest reply on Mar 7, 2011 by nervi
OpenMP shows poor performance after 4.2.2.1

Source code for STREAM, a popular OpenMP memory bandwidth benchmark:

http://www.cs.virginia.edu/stream/FTP/Code/stream.c

Works great with:

GNU gcc version 4.2.0 (Open64 4.2.2.1 driver)

(I suggest increasing N by a factor of 10 to get more reliable timings, since cpuspeed frequency scaling and similar effects otherwise add noise to such short runs.)

$ opencc -O4 -m64 -mp stream.c -o stream && ./stream
...

Total memory required = 457.8 MB.
...

Number of Threads requested = 16

...
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       30546.3775       0.0110       0.0105       0.0145
Scale:      30583.9645       0.0107       0.0105       0.0122
Add:        29600.7575       0.0164       0.0162       0.0171
Triad:      29507.9135       0.0165       0.0163       0.0174

Now if I switch to 4.2.2.2:

$ export PATH=/share/apps/open64-4.2.2.2/bin:$PATH
$ export LD_LIBRARY_PATH=/share/apps/open64-4.2.2.2/lib

$ opencc -V
x86 Open64 Compiler Suite: Version 4.2.2.2
...
$ opencc -O4 -m64 -mp stream.c -o stream && ./stream

...

Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       11630.9547       0.0277       0.0275       0.0278
Scale:      11618.5706       0.0278       0.0275       0.0283
Add:        13120.5255       0.0367       0.0366       0.0368
Triad:      12992.2490       0.0370       0.0369       0.0371

I also tried the newest 4.2.3 beta:

$ export PATH=/share/apps/open64-4.2.3/bin:$PATH
$ export LD_LIBRARY_PATH=/share/apps/open64-4.2.3/lib
$ opencc -V
Open64 Compiler Suite: Version 4.2.2.99
$ opencc -O4 -m64 -mp stream.c -o stream && ./stream
...

Number of Threads requested = 16
...

Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       11585.7750       0.0279       0.0276       0.0284
Scale:      11610.6305       0.0279       0.0276       0.0284
Add:        13131.6509       0.0367       0.0366       0.0369
Triad:      12981.7771       0.0372       0.0370       0.0381

I seem to recall that an Intel-specific library was mistakenly left out of 4.2.2.2 and was promised to be included again in the next release. Maybe that was forgotten?