pszilard
Adept I

Issues when switching to OpenCL 2.0

I was going to try out some OpenCL 2.0 features and ran into two strange issues after adding the -cl-std=CL2.0 flag:

  • The preprocessing stage fails and the compiler complains about missing includes; the same code compiles just fine with CL1.2.
  • After copying the missing files to /tmp to satisfy the compiler, I got about a 25-35% performance drop from just adding the flag, without changing anything else. Is there something in the OpenCL 2.0 spec or in AMD's implementation that (in)directly affects the performance of 1.2 code? Or is this just a bug?
0 Likes

31 Replies

I've added you to the developer forum white list and moved this message to the OpenCL forum.

0 Likes
nibal
Challenger

Hi,

I imagine you have SDK-3.0 installed. What is the output of your clinfo, the version of your driver, your OS and your Makefile include flags?

TIA,

Nikos

0 Likes

Yes, I have the SDK 3.0 installed*, the clinfo output is this: Paste: h444t

(On a related note, can anyone explain why this version has nothing to do with the fglrx version, which in turn has nothing to do with the year.month versioning of the releases?)

  • OS: Ubuntu 14.04.3, kernel 3.19

Which "makefile include flags" are you referring to?

*admittedly there are some funny issues with it: my application reports a different driver version when using the APP SDK v2.9 and 3.0 (i.e. headers + linked against libOpenCL).

0 Likes

Well for your SDK header files you need to include in your CFLAGS: -I/opt/AMDAPPSDK-3.0/include. Have you done it?

BTW, driver version is different under SDK2.9 vs SDK3.0. This is an SDK2.9 known bug, but it is no longer supported

Which compiler do you mean, gcc or ocl? What preprocessor errors you get?

The drop in performance is a known bug:(

BTW, b4 you get in too deep with ocl, be warned: see the thread "Memory corruption in latest crimson driver 15.302?"

HTH

Nikos

0 Likes

> Well for your SDK header files you need to include in your CFLAGS: -I/opt/AMDAPPSDK-3.0/include. Have you done it?

Sure, my code builds (http://www.gromacs.org/), so yeah, includes should be fine.

> BTW, driver version is different under SDK2.9 vs SDK3.0. This is an SDK2.9 known bug,

You mean it's a known bug that the driver version reported via the OpenCL SDK when using v2.9 and v3.0 are different?

> but it is no longer supported

What is not supported?

> Which compiler do you mean, gcc or ocl? What preprocessor errors you get?

The OpenCL compiler. It seems like the include path where some headers used by the OpenCL kernel are located is simply ignored.

0 Likes

pszilard wrote:

> Well for your SDK header files you need to include in your CFLAGS: -I/opt/AMDAPPSDK-3.0/include. Have you done it?

Sure, my code builds (http://www.gromacs.org/), so yeah, includes should be fine.

So, no more /tmp/headers...

pszilard wrote:

> BTW, driver version is different under SDK2.9 vs SDK3.0. This is an SDK2.9 known bug,

You mean it's a known bug that the driver version reported via the OpenCL SDK when using v2.9 and v3.0 are different?    

Yes. You can double-check with clinfo. Version from-SDK 2.9.1 should be 700-1000 lower than SDK-3.0

pszilard wrote:

> but it is no longer supported

What is not supported?

SDK-2.9.1 ofc. I don't think that even SDK-3.0 is supported any more

pszilard wrote:

> Which compiler do you mean, gcc or ocl? What preprocessor errors you get?

The OpenCL compiler. It seems like the include path where some headers used by the OpenCL kernel are located is simply ignored.

Hmmm. Interesting. I can't use ocl2.0, so, i never tried compiling as 2.0. But never had a problem with ocl1.2 and my own headers.

Have you tried passing in your cflags string "-I."?

HTH,

Nikos

0 Likes

> So, no more /tmp/headers...

False. The code builds with -cl-std=CL1.2. If I pass -cl-std=CL2.0, includes are not found. Not OpenCL includes, but my own includes that should be picked up from the path passed to the OpenCL compiler via "-I/path/to/my/source/tree"

> Yes. You can double-check with clinfo. Version from-SDK 2.9.1 should be 700-1000 lower than SDK-3.0

Not really. With APP SDK 3.0 I get lower driver version reported, 1800.8 here:

$ ldd $gmx | grep OpenCL;

libOpenCL.so.1 => /opt/tcbsys/amd/appsdk/3.0/lib/x86_64/libOpenCL.so.1 (0x00007fd37562a000)

$ $gmx mdrun -nsteps 0 2>&1 | grep -A2 "GPUs detected"

    Number of GPUs detected: 2

    #0: name: Fiji, vendor: Advanced Micro Devices, Inc., device version: OpenCL 2.0 AMD-APP (1800.8), stat: compatible

    #1: name: Hawaii, vendor: Advanced Micro Devices, Inc., device version: OpenCL 2.0 AMD-APP (1800.8), stat: compatible

while with v2.9 I get 1912.5

$ ldd $gmx | grep OpenCL

  libOpenCL.so.1 => /opt/tcbsys/amd/appsdk/2.9/lib/x86_64/libOpenCL.so.1 (0x00007f43c7ce5000)

$ $gmx mdrun -nsteps 0 2>&1 | grep -A2 "GPUs detected"

    Number of GPUs detected: 2

    #0: name: Fiji, vendor: Advanced Micro Devices, Inc., device version: OpenCL 2.0 AMD-APP (1912.5), stat: compatible

    #1: name: Hawaii, vendor: Advanced Micro Devices, Inc., device version: OpenCL 2.0 AMD-APP (1912.5), stat: compatible

0 Likes

PS: What do you mean by APP SDK 3.0 not being supported? Supported by what, the driver? Or by AMD?

0 Likes

pszilard wrote:

PS: What do you mean by APP SDK 3.0 not being supported? Supported by what, the driver? Or by AMD?

The driver always includes all ocl libraries. I doubt about AMD. Haven't seen *any* AMD employee or moderator since I came back.

I doubt that anyone from AMD is reading this forum:(

0 Likes

Oh, that means I should lower my expectations... Thanks anyway.

0 Likes

pszilard wrote:

Oh, that means I should lower my expectations... Thanks anyway.

No, you should lower your expectations because of that "Memory corruption in latest crimson driver 15.302?" thread.

0 Likes

I don't know what the driver version you refer to is. For the platforms I care about, 15.12 is the most recent. Also, one broken driver version is bad, but not the end of the world. However, compiling CL1.2 code with the CL2.0 compiler and getting a 30% performance drop can be a pretty big deal - especially if others have noticed similar regressions.

0 Likes

pszilard wrote:

I don't know what the driver version you refer to is. For the platforms I care about, 15.12 is the most recent. Also, one broken driver version is bad, but not the end of the world. However, compiling CL1.2 code with the CL2.0 compiler and getting a 30% performance drop can be a pretty big deal - especially if others have noticed similar regressions.

Hmmm, you apparently didn't read the thread. I've found the same corruption back to every driver i could test, Catalyst 15.5. It's probably since the beginning:( You should upgrade to latest Crimson 15.302 (they have stopped numbering them like catalysts)

0 Likes

> Hmmm, you apparently didn't read the thread. I've found the same corruption back to every driver i could test, Catalyst 15.5. It's probably since the beginning:( You should upgrade to latest Crimson 15.302 (they have stopped numbering them like catalysts)

I still don't know where to get 15.302, nor whether it's a new or an old driver. In any case, I am not seeing crashes, so I'm fine for now.

0 Likes

>> So, no more /tmp/headers...

> False. The code builds with -cl-std=CL1.2. If I pass -cl-std=CL2.0, includes are not found. Not OpenCL includes, but my own includes that should be picked up from the path passed to the OpenCL compiler via "-I/path/to/my/source/tree"

Oops. thought you were talking about gcc ocl headers...Have you tried passing "-I."?

>>Yes. You can double-check with clinfo. Version from-SDK 2.9.1 should be 700-1000 lower than SDK-3.0
> Not really. With APP SDK 3.0 I get lower driver version reported, 1800.8 here:

Not really. You are using mixed libraries and env. Driver libraries are under /usr/lib and are more up to date than the SDK's

Having 2 SDK's active at the same time is a bad idea. Compounds on the confused libraries and env:(

You should clean your libraries and env, starting from /etc/profile.d/AMDSDK.sh. Comment out the library path in there.

BR,

Nikos

0 Likes

> Not really. You are using mixed libraries and env. Driver libraries are under /usr/lib and are more up to date than the SDK's

> Having 2 SDK's active at the same time is a bad idea. Compounds on the confused libraries and env:(

They are not "active" - whatever that means. I don't have the APPSDK_ROOT/lib/x86_64 in my LD_LIBRARY_PATH, my binaries are compiled with RPATH, so there is nothing to mix here unless the libOpenCL.so loader somehow gets confused and loads APPSDK_ROOT/lib/x86_64/libamdocl64.so instead of the one installed by the driver in /usr/lib.

I can perhaps try to put a full path in the icd loader file.

0 Likes

pszilard wrote:

> Not really. You are using mixed libraries and env. Driver libraries are under /usr/lib and are more up to date than the SDK's

> Having 2 SDK's active at the same time is a bad idea. Compounds on the confused libraries and env:(

They are not "active" - whatever that means. I don't have the APPSDK_ROOT/lib/x86_64 in my LD_LIBRARY_PATH, my binaries are compiled with RPATH, so there is nothing to mix here unless the libOpenCL.so loader somehow gets confused and loads APPSDK_ROOT/lib/x86_64/libamdocl64.so instead of the one installed by the driver in /usr/lib.

I can perhaps try to put a full path in the icd loader file.

The loader in Unix has to take the path from somewhere. Either your LD_LIBRARY_PATH or your /etc/ld.so.conf has it somewhere:)

0 Likes

...or from the RPATH which is in the binary

$ objdump -x bin/gmx | grep RPATH

  RPATH                /opt/tcbsys/amd/appsdk/3.0/lib/x86_64::::::::::::::::::::::::::::::::

0 Likes
pszilard
Adept I

Can anybody from AMD comment on this? One of our users has just reported the first issue on our mailing list. Is the include processing/path broken in the OpenCL 2.0 compiler?

0 Likes

Hi,

"One of our users..." You mean yourself?

BTW yours is not the first issue in our mailing list. I have been here longer and have hundreds. Ocl has been around for at least 5 years now

I have good news. I messaged a moderator with the subject "OpenCL abandoned?" and she just responded that she forwarded it to the AMD ocl team. I expect to hear an official AMD response shortly;-) I will update this list.

BR,

Nikos

0 Likes

> You mean yourself?

No. As I hinted before (perhaps next time I should be louder and shout about it?), I'm a developer of the GROMACS molecular simulation package. I was referring to this report: [gmx-users] OpenCL compile error

> I have good news.

Great. Trying to cheer up, but the number of and especially kind of issues I'm running into is borderline disheartening.

Anyway, thanks for the update (offer).

0 Likes

> One of our users has just reported the first issue on our mailing list

And there I thought, that "our mailing list" was this list, and you just had your first issue:)

>Great. Trying to cheer up, but the number of and especially kind of issues I'm running into is borderline disheartening.

I can only repeat what Veevee told me. "Just keep posting!"

BR,

Nikos

0 Likes

> I can only repeat what Veevee told me. "Just keep posting!"

Honestly, I'd much more appreciate a proper bugtracker where I know that a report will be looked at by those who can fix it. It feels like a waste of time to mess around on a forum - there aren't even bug report templates.

0 Likes

pszilard wrote:

> I can only repeat what Veevee told me. "Just keep posting!"

Honestly, I'd much more appreciate a proper bugtracker where I know that a report will be looked at by those who can fix it. It feels like a waste of time to mess around on a forum - there aren't even bug report templates.

Yeap, me too. But the way it works here is that an AMD guru has to screen all these issues, and send only the real bugs to the back office. The rest are solved here in the forum:)

It usually is much faster for the majority of them:)

Nikos

0 Likes
pszilard
Adept I

> The preprocessing stage fails and the compiler complains about missing includes; the same code compiles just fine with CL1.2.

Having asked/looked around, apparently this is a relatively common and recurring error in several compilers. Not sure what's so hard to get right in handling the "-I" directive, but in any case, I'd very much like to have this fixed, I'm generating code using the preprocessor and I don't see any option to work around the issue.

I know that I could pass multiple files to the compilation to have them "merged", but as I include the same file multiple times, this won't work for sure.
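
One conceivable way around a compiler that ignores "-I", not something suggested in this thread, is to paste the headers into the source on the host before handing it to the OpenCL compiler. The Python sketch below is only an illustration under stated assumptions: inline_includes and its parameters are made-up names, it only handles #include "file" lines (not <file> or macro-expanded includes), and file/line information in compiler diagnostics is lost. Because the replacement is purely textual, a header that is included several times is pasted at each occurrence.

```python
import os
import re

# Matches lines such as:  #include "my_header.clh"   (hypothetical file name)
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]+"([^"]+)"[ \t]*$')


def inline_includes(source, include_dirs, max_depth=16):
    """Replace each #include "file" line with the text of that file.

    The replacement happens at every occurrence, so a header that is
    included several times (e.g. with different macros defined) is pasted
    each time. Includes that cannot be found in include_dirs are left
    untouched for the compiler to deal with.
    """
    if max_depth < 0:
        raise RecursionError("include nesting too deep (cyclic include?)")
    out = []
    for line in source.splitlines():
        match = INCLUDE_RE.match(line)
        path = None
        if match:
            # Take the first directory that contains the requested file.
            for directory in include_dirs:
                candidate = os.path.join(directory, match.group(1))
                if os.path.isfile(candidate):
                    path = candidate
                    break
        if path is None:
            out.append(line)
            continue
        with open(path) as header:
            # Headers may include further headers, so expand recursively.
            out.append(inline_includes(header.read(), include_dirs, max_depth - 1))
    return "\n".join(out)
```

The expanded string would then be passed as the program source, with no "-I" option needed for the inlined headers.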

0 Likes
rampitec
Staff

Although OpenCL 1.2 and 2.0 syntax is very similar, the two differ substantially for the compiler, both in language semantics and in compiler internals. There are at least two big factors that might affect kernel performance if 2.0 syntax is forced on a 1.2 source:

  1. An unqualified pointer passed to a function is treated as generic in 2.0, but as private in 1.2. It is a good idea to complete function declarations so that private pointer arguments are explicitly marked with the __private qualifier. That makes the source compatible with both 1.2 and 2.0 syntax.
  2. OpenCL 2.0 supports non-uniform workgroups, so the expansion of get_local_size() becomes substantially bigger than with 1.2. You can mitigate the impact by setting the OpenCL 2.0 specific option -cl-uniform-work-group-size, although it will not remove all issues in the current release; this will be improved in the future. Meanwhile, you can achieve better results by using the kernel attribute reqd_work_group_size if the size is known.

Other than that, the compiler is really different internally for 1.2 and 2.0 now, so there can be differences in performance and behavior in both directions for a given source.
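
The two suggestions above can be sketched as follows. This is only an illustration, not code from GROMACS or from AMD: the kernel, the helper sum4 and the Python function build_options are hypothetical, and Python is used merely to hold the kernel string and assemble the option string that the host would pass when building the program. Whether this recovers any lost performance depends on the kernel.

```python
# Hypothetical kernel source. The helper's pointer parameter is explicitly
# __private, so it means the same thing under -cl-std=CL1.2 and -cl-std=CL2.0
# (where an unqualified pointer parameter would be treated as generic).
# The kernel also fixes its work-group size with reqd_work_group_size.
KERNEL_SOURCE = r"""
float sum4(const __private float *v)
{
    return v[0] + v[1] + v[2] + v[3];
}

__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void reduce4(__global const float *in, __global float *out)
{
    size_t gid = get_global_id(0);
    float tmp[4];                      /* lives in private memory */
    for (int i = 0; i < 4; ++i)
        tmp[i] = in[4 * gid + i];
    out[gid] = sum4(tmp);
}
"""


def build_options(cl_std="CL2.0", uniform_work_groups=True):
    """Return the option string to hand to the OpenCL program build call."""
    options = ["-cl-std=" + cl_std]
    # -cl-uniform-work-group-size is an OpenCL 2.0 option, so only add it there.
    if uniform_work_groups and cl_std == "CL2.0":
        options.append("-cl-uniform-work-group-size")
    return " ".join(options)


if __name__ == "__main__":
    print(build_options())          # -cl-std=CL2.0 -cl-uniform-work-group-size
    print(build_options("CL1.2"))   # -cl-std=CL1.2
```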

Thanks for the explanation. Not sure what it means that this answer is marked "correct" - is that some sort of peer review? Or is it a way to mark a forum thread as done/closed?

> Unqualified pointer passed to a function treated as generic vs private.

Not the case.

> OpenCL 2.0 supports non-uniform workgroups, so expansion of get_local_size() becomes substantially bigger than with 1.2. You can mitigate the impact by setting OpenCL 2.0 specific option -cl-uniform-work-group-size, although it will not remove all issues in the current release and will be improved in the future.

It does not help.

> Meanwhile you can achieve better results by using kernel attribute reqd_work_group_size if it is known.

It's always set.

I'm honestly *very* concerned about the state of the AMD OpenCL stack. Believe me, I get that it's hard to develop compilers and tools for GPUs and this isn't made easier by having a standard to rely on. However, people have gotten used to continuous and reliable improvements brought by software (and hardware up until recently). Your competitors do OK with delivering on that promise and if users can't get at least an assurance that their performance does not fall off of a cliff with a new version of the GPU compiler, they may just have one more good reason to not bother with AMD GPUs and OpenCL.

0 Likes

This is the generic answer to a generic question. I'm sorry it did not help you. If you wish and do not mind sharing compilation dumps I could try to understand what was the reason for performance difference in your case.

0 Likes

Thanks for getting back and sorry for the delayed reply. I would be interested, but I'm rather busy ATM. What exactly would "sharing compilation dumps" mean?

Note that the code I work on is an OSS large scientific code, available here: Downloads - Gromacs

0 Likes

If I'm right, Stanislav Mekhanoshin has asked for the intermediate temporary files (such as IL and ISA code) that can be generated during kernel building/compilation. You can dump/save those temporary files using the build option "-save-temps[=<prefix>]". For more details, please refer to the section "3.4 AMD-Developed Supplemental Compiler Options" in the AMD programming user guide.

Regards

0 Likes

Right, dumps made with -save-temps=<prefix> are what is needed for the analysis. The easiest way to get them is to set the environment variable AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps=dump before running the app.
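
As a sketch of the suggestion above, the variable can also be set from a small launcher instead of the shell. The helper name run_with_temps is hypothetical and the application command is supplied by the caller; only the variable name and the -save-temps=dump value come from this thread.

```python
import os
import subprocess
import sys


def run_with_temps(command, prefix="dump"):
    """Run `command` with AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps=<prefix>.

    The variable is set only for the child process; the caller's own
    environment is left unchanged.
    """
    env = dict(os.environ)
    env["AMD_OCL_BUILD_OPTIONS_APPEND"] = "-save-temps=" + prefix
    return subprocess.run(command, env=env, check=False)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python launcher.py <application> [arguments...]
    sys.exit(run_with_temps(sys.argv[1:]).returncode)
```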


0 Likes