pszilard
Adept I

Issues when switching to OpenCL 2.0

I was going to try out some OpenCL 2.0 features and ran into two strange issues after adding the -cl-std=CL2.0 flag:

  • The preprocessing stage fails and the compiler complains about missing includes; the same code compiled just fine with CL1.2.
  • After copying the missing files to /tmp to satisfy the compiler, I got about a 25-35% performance drop just from adding the flag, without changing anything else. Is there something in the OpenCL 2.0 spec or in AMD's implementation that (in)directly affects the performance of 1.2 code? Or is this just a bug?
rampitec
Staff

Although OpenCL 1.2 and 2.0 syntax is very similar, there is a big internal difference for the compiler, both in language semantics and in compiler internals. There are at least two big factors that might affect kernel performance if 2.0 syntax is forced on 1.2 source:

  1. An unqualified pointer passed to a function is treated as generic in 2.0, versus private in 1.2. It is a good idea to complete function declarations so that private pointer arguments are explicitly marked with the __private qualifier. That makes the source compatible with both 1.2 and 2.0 syntax.
  2. OpenCL 2.0 supports non-uniform work-groups, so the expansion of get_local_size() becomes substantially bigger than with 1.2. You can mitigate the impact by setting the OpenCL 2.0-specific option -cl-uniform-work-group-size, although it does not remove all issues in the current release; this will be improved in the future. Meanwhile, you can achieve better results by using the kernel attribute reqd_work_group_size if the work-group size is known.
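
To make the two points above concrete, here is a minimal sketch, not taken from the original code. The names sum4, sum_rows and WG_SIZE are made up for illustration, and a fixed work-group size of 64 is only an assumption. The helper marks its pointer argument __private explicitly, and the kernel declares its work-group size with reqd_work_group_size. The small #ifndef shim is there only so the helper can also be compiled as plain C for a quick host-side check; an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
/* Illustrative sketch; sum4, sum_rows and WG_SIZE are hypothetical names. */

#ifndef __OPENCL_VERSION__
#define __private   /* plain C has no address spaces, so drop the qualifier */
#endif

#define WG_SIZE 64  /* assumed work-group size; use whatever the host enqueues */

/* The pointer is explicitly __private, so it means the same thing under
   -cl-std=CL1.2 and -cl-std=CL2.0. Without the qualifier, a CL2.0 build
   would treat it as a generic pointer. */
float sum4(__private const float *v)
{
    return v[0] + v[1] + v[2] + v[3];
}

#ifdef __OPENCL_VERSION__
/* reqd_work_group_size tells the compiler the exact local size up front. */
__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))
void sum_rows(__global const float *in, __global float *out)
{
    size_t gid = get_global_id(0);
    float row[4];                 /* private array, one row per work-item */

    for (int i = 0; i < 4; ++i)
        row[i] = in[4 * gid + i];

    out[gid] = sum4(row);
}
#endif
```

With a kernel like this, the host has to enqueue it with a local size of exactly WG_SIZE, otherwise the enqueue fails. If the size is not known at compile time, the fallback mentioned above is to pass "-cl-std=CL2.0 -cl-uniform-work-group-size" as the build options.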

Other than that, the compiler is now really different internally for 1.2 and 2.0, so for a given source there can be differences in performance and behavior in both directions.
