ntrolls
Journeyman III

Local work group size on HD 4850

Hi, is there anyone who is running OpenCL on HD4850? I cannot use a local work group size larger than 64, regardless of what my kernel is, whereas the device tells me the maximum size of work group is 1024. What am I missing?

12 Replies
MicahVillmow
Staff

Are you using a barrier?

ntrolls
Journeyman III

Yes. Local memory fence for tiled matrix multiplication. Would that be why?

hazeman
Adept II

Originally posted by: ntrolls
Hi, is there anyone who is running OpenCL on HD4850? I cannot use a local work group size larger than 64, regardless of what my kernel is, whereas the device tells me the maximum size of work group is 1024. What am I missing?

With v2.0, the group size is limited to 64 on 4xxx cards (some problems with the barrier on RV7xx). Generally, OpenCL for the 4xxx series is more along the lines of "it works well enough to be advertised, but forget about using it for any reasonable computations".

MicahVillmow
Staff

Yes, the barrier on the 4XXX series is a software barrier which can cause problems in corner cases. If you want to work around it, please use __attribute__((reqd_work_group_size(X, Y, Z))) on your kernel and we will compile for exactly that group size.
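
To illustrate the workaround, here is a minimal sketch in C host code. It assumes the host assembles its kernel source as a string; `build_kernel_source` and the placeholder kernel `scale` are hypothetical names for illustration, not the original poster's code. What it shows is placement: the attribute qualifies the kernel function itself, so it goes on the `__kernel` declaration rather than at the top of the source file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes OpenCL C source for a placeholder kernel whose
 * required work-group size (x, y, z) is baked in via reqd_work_group_size.
 * Returns the number of characters written, or -1 if buf is too small. */
int build_kernel_source(char *buf, size_t cap,
                        unsigned x, unsigned y, unsigned z)
{
    assert(buf != NULL && cap > 0);
    int n = snprintf(buf, cap,
        "__kernel __attribute__((reqd_work_group_size(%u, %u, %u)))\n"
        "void scale(__global float *data, const float factor)\n"
        "{\n"
        "    /* placeholder body: one element per work-item in a 2D range */\n"
        "    size_t i = get_global_id(1) * get_global_size(0) + get_global_id(0);\n"
        "    data[i] *= factor;\n"
        "}\n",
        x, y, z);
    /* snprintf reports the full length even when the output was truncated */
    return (n > 0 && (size_t)n < cap) ? n : -1;
}
```

Per the OpenCL specification, the local size later passed to `clEnqueueNDRangeKernel` has to match the declared X, Y, Z exactly; a mismatch is reported as `CL_INVALID_WORK_GROUP_SIZE`.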

ntrolls
Journeyman III

Thanks a million - you're the first person who has shed real light on this! I never would have guessed such a thing.

I'm running this on Snow Leopard 10.6.2 with a Java wrapper. I added __attribute__((reqd_work_group_size(16, 16, 1))) at the beginning of my kernel code, and it still complains that 16x16 is an invalid work group size.

I think I'm almost there... any idea?

ntrolls
Journeyman III

Originally posted by: hazeman
With v2.0, the group size is limited to 64 on 4xxx cards (some problems with the barrier on RV7xx). Generally, OpenCL for the 4xxx series is more along the lines of "it works well enough to be advertised, but forget about using it for any reasonable computations".

But then why would CL_DEVICE_MAX_WORK_GROUP_SIZE return 1024...?

MicahVillmow
Staff

Try 256, 1, 1 instead of 16, 16, 1.

MicahVillmow
Staff

ntrolls,
There is a difference between the largest size that the device can support and the largest that a particular kernel can support.
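
The two limits come from different queries: `CL_DEVICE_MAX_WORK_GROUP_SIZE` (through `clGetDeviceInfo`) is the device-wide ceiling, while `CL_KERNEL_WORK_GROUP_SIZE` (through `clGetKernelWorkGroupInfo`) is the ceiling for one compiled kernel on that device. Below is a rough sketch in plain C of how a host might size a 2D tile from the per-kernel value. The helper names `tile_fits` and `largest_square_tile` are made up for illustration, and the limit is passed in as a plain number so the sketch does not need an OpenCL runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a w x h work group stays within the per-kernel limit
 * (the value a host would read with CL_KERNEL_WORK_GROUP_SIZE), else 0. */
int tile_fits(size_t w, size_t h, size_t kernel_max)
{
    assert(w > 0 && h > 0);
    /* equivalent to w * h <= kernel_max, but cannot overflow */
    return w <= kernel_max / h;
}

/* Returns the largest power-of-two side s such that an s x s work group
 * still fits within the per-kernel limit. */
size_t largest_square_tile(size_t kernel_max)
{
    assert(kernel_max >= 1);
    size_t side = 1;
    while (tile_fits(side * 2, side * 2, kernel_max))
        side *= 2;
    return side;
}
```

With the limit of 64 reported in this thread, the sketch picks an 8x8 tile; a 16x16 request (256 work-items) would not fit under that limit even though the device-wide maximum is 1024.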

ntrolls
Journeyman III

Originally posted by: MicahVillmow
ntrolls, There is a difference between the largest size that the device can support and the largest that a particular kernel can support.

Yes, I know. But I even tried a kernel that does not do anything (it simply returns) and still could not assign a 16x16 local work group size. I don't know if this little experiment makes any sense, but there it is, for what it's worth.

And no... (256, 1, 1) still does not work.
