Local memory and work-groups

Question asked by rj.marques on Mar 17, 2012
Latest reply on Mar 17, 2012 by rj.marques

Hello,

Say I want to allocate X bytes of local memory, and clGetDeviceInfo reports that the maximum local memory size is Y.

I also have a one-dimensional global work size of T and a one-dimensional local work size of L, so I have W = T/L work-groups.

How do I calculate the effective amount of local memory? Is it just X, or is it X*W?

I have an AMD HD4850, and I have implemented the following example:

- Number of work-items (1D): 384 000

- Number of work-items per work-group (1D): 256 (the maximum for my GPU)

- Local memory: 40 bytes

- Maximum local memory: 16384 bytes

In this scenario, clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE. Interestingly, if the number of work-items per work-group is set to 64, it works fine.

What am I missing?

Thanks for your replies.

Edit: BTW, CPU execution works fine in all of the above scenarios. Is it a bug?