
Local memory and work-groups

Question asked by rj.marques on Mar 17, 2012
Latest reply on Mar 17, 2012 by rj.marques



Say I want to allocate X bytes of local memory, and clGetDeviceInfo reports that the maximum local memory size is Y.


I also have a one-dimensional global work size of T and a one-dimensional local work size of L; consequently, I have W = T/L work-groups.


How do I calculate the effective quantity of local memory? Is it just X? Or is it X*W?
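For reference, the arithmetic behind the question can be sketched in plain C. This is only an illustration under the assumption that local memory is counted per work-group, as the OpenCL execution model describes it, so X alone is compared against Y and W does not enter into it. The helper names `work_group_count` and `local_mem_fits` are hypothetical and not part of any OpenCL API.

```c
#include <stddef.h>

/* Hypothetical helper: number of work-groups W = T / L for a 1D range.
   Returns 0 when L is 0 or does not divide T evenly. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}

/* Hypothetical helper: assuming local memory is counted per work-group,
   only the request X is compared with the device limit Y; the number of
   work-groups W plays no part in the comparison. */
int local_mem_fits(size_t requested_bytes, size_t max_local_bytes)
{
    return requested_bytes <= max_local_bytes;
}
```

With T = 384000 and L = 256, `work_group_count` gives W = 1500, and `local_mem_fits(40, 16384)` holds regardless of W.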


I have an AMD HD4850, and I have implemented the following example:


- Number of work-items (1D): 384 000

- Number of work-items per work-group (1D): 256 (the maximum for my GPU)

- Local memory: 40 bytes

- Maximum local memory: 16384 bytes


In this scenario, clEnqueueNDRangeKernel returns the error CL_INVALID_WORK_GROUP_SIZE. Interestingly, if the number of work-items per work-group is set to 64, it works fine.
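One way to narrow down this error is to check the launch parameters before enqueueing. The sketch below is a hedged illustration in plain C, not an OpenCL routine. It assumes the device limit comes from clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE and that a per-kernel limit has been queried separately with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE, which may be smaller than the device limit. The enum and the function `check_1d_launch` are hypothetical names introduced only for this example.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch; these are not OpenCL constants. */
enum launch_check {
    LAUNCH_OK = 0,
    LAUNCH_NOT_DIVISIBLE,      /* global size is not a multiple of local size */
    LAUNCH_EXCEEDS_DEVICE_MAX, /* local size > CL_DEVICE_MAX_WORK_GROUP_SIZE */
    LAUNCH_EXCEEDS_KERNEL_MAX  /* local size > CL_KERNEL_WORK_GROUP_SIZE */
};

/* Hypothetical pre-launch check for a 1D range. The two limits are passed
   in as plain numbers; in a real host program they would be queried from
   the OpenCL runtime. */
enum launch_check check_1d_launch(size_t global_size, size_t local_size,
                                  size_t device_max_wg, size_t kernel_max_wg)
{
    if (local_size == 0 || global_size % local_size != 0)
        return LAUNCH_NOT_DIVISIBLE;
    if (local_size > device_max_wg)
        return LAUNCH_EXCEEDS_DEVICE_MAX;
    if (local_size > kernel_max_wg)
        return LAUNCH_EXCEEDS_KERNEL_MAX;
    return LAUNCH_OK;
}
```

For the numbers above, 384000 is evenly divisible by 256 and 256 does not exceed the device maximum, so those two checks pass. If the per-kernel limit happened to be 64 (an assumption, not a measured value), `check_1d_launch(384000, 256, 256, 64)` would return `LAUNCH_EXCEEDS_KERNEL_MAX`, while a local size of 64 would pass.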


What am I missing?

Thanks for your replies


Edit: BTW, CPU execution works fine in any of the above cases. Is it a bug?