max_bodycad
Journeyman III

Question about CL_DEVICE_MAX_WORK_ITEM_SIZES

Very basic question: if my card can use 256 work-items in each direction, why can't I specify a global range of (256^3, 1, 1)?

Thank you

nou
Exemplar

Because the product of all the work-group dimensions must also be no larger than CL_DEVICE_MAX_WORK_GROUP_SIZE, which is also 256 on the GPU.

wayne_static
Adept II

Hello,

The 256 work-items in question refers to the total number of work-items in a work-group, regardless of whether it has 1, 2 or 3 dimensions, and not to the number of work-items in a particular direction. For instance, valid work-group sizes in the format {x, y, z} can be {256, 1, 1} or {16, 16, 1} or {8, 8, 4}. In each of these examples the dimensions multiply out to 256 work-items in a work-group. Hope this helps.
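To make the rule concrete, here is a minimal sketch in plain Python (no OpenCL runtime needed) of the two checks a work-group size has to pass. The function name `is_valid_local_size` is made up for illustration, and the default limits of 256 are simply the values mentioned in this thread; on real hardware they should be queried from the device.

```python
from math import prod

def is_valid_local_size(local_size, max_work_group_size=256,
                        max_work_item_sizes=(256, 256, 256)):
    """Check a work-group size against assumed device limits.

    The defaults mirror the values quoted in this thread; they are
    assumptions, not numbers read from an actual device.
    """
    # Every dimension must be at least 1 and no larger than the matching
    # entry of CL_DEVICE_MAX_WORK_ITEM_SIZES.
    per_dim_ok = all(1 <= n <= limit
                     for n, limit in zip(local_size, max_work_item_sizes))
    # The product of all dimensions must not exceed
    # CL_DEVICE_MAX_WORK_GROUP_SIZE.
    total_ok = prod(local_size) <= max_work_group_size
    return per_dim_ok and total_ok

for size in [(256, 1, 1), (16, 16, 1), (8, 8, 4), (256, 256, 1)]:
    print(size, is_valid_local_size(size))
```

The last size, (256, 256, 1), passes the per-dimension check but fails the product check, which is the situation the original question runs into.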

Ok, so then, if I understand correctly, the max number of kernel invocations you can get is CL_DEVICE_MAX_WORK_GROUP_SIZE * CL_DEVICE_MAX_WORK_ITEM_SIZES, so 65536 on a GPU, right?

You can refer to the tables in Appendix D of the AMD APP OpenCL Programming Guide, which list hardware limits for the various families of GPUs. However, I think the work-item figure under "Global Limits" in those tables is the number that can be in flight at any given time under the most favorable conditions for a kernel (optimal occupancy, I guess; techs, please confirm). I have written kernels that run with over 200,000 work-items, but this does not mean that they are all active at the same time. In fact, CodeXL shows a device limit of 16,777,216 for my HD 7xxx series card, so you can use it to check yours during profiling.

CL_DEVICE_MAX_WORK_GROUP_SIZE is the maximum number of work-items in a work-group, which is 256 for the current AMD GCN architecture.


I know that CL_DEVICE_MAX_WORK_ITEM_SIZES can be a bit misleading, but keep in mind that, since the work-items in a work-group can be arranged in three dimensions x, y and z, the condition is that x * y * z must be at most 256.

Ok, I realized I got somewhat misled by an "Out of resources" error that I encountered when I tried, on the GPU, a configuration that worked for DEVICE_TYPE_CPU. The mistake was to specify a local_range of (1,1,1) instead of leaving it unspecified and letting the driver choose it for me. It led me to think that there was a limit on the global size, while, correct me if I'm wrong, the limit is on the number of work-groups. Why is that so? I understand why it makes sense, on some hardware, to limit the work-group's size, but why is there a limit on the number of work-groups?
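As a rough illustration of why the local size matters here, the sketch below (plain Python, no OpenCL runtime) counts the work-groups that a given global and local size produce. The helper name `work_group_counts` is hypothetical, and the divisibility check reflects the OpenCL 1.x rule that each global dimension must be a multiple of the corresponding local dimension. It only shows the arithmetic; it does not establish which limit actually triggered the error.

```python
def work_group_counts(global_size, local_size):
    """Return the number of work-groups per dimension.

    Hypothetical helper for illustration; it assumes the OpenCL 1.x rule
    that the global size is evenly divisible by the local size.
    """
    if len(global_size) != len(local_size):
        raise ValueError("global and local sizes need the same dimensionality")
    counts = []
    for g, l in zip(global_size, local_size):
        if l < 1 or g % l != 0:
            raise ValueError(f"global size {g} is not a multiple of local size {l}")
        counts.append(g // l)
    return tuple(counts)

global_size = (256 ** 3, 1, 1)  # the global range from the original question

# One work-item per work-group: as many work-groups as work-items.
print(work_group_counts(global_size, (1, 1, 1)))    # (16777216, 1, 1)

# 256 work-items per work-group: far fewer work-groups for the same range.
print(work_group_counts(global_size, (256, 1, 1)))  # (65536, 1, 1)
```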

Thank you

I wouldn't say there is a limit on the "number of work-groups"; more accurately, there is a limit on the "number of work-items" that can be spawned in total or that can be doing work at any given time. I know that the number of active work-items is limited by the amount of resources they require (the variables in your kernel code), and this is related to what is known as "occupancy". You can find more in the documentation or online. On the other hand, the limiting factor on the total number of work-items as a whole could be down to design or model limitations, hardware design choices and so on. AMD engineers will have a better answer to this, as I believe it is more hardware related than OpenCL related.
