rj_marques
Journeyman III

Local memory and work-groups

Hello,

Say I want to allocate X bytes of local memory, and clGetDeviceInfo reports that the maximum local memory size is Y.

I also have a one-dimensional global work size of T and a one-dimensional local work size of L, so I have W = T/L work-groups.

How do I calculate the effective amount of local memory used? Is it just X, or is it X*W?

I have an AMD HD4850, and I have implemented the following example:

- Number of work-items (1D): 384 000

- Number of work-items per work-group (1D): 256 (the maximum for my GPU)

- Local memory: 40 bytes

- Maximum local memory: 16384 bytes

In this scenario, clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE. Interestingly, if I set the number of work-items per work-group to 64, it works fine.

What am I missing?

Thanks for your replies.

Edit: BTW, CPU execution works fine in all of the above scenarios. Is it a bug?

notzed
Challenger

It looks like the 4xxx hardware is a bit limited; shoe-horning OpenCL onto it was a stretch, particularly the local memory support (and using local memory probably means you're using barriers?): http://devgurus.amd.com/thread/124649

BTW, that was the first hit when searching Google for "hd4850 local work size", so a little digging would have saved you some time.

If you've set the local memory to X bytes, it consumes only X bytes per work-group; local memory sizes must be set explicitly (either hard-coded or calculated). The hardware may be able to run N work-groups at once on each compute unit, in which case N*X bytes of physical local memory are in use, but you don't have direct control over that.

I see, thanks for the reply. You're right, I should have googled a bit about my GPU; at first I didn't suspect a hardware issue and thought it was a software problem.