

spectral
Adept II

Best way to allocate exact local memory

Hi,

In a kernel, I need to allocate a local memory region:

__local int chunck[get_local_size(0)];

Unfortunately, this is not allowed.

So, what is the best way to do this?

Remark: I think the simple way is to allocate the buffer on the host side and declare it as local in the kernel. Right?

Thanks

6 Replies
Meteorhead
Challenger

You do not need to allocate a buffer for __local memory. When you set the kernel arguments, specify the size you want and pass NULL for the data pointer. Inside the kernel, declare the corresponding kernel argument as __local int* chunck.

TADAA, you got your fixed size __local array.
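
As a rough sketch of the approach described above (not code from this thread): the kernel receives the buffer as a __local int* argument, and the host passes only a byte count together with a NULL pointer. The kernel name use_chunk, the argument index 0, and the helper local_chunk_bytes are hypothetical names chosen for illustration. The host call is shown in a comment so that the snippet compiles without an OpenCL SDK installed.

```c
#include <stddef.h>

/* Hypothetical kernel source: the local buffer arrives as a kernel argument
   instead of being declared with a run-time size inside the kernel. */
static const char *kernel_src =
    "__kernel void use_chunk(__local int *chunck)\n"
    "{\n"
    "    chunck[get_local_id(0)] = (int)get_global_id(0);\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);\n"
    "}\n";

/* Bytes of local memory needed for one int per work-item in a work-group.
   Assumes the host int has the same size as an OpenCL int (32 bits);
   with the OpenCL headers available, sizeof(cl_int) would be the safer choice. */
size_t local_chunk_bytes(size_t local_size)
{
    return local_size * sizeof(int);
}

/* On the host, with a real cl_kernel handle named `kernel` (assumed here):
 *
 *     clSetKernelArg(kernel, 0, local_chunk_bytes(local_size), NULL);
 *
 * A NULL pointer together with a non-zero size asks the runtime to
 * reserve that many bytes of local memory for each work-group. */
```

The size passed to clSetKernelArg should be computed from the same local work size that is later given to clEnqueueNDRangeKernel, otherwise the kernel may index past the end of the buffer.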


That is what I said in my remark, lol 😄

But thanks for the confirmation 😉

My main problem is finding out how the local buffer allocation affects the ideal work-group size! 😛


Beware that local memory allocated from the host will not affect the preferred work-group size, because at compile time the API has no knowledge of how much you will allocate with clSetKernelArg(). Because of this, you can allocate a practically unlimited amount of local memory without the preferred work-group size changing, and the excess local memory will reside in VRAM. You will still access it through the __local address space, but it will be much slower. If you start allocating local memory from the host, you will have to keep track of how much you have allocated.


meteorhead,
Local memory size is limited (32 KB on GPUs), and the exact size can be determined by querying the OpenCL API. Whether the local memory is declared in the kernel or on the host side via clSetKernelArg doesn't matter; the same limit applies.
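
A minimal sketch of the bookkeeping this implies, assuming the device limit has already been read on the host (for example from clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE). The function name local_request_fits and its parameters are hypothetical; the check simply adds the local memory declared inside the kernel to the amount requested through clSetKernelArg and compares the total with the limit.

```c
#include <stddef.h>

/* Returns 1 if a host-side local memory request fits, 0 otherwise.
   `device_local_bytes` stands in for the value reported by the device query,
   and `kernel_static_bytes` for local memory the kernel declares itself.
   Both are assumed to have been obtained elsewhere. */
int local_request_fits(size_t requested_bytes,
                       size_t kernel_static_bytes,
                       size_t device_local_bytes)
{
    /* The kernel's own declarations already exceed the limit. */
    if (kernel_static_bytes > device_local_bytes)
        return 0;
    /* Compare against what is left; written this way to avoid overflow. */
    return requested_bytes <= device_local_bytes - kernel_static_bytes;
}
```

With the figures mentioned in this thread, a 48 KB request against a 32 KB limit is rejected: local_request_fits(48 * 1024, 0, 32 * 1024) returns 0.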

I know that it is physically limited, but I was very surprised to find my simulations running fine with a kernel that used 48 KB of local memory on an HD5970. I was then told that the excess memory resides in VRAM. If that is not the case, then there was true black magic going on.


Meteorhead,
I would like to see this test case; technically it should not work on the GPU (it might on the CPU). The reason is that the compiler can allocate a maximum of 32 KB per work-group, so it should fail in clSetKernelArg/clEnqueueNDRangeKernel with an out-of-resources error.

Micah