Hi all,
I am looking into whether I can pass a local variable (i.e. a pointer to, or an address in, __local memory) from one kernel to another. Can anyone here offer a suggestion? The scenario is described below:
- I perform task f1 in kernel k1 and store some results in a local memory variable (let's say __local output_k1[1024]). Now I launch a second kernel, k2, which needs output_k1 from k1 as its input. Traditionally I would have to write output_k1 back to the host, and the host would then pass those values from host memory to k2. However, that round trip is really inefficient, so I want to keep output_k1 in the device's local memory and just pass a pointer to that local memory location to k2.
Thank you,
Roto
Thanks, Micah. Having k1 call k2 directly is a very useful idea, but only if k1 and k2 use the same work-group size (and dimensions). In my case I want k2's work-group size to differ from k1's (it sounds irregular, but it is dictated by the structure of k1's output).
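Since __local memory belongs to a single work-group and its contents are not preserved from one kernel launch to the next, a common way to skip the host round trip when the work-group sizes differ is to have k1 copy its results into a __global buffer and pass that same buffer object to k2; the buffer stays on the device between the two launches. The sketch below only simulates that flow on the CPU in plain C: an ordinary array stands in for the __global buffer, loops over "work-groups" stand in for the launches, and the sizes N, WG1, WG2 and the doubling used for task f1 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes chosen for illustration only. */
#define N   1024   /* number of results produced by k1 */
#define WG1 64     /* work-group size used by k1 */
#define WG2 128    /* a different work-group size used by k2 */

static_assert(N % WG1 == 0 && N % WG2 == 0, "both work-group sizes must divide N");

/* Stands in for a __global buffer that stays on the device between
 * the two kernel launches; the host never reads it back. */
static float global_results[N];

/* Simulated k1: each work-group computes into its own scratch array
 * (the role of __local output_k1), then copies the results out to the
 * global buffer before the work-group finishes. */
static void k1(const float *in)
{
    for (size_t g = 0; g < N / WG1; ++g) {
        float local_out[WG1];   /* lifetime ends with this work-group */
        for (size_t l = 0; l < WG1; ++l)
            local_out[l] = 2.0f * in[g * WG1 + l];   /* placeholder for task f1 */
        for (size_t l = 0; l < WG1; ++l)
            global_results[g * WG1 + l] = local_out[l];
    }
}

/* Simulated k2: with a different work-group size, each of its work-groups
 * spans results written by two of k1's work-groups, so it must load its
 * slice from the global buffer into its own scratch array. */
static void k2(float *out)
{
    for (size_t g = 0; g < N / WG2; ++g) {
        float local_in[WG2];
        for (size_t l = 0; l < WG2; ++l)
            local_in[l] = global_results[g * WG2 + l];
        float sum = 0.0f;
        for (size_t l = 0; l < WG2; ++l)
            sum += local_in[l];
        out[g] = sum;   /* one value per k2 work-group */
    }
}
```

The point of the design is that only global memory outlives a work-group, so it is the natural hand-off point when the two kernels partition the data differently.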
Hi Micah,
That would be a nice solution in this situation. However, there is one thing to consider: some __local variables in k1 are no longer needed from the point where I call k2 onward, but in k2 I may want to use a large chunk of __local memory. Is there any mechanism to release some variables in k1 (an atomic operation, maybe) and dynamically allocate new local memory for k2 at run time?
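For example:
k2()
{
    allocate(var);   // wished-for: allocate a large __local chunk at run time
    do the task;
}
---
k1()
{
    ...
    release(var);    // wished-for: free __local variables k1 no longer needs
    k2();
}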
---
Why can't you just reuse them? Are you doing inline __local variable declarations in the kernel?
In my opinion that should not have been allowed into the standard, because it represents highly dubious scoping. I always write kernels that take a local memory allocation passed in through the API, because its lifetime is more clearly scoped. If you do that and just use const offsets into the array in the kernel, then you can use the same const addresses in both subkernels for different data and manage how the local data is used yourself, instead of wondering what the compiler wants to do with it.
Maybe:
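// Prototypes: OpenCL C does not allow implicit function declarations.
// float is just an example element type.
void b( __local float *bl );
void c( __local float *cl );

// The host sizes al with clSetKernelArg(kernel, 0, size_in_bytes, NULL).
__kernel void a( __local float *al )
{
    b( al );                          // b works at its own const offsets into al
    barrier( CLK_LOCAL_MEM_FENCE );   // all work-items are done with b's data
    c( al );                          // c can reuse the same addresses for new data
}
void b( __local float *bl )
{...}
void c( __local float *cl )
{...}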
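As a rough CPU-side analogy of the const-offset approach (plain C, so an ordinary array stands in for the __local allocation the host would size with clSetKernelArg(kernel, idx, bytes, NULL)), the sketch below lets two phases share one region: b uses a small slice, and c, once b's values are dead, reuses the same addresses for a larger buffer. The layout constants and the bodies of b and c are hypothetical.

```c
#include <assert.h>

/* Hypothetical layout of one scratch region sized once by the host. */
enum {
    SCRATCH_FLOATS = 256,  /* total size of the region, in floats */
    B_TMP_OFFSET   = 0,    /* b's temporaries live in [0, 64) */
    B_TMP_LEN      = 64,
    C_BUF_OFFSET   = 0,    /* c reuses the whole region once b is done */
    C_BUF_LEN      = 256
};

static_assert(B_TMP_OFFSET + B_TMP_LEN <= SCRATCH_FLOATS, "b's slice must fit");
static_assert(C_BUF_OFFSET + C_BUF_LEN <= SCRATCH_FLOATS, "c's slice must fit");

/* First phase: fills its slice with 0..63 and returns their sum. */
static float b(float *bl)
{
    float *tmp = bl + B_TMP_OFFSET;
    float sum = 0.0f;
    for (int i = 0; i < B_TMP_LEN; ++i) {
        tmp[i] = (float)i;
        sum += tmp[i];
    }
    return sum;
}

/* Second phase: overwrites the same addresses with its own, larger data. */
static float c(float *cl)
{
    float *buf = cl + C_BUF_OFFSET;
    float sum = 0.0f;
    for (int i = 0; i < C_BUF_LEN; ++i) {
        buf[i] = 1.0f;
        sum += buf[i];
    }
    return sum;
}

/* The "kernel": one region, two phases, no run-time allocate/release. */
static float a(float *al)
{
    float rb = b(al);
    /* In OpenCL C, barrier(CLK_LOCAL_MEM_FENCE) belongs here if
     * work-items read each other's slots. */
    float rc = c(al);   /* b's values are dead, so reusing their space is safe */
    return rb + rc;
}
```

Because the region is sized once at launch, nothing has to be released or allocated while the kernel runs; the largest phase simply determines how many bytes the host requests.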