Hi,
I plan to write some code in OpenCL using the überkernel pattern.
This means that a given kernel would have the following structure:
    __kernel void my_uber_kernel(void)
    {
        while (...)
        {
            if (stage == ...)
            {
                device_function_0();
            }
            else if (stage == ...)
            {
                device_function_1();
            }
            // etc...
            stage = stage + 1;
        }
    }
Each device_function_X() potentially contains a substantial amount of code.
Are there any known limits on the number of instructions a kernel can contain (per thread?) before performance is impacted?
Does splitting the work into small device function calls help the compiler optimize?
Or do I have to split the work into several kernel calls (so that the device_function_X() functions above become kernels)?
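To make the two options concrete, here is a minimal sketch of what I have in mind. Everything in it is a placeholder: the buffer argument, the `num_stages` parameter, the trivial stage bodies and the `stage_N_kernel` names are made up for illustration, and the `#ifndef __OPENCL_VERSION__` block is only there so the same source can also be compiled as plain C on the host for a quick sanity check.

```c
/* Hypothetical sketch: all names, signatures and stage bodies are placeholders. */
#ifndef __OPENCL_VERSION__          /* not compiled by an OpenCL compiler */
#include <stddef.h>
#define __kernel                    /* strip OpenCL qualifiers for plain C */
#define __global
static size_t fake_gid;             /* stand-in for the work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return fake_gid; }
#endif

/* Stage bodies, shared by both variants (real ones would be much larger). */
void device_function_0(__global float *data, size_t i) { data[i] = data[i] * 2.0f; }
void device_function_1(__global float *data, size_t i) { data[i] = data[i] + 1.0f; }

/* Variant A: a single uber-kernel that selects the stage inside a loop. */
__kernel void my_uber_kernel(__global float *data, int num_stages)
{
    size_t i = get_global_id(0);
    for (int stage = 0; stage < num_stages; ++stage) {
        if (stage == 0) {
            device_function_0(data, i);
        } else if (stage == 1) {
            device_function_1(data, i);
        }
        /* etc... */
    }
}

/* Variant B: one small kernel per stage, enqueued one after another by the host. */
__kernel void stage_0_kernel(__global float *data)
{
    device_function_0(data, get_global_id(0));
}

__kernel void stage_1_kernel(__global float *data)
{
    device_function_1(data, get_global_id(0));
}
```

With variant B the host would enqueue `stage_0_kernel` and then `stage_1_kernel` (for example with `clEnqueueNDRangeKernel` on an in-order queue). As far as I understand, this also means every work-item finishes one stage before the next stage starts, which variant A does not guarantee across work-groups.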