
thomasp

OpenCL: are there instruction limitations?

Hi

I plan to write something in OpenCL using the überKernel pattern.

This means a given kernel would have the following structure:

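    // Placeholder stage functions; each one may contain a lot of code.
    void device_function_0(void) { /* stage 0 work */ }
    void device_function_1(void) { /* stage 1 work */ }

    // Hypothetical stage count; adjust to the real number of stages.
    #define NUM_STAGES 2

    __kernel void my_uber_kernel(void)
    {
        int stage = 0;

        while (stage < NUM_STAGES)
        {
            if (stage == 0)
            {
                device_function_0();
            }
            else if (stage == 1)
            {
                device_function_1();
            }
            // etc...

            stage = stage + 1;
        }
    }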

Each device_function_X() may contain a substantial amount of code.
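The über-kernel control flow above can be emulated on the CPU in plain C, which makes it easy to check what a single launch computes. This is a minimal sketch, not OpenCL code: the outer loop in `launch_uber_kernel` stands in for the NDRange, and `NUM_STAGES`, the three stage bodies and the `float` payload are hypothetical placeholders chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of stages. */
#define NUM_STAGES 3

/* Placeholder stage bodies; in the real kernel each may be large. */
static float device_function_0(float x) { return x + 1.0f; }
static float device_function_1(float x) { return x * 2.0f; }
static float device_function_2(float x) { return x - 3.0f; }

/* One "work-item" of the über-kernel: a single call runs every stage,
   carrying the intermediate value in a private variable. */
static float uber_kernel_item(float x)
{
    int stage = 0;
    while (stage < NUM_STAGES) {
        if (stage == 0) {
            x = device_function_0(x);
        } else if (stage == 1) {
            x = device_function_1(x);
        } else if (stage == 2) {
            x = device_function_2(x);
        }
        stage = stage + 1;
    }
    return x;
}

/* CPU stand-in for ONE kernel launch over n work-items. */
void launch_uber_kernel(float *data, size_t n)
{
    assert(data != NULL || n == 0);
    for (size_t gid = 0; gid < n; ++gid) {
        data[gid] = uber_kernel_item(data[gid]);
    }
}
```

Each work-item keeps its intermediate value in a private variable (normally a register on a GPU) from stage to stage, so a single launch needs no round trip through global memory between stages.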

I'm wondering whether there are known limits on the number of instructions supported (per thread?) before performance is impacted.

Does splitting the processing into small device function calls help with optimization?

Or do I have to split the processing across several kernel launches (so that each device_function_X mentioned above becomes its own kernel)?
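The split-kernel alternative can be sketched the same way. Again this is a hypothetical CPU emulation, not OpenCL host code: each `stage_kernel_N` stands in for a separate `__kernel`, `enqueue` stands in for enqueuing it over `n` work-items, and the stage bodies are placeholders chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-stage "kernels"; in OpenCL each would be its own
   __kernel, enqueued by the host one after another. */
typedef void (*stage_kernel)(float *data, size_t gid);

static void stage_kernel_0(float *data, size_t gid) { data[gid] = data[gid] + 1.0f; }
static void stage_kernel_1(float *data, size_t gid) { data[gid] = data[gid] * 2.0f; }
static void stage_kernel_2(float *data, size_t gid) { data[gid] = data[gid] - 3.0f; }

static const stage_kernel stages[] = { stage_kernel_0, stage_kernel_1, stage_kernel_2 };
#define NUM_STAGES (sizeof stages / sizeof stages[0])

/* CPU stand-in for enqueuing one kernel over n work-items. */
static void enqueue(stage_kernel k, float *data, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid) {
        k(data, gid);
    }
}

/* Host-side sequence: one launch per stage instead of one über-kernel.
   Between launches, intermediate results live only in the buffer. */
void run_split_kernels(float *data, size_t n)
{
    assert(data != NULL || n == 0);
    for (size_t s = 0; s < NUM_STAGES; ++s) {
        enqueue(stages[s], data, n);
    }
}
```

Here the only state that survives from one stage to the next is what each launch writes back to `data`. On a real device each stage kernel is smaller, but every launch boundary adds enqueue overhead and forces intermediate results out to global memory and back, so which variant wins depends on the workload.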

Accepted solution:

I think your question was ambiguous. You asked about the program size limit (the maximum possible size), and Micah answered that: you can have really huge kernels, and in practice I doubt you could ever hit that limit. But I thought you might also want to know when kernel size starts to carry a performance penalty, which is why I posted my answer.

When the GPU doesn't find the kernel code in its instruction cache, it has to fetch it from global memory, which is orders of magnitude slower than the cache. On top of that, you also pay the penalty for the cache miss itself.
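A toy model of this effect: a fully associative LRU instruction cache of `cache_lines` lines, and a kernel whose loop body spans `code_lines` consecutive cache lines and runs `iterations` times. The cache organisation, sizes and the function name `count_icache_misses` are assumptions for illustration, not a description of any particular GPU's instruction cache.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE_LINES 64

/* Count instruction-cache misses for a loop whose body occupies
   code_lines consecutive cache lines, executed `iterations` times,
   with a fully associative LRU cache of cache_lines entries. */
long count_icache_misses(int code_lines, int cache_lines, int iterations)
{
    int tag[MAX_CACHE_LINES];        /* code line held by each slot   */
    long last_use[MAX_CACHE_LINES];  /* time of the slot's last access */
    int used = 0;                    /* number of occupied slots       */
    long now = 0, misses = 0;

    assert(cache_lines > 0 && cache_lines <= MAX_CACHE_LINES);

    for (int it = 0; it < iterations; ++it) {
        for (int line = 0; line < code_lines; ++line, ++now) {
            int hit = -1;
            for (int s = 0; s < used; ++s) {
                if (tag[s] == line) { hit = s; break; }
            }
            if (hit >= 0) {
                last_use[hit] = now;   /* hit: refresh its LRU age */
                continue;
            }
            ++misses;                  /* miss: fetch from memory */
            int slot;
            if (used < cache_lines) {
                slot = used++;         /* a free slot is available */
            } else {
                slot = 0;              /* evict the least recently used */
                for (int s = 1; s < used; ++s) {
                    if (last_use[s] < last_use[slot]) slot = s;
                }
            }
            tag[slot] = line;
            last_use[slot] = now;
        }
    }
    return misses;
}
```

In this model, a loop body of 8 lines with an 8-line cache misses only 8 times over ten iterations, but a 9-line body misses on all 90 fetches: LRU evicts each line just before it is needed again, so once the code no longer fits, every fetch goes to memory.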

You can find a post with a benchmark here.
