thomasp
Journeyman III

OpenCL, is there instruction limitations ?

Hi

I plan to write something in OpenCL using the überkernel pattern.

This means a given kernel would have the following structure:

__kernel void my_uber_kernel(void)
{
     while(...)
     {
          if(stage==...)
          {
               device_function_0() ;
          }
          else if(stage==...)
          {
               device_function_1() ;
          }
          // etc...
          stage = stage + 1 ;
     }
}

Each device_function_X() potentially contains a substantial amount of code.

Are there known limits on the number of instructions supported (per thread?) before performance is affected?

Does splitting the work into small device function calls help with optimization?

Or do I have to split the work across several kernel calls (so that the device_function_X functions above become kernels)?


MicahVillmow
Staff

Re: OpenCL, is there instruction limitations ?

There is a limit on code size, but it is program-dependent: on the GPU everything gets inlined, which can massively inflate what would otherwise be a relatively small program with lots of function calls.
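
To make the inlining effect concrete, here is a minimal sketch in plain C. The struct, the function name and all instruction counts are hypothetical, chosen only for illustration. Real compilers also optimize after inlining, so treat the result as a rough upper-bound estimate rather than what any particular OpenCL compiler produces.

```c
#include <stddef.h>

/* Hypothetical description of one device function: how many instructions
 * its body compiles to, and how many call sites invoke it. */
struct device_fn {
    size_t body_instructions;
    size_t call_sites;
};

/* Estimate the instruction count once every call is inlined: each call
 * site receives its own full copy of the callee body. This ignores nested
 * calls and any optimization applied after inlining. */
size_t inlined_instruction_count(size_t kernel_instructions,
                                 const struct device_fn *fns, size_t n)
{
    size_t total = kernel_instructions;
    for (size_t i = 0; i < n; ++i)
        total += fns[i].body_instructions * fns[i].call_sites;
    return total;
}
```

For example, a 100-instruction kernel body with one 500-instruction function called from 10 places and one 2000-instruction function called from 3 places gives 100 + 5000 + 6000 = 11100 instructions, against 2600 if the calls were not inlined.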

thomasp
Journeyman III

Re: OpenCL, is there instruction limitations ?

OK.

Any idea of the order of magnitude?

I mean something like: "1000 lines of inlined code is the limit before performance degrades exponentially."

MicahVillmow
Staff

Re: OpenCL, is there instruction limitations ?

We are talking multiple MB of source code after inlining everything.

jeff_golds
Staff

Re: OpenCL, is there instruction limitations ?

Keep in mind that registers are reserved globally, as we can't know which part of the code you will use. So if the worst case uses 200 registers, that can limit the performance of other code paths that would use far fewer registers.

I have seen programs with 70,000 ISA instructions that still run pretty fast. Compilation takes around 30 s on a fast machine.
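
The register point can be sketched as simple arithmetic. This is an illustration in plain C under stated assumptions: the function name, the per-lane register budget and the wavefront cap are all hypothetical and vary by device. It only shows why the hungriest branch sets the cost for every code path.

```c
/* Registers are allocated for the worst-case path of the whole kernel, so
 * every work-item pays for the hungriest branch. Given a per-lane register
 * budget (hypothetical; device-dependent), return how many wavefronts can
 * be resident at once, capped at max_waves. */
unsigned resident_wavefronts(unsigned regs_budget_per_lane,
                             unsigned regs_per_work_item,
                             unsigned max_waves)
{
    if (regs_per_work_item == 0)
        return max_waves;
    unsigned waves = regs_budget_per_lane / regs_per_work_item;
    return waves < max_waves ? waves : max_waves;
}
```

With an assumed budget of 256 registers per lane and a cap of 10 wavefronts, a kernel whose worst case needs 200 registers keeps only 1 wavefront resident, whereas one needing 50 keeps 5. Fewer resident wavefronts means less latency hiding, even for branches that would be cheap on their own.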

thomasp
Journeyman III

Re: OpenCL, is there instruction limitations ?

Thank you all for these enlightening answers

hazeman
Adept II

Re: OpenCL, is there instruction limitations ?

The GPU has a program cache. If I'm not mistaken, it's 48 KB in size, so any code that is bigger than this limit will slow down.

There was an old post with data from benchmarking kernels of different sizes.

You can check the exact code size by looking at the ISA for your kernel (it's at the end of the ISA listing).
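
Once the code size is read from the ISA listing, comparing it with the cache is a simple ratio. A minimal sketch in plain C, where the function names are hypothetical and the cache size is a parameter, because the 48 KB figure here is quoted from memory and may differ between devices:

```c
#include <stddef.h>

/* Return 1 if the kernel's ISA code fits entirely in the program cache. */
int fits_in_cache(size_t code_bytes, size_t cache_bytes)
{
    return code_bytes <= cache_bytes;
}

/* Return how many times larger the code is than the cache (2.0 means the
 * code is twice the cache size). Returns 0.0 for a zero-sized cache to
 * avoid dividing by zero. */
double cache_overflow_factor(size_t code_bytes, size_t cache_bytes)
{
    if (cache_bytes == 0)
        return 0.0;
    return (double)code_bytes / (double)cache_bytes;
}
```

For instance, 96 KB of ISA code against an assumed 48 KB cache gives a factor of 2.0. The ratio says nothing by itself about how much slower the kernel runs; that depends on how often execution jumps between code regions that evict each other.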

thomasp
Journeyman III

Re: OpenCL, is there instruction limitations ?

MicahVillmow, although stamped as "AMD", is talking about MB (megabytes?) of source code, whereas you and jeff_golds mention 48 KB / 70,000 ISA instructions.

The answers differ by orders of magnitude!

How does performance degrade when the total amount of instructions overflows the program cache (by a factor of 2, 4, ...)? Are there known tests on this point?

hazeman
Adept II

Re: OpenCL, is there instruction limitations ?

I think your question was ambiguous. You asked for the program size limit (the maximum possible), and Micah answered that: you can have really huge kernels, and in practice I doubt it's possible to hit this limit. But I thought you might want to know when kernel size starts to carry a performance penalty, and that's why I posted my answer.

When the GPU doesn't find kernel code in the cache, it has to load it from global memory, which is orders of magnitude slower than the cache. You also pay the penalty for the cache miss itself.

You can find a post with a benchmark here

thomasp
Journeyman III

Re: OpenCL, is there instruction limitations ?

I agree the question was ambiguous; this is obviously due to my lack of knowledge.

Anyway, thank you for the clarifications and the pointer. I think I get it now.
