Archives Discussions

viscocoa · ‎02-16-2012

I just "improved" my OpenCL program by removing a tiny function, a legacy of its C++ CPU predecessor. The function was called in the innermost loop.

The kernel took 87.89 ms with the tiny function and 35.55 ms without it (performing the operations directly).

I was told that all OpenCL functions are inlined, which would explain why OpenCL does not allow recursion. Inlined functions should not cause much overhead.

What does OpenCL really do when a function is called? Should functions be avoided as much as possible?

Any suggestions will be appreciated.

Vis Cocoa
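
For readers following along, here is a minimal sketch of the two variants being compared. The original kernel was never posted, so everything here is hypothetical: the helper `scale_add`, the functions `sum_with_helper` and `sum_manual`, and the arithmetic are invented for illustration. It is written in plain C99 rather than OpenCL C so it can be compiled on the host. If the compiler inlines the helper, both versions should do the same work.

```c
#include <stddef.h>

/* Hypothetical tiny helper, standing in for the function that was
   removed from the kernel (the real one was not posted). */
static inline float scale_add(float x, float a, float b)
{
    return a * x + b;
}

/* Variant 1: the inner loop calls the helper. */
float sum_with_helper(const float *data, size_t n, float a, float b)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += scale_add(data[i], a, b);
    return acc;
}

/* Variant 2: the helper body is written out directly in the loop. */
float sum_manual(const float *data, size_t n, float a, float b)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a * data[i] + b;
    return acc;
}
```

Timing both variants several times, as was done later in this thread, is the safest way to tell a real difference from measurement noise.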

KNeumann · ‎02-17-2012

Could you provide us with a short example? Where exactly do you call the function? What kind of function is it?

viscocoa · ‎02-17-2012

Hi KNeumann,

The strange thing happened on Thursday night. I replaced a tiny function with direct operations, and the running time was shortened dramatically.

I spent time on Friday trying to repeat the legendary process, but I did not have the same luck. Nor could I use the same technique to speed up other parts of my program.

So, please forget it. OpenCL is working as it is supposed to.

Thank you for replying, and have a good weekend!

Vis Cocoa

Rom1 · ‎02-17-2012

Yes, this could be quite interesting, because I was also sure all functions were inlined. If not, it could explain some lack of efficiency.

viscocoa · ‎02-17-2012

Hi Rom1,

I am sorry to say that the experiment is not repeatable, as I explained above. Please forget it.

Thank you for your kind reply and have a good weekend!

Vis Cocoa

Bdot · ‎02-21-2012

I noticed that some of my functions were slower than directly including the code when I forgot to mark the input-only parameters as "const". Since I added that, the speed is the same.

viscocoa · ‎02-21-2012

Thank you, Bdot! It is very helpful to know that const parameters speed up a function "call". How about pointers then, like int*?

Bdot · ‎02-22-2012

Yes, my changes included going from

uint4 *res

to

uint4 * const res

for returning results. For other pointers I added "restrict", like

uint * restrict base

But I did not test each change for performance, so I cannot tell if that made a difference.

I think, if you know how the parameters are used (and you should 😉 ), then giving these hints to the compiler will never hurt. As a minimum it will make life easier for the optimizer, and at best it allows for optimizations that would not be done otherwise.
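
As a rough illustration of the qualifiers mentioned above, here is a minimal sketch in plain C99 (OpenCL C is based on C99 and accepts the same `const` and `restrict` syntax). The functions `sum4` and `sum_blocks` are hypothetical, and `unsigned` stands in for the `uint`/`uint4` types from the post. Note that `T * const p` makes the pointer itself constant, while `const T *p` marks the pointed-to data as read-only; the sketch uses both.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original kernel).
   - const unsigned *base : the function never writes through base
   - restrict             : promise that base and res do not alias
   - * const res          : the pointer res itself is never reassigned */
static inline void sum4(const unsigned * restrict base,
                        unsigned * restrict const res)
{
    *res = base[0] + base[1] + base[2] + base[3];
}

/* Calls the helper once per block of four inputs. */
void sum_blocks(const unsigned * restrict in,
                unsigned * restrict out,
                size_t blocks)
{
    for (size_t i = 0; i < blocks; ++i)
        sum4(in + 4 * i, out + i);
}
```

Whether these hints change the generated code depends on the compiler and device, so, as noted above, each change would need to be measured separately.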

viscocoa · ‎02-22-2012

Thank you very much, Bdot. You are right. Giving the compiler as many hints as possible will always be beneficial.

Thank you again!

Is a function call so expensive?