

Adept I

Is a function call so expensive?

I just "improved" my OpenCL program by removing a tiny function, which is actually a legacy of its C++ CPU predecessor. The function was called in the very inner loop.

The kernel took 87.89 ms with the tiny function, and 35.55 ms without it (performing the operations directly in the loop).

I was told that all OpenCL functions are inlined, which explains why OpenCL does not allow recursion. Inlined functions should not cause much overhead.

What does OpenCL really do when a function is called? Should functions be avoided as much as possible?
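To make the comparison concrete, here is a minimal sketch in plain C99 (OpenCL C is based on C99) of the two variants being compared: an inner loop that calls a tiny helper, and the same loop with the arithmetic written out by hand. The names `scale_add`, `sum_with_helper`, and `sum_by_hand` are hypothetical and not taken from the original program. Both versions compute the same result; whether a particular OpenCL compiler inlines the helper and produces equivalent code is something this sketch cannot show.

```c
#include <stddef.h>

/* Hypothetical tiny helper, standing in for the function that was
 * removed from the kernel's inner loop. */
static inline float scale_add(float x, float a, float b)
{
    return a * x + b;
}

/* Variant 1: the inner loop calls the helper. */
float sum_with_helper(const float *data, size_t n, float a, float b)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += scale_add(data[i], a, b);
    return acc;
}

/* Variant 2: the same arithmetic performed directly in the loop. */
float sum_by_hand(const float *data, size_t n, float a, float b)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a * data[i] + b;
    return acc;
}
```

If the helper is inlined, the two loops would be expected to compile to very similar code, so a large timing gap between them is worth re-measuring over several runs before drawing conclusions.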

Any suggestions will be appreciated.

Vis Cocoa

8 Replies
Adept II

Could you provide us with a short example? Where do you exactly call the function? What kind of function is it?


Hi KNeumann,

The strange thing happened on Thursday night. I replaced a tiny function with direct operations, and the running time dropped dramatically.

I spent time on Friday trying to reproduce the result, but had no luck. Nor could I use the same technique to speed up other parts of my program.

So, please forget it. OpenCL is working as it is supposed to.

Thank you for replying, and have a good weekend!

Vis Cocoa

Adept I

Yes, this could be quite interesting, because I was also sure that all functions were inlined. If they are not, it could explain some lack of efficiency.


Hi Rom1,

I am sorry to say that the experiment is not repeatable, as I explained above. Please forget it.

Thank you for your kind reply and have a good weekend!

Vis Cocoa


I noticed that some of my functions were slower than the equivalent code written out directly when I forgot to mark the input-only parameters as "const". Since I added that, the speed is the same.

Thank you Bdot! It is very helpful to know that const parameters speed up a function "call". How about pointers then, like int*?


Yes, my changes included going from

uint4 *res

to

uint4 * const res

for returning results. For other pointers I added "restrict", like

uint * restrict base

But I did not test each change for performance, so I cannot tell if that made a difference.

I think, if you know how the parameters are used (and you should 😉 ), then giving these hints to the compiler will never hurt. At a minimum it will make life easier for the optimizer, and at best it allows for optimizations that would not be done otherwise.
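As an illustration of the qualifiers mentioned above, here is a small sketch in plain C99. The struct `uint4_like` is a hypothetical stand-in for OpenCL's built-in `uint4`, and `load4` is an invented helper, not code from the thread. In an actual kernel the pointers would also carry address-space qualifiers such as `__global`.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's built-in uint4 vector type. */
typedef struct { unsigned x, y, z, w; } uint4_like;

/* `const unsigned *`  : the function does not write through base (input-only).
 * `restrict` on base  : the data read through base is not modified through
 *                       any other pointer during the call.
 * `* const res`       : the pointer res itself is never reassigned; the
 *                       object it points to is still writable. */
void load4(const unsigned * restrict base, uint4_like * const res)
{
    res->x = base[0];
    res->y = base[1];
    res->z = base[2];
    res->w = base[3];
}
```

Note that `uint4 * const res` makes the pointer constant, not the data it points to, so it documents intent more than it restricts what the function can do. Whether any of these qualifiers changes the generated code depends on the compiler; as noted in the post, the individual changes were not timed.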

Thank you very much, Bdot. You are right. Giving the compiler as many hints as possible will always be beneficial.

Thank you again!