
savage309
Adept I

Marking OpenCL function as __noinline__

Hey there,

Is there a way to make a real function call on GCN 1.2 with OpenCL?

What I have is an OpenCL app with heavy register spilling (let's just assume that splitting it into multiple kernels is not an option). Is there an option to mark a function as noinline, so that when the function is called, all the registers used by the current function are stored to global memory and the new function has all the registers to itself? That way, if the function is not called, there is no drawback from having it (it does not affect register spilling in the function that calls it). We are doing similar things on other platforms, where this seems to help us improve performance.

If there is a way, what is the OpenCL keyword for it?

Thanks.

P.S. If there is a way to do a function call, what about recursion?

maxdz8
Elite

There's a fairly weak connection between inlining and overspilling.

Anyway, I agree this should exist, mostly as a way to control ISA size (after all, there are already options to unroll or not unroll loops).

Issue: should this be considered per-function or per-call?

I think it should be per function.

Still, any ideas on how this could be done (if it is possible at all) would be greatly appreciated.

I am bumping this a bit.

I really want to do a real function call.

Is there any way to do it? Modifying intermediate representations, or other hacks?

It doesn't matter how complicated it is. Is there any way to do it?

AFAIK, on AMD platforms all function calls in OpenCL are inlined by default, and currently there is no way to change that.
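
To make it concrete what is being asked for, here is a minimal sketch in plain host C of how GCC and Clang spell a noinline request. The names `shade_sample` and `process` are made up for illustration. Whether an OpenCL C compiler accepts the same spelling, and whether it honors it, is up to the implementation, so treat the OpenCL side as an assumption; as said above, AMD's compiler inlines regardless.

```c
#include <stddef.h>

/* Hypothetical helper. The attribute asks GCC/Clang to keep this as a
   separate function and emit a call instead of inlining the body. */
__attribute__((noinline))
static float shade_sample(const float *data, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += data[i] * data[i];   /* sum of squares */
    return acc;
}

/* Caller: the helper is only reached when use_helper is non-zero. */
float process(const float *data, size_t n, int use_helper)
{
    return use_helper ? shade_sample(data, n) : 0.0f;
}
```

On a host compiler you can check the effect by looking for a call to `shade_sample` in the generated assembly.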

Regards,

It's kinda impossible to isolate an inlined function in the binary. It's already mixed and interleaved with other code, so you can't just cut it out.

All you can do is to choose a 'platform' that supports function calls.

- I don't know much about HSA, but I think I've read that it supports function pointers, so there should be callable functions too.

- The other way, where function calling works 100%, is GCN ASM. You can query and set the program counter, so you can call any address in GPU memory. You can even write self-modifying programs, just like on an unprotected x86. The downside is that you must rewrite the whole thing from scratch, and you need to reverse-engineer how the current driver passes the parameters to the kernel.
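
On the recursion question from the original post: OpenCL C does not support recursion, so the usual workaround is to turn the recursion into a loop with an explicit stack. Below is a rough sketch in plain C, not anything AMD-specific. The `Node` layout, `MAX_DEPTH` and `sum_tree` are assumptions made up for the example.

```c
#include <stddef.h>

#define MAX_DEPTH 32  /* assumed upper bound on the stack size needed */

/* Hypothetical node layout: children are indices into the same array,
   and -1 means "no child". */
typedef struct {
    int left;
    int right;
    int value;
} Node;

/* Sums the values of all nodes reachable from root, without recursion.
   The array `stack` plays the role of the call stack. */
int sum_tree(const Node *nodes, int root)
{
    int stack[MAX_DEPTH];
    int top = 0;
    int sum = 0;

    if (root < 0)
        return 0;

    stack[top++] = root;
    while (top > 0) {
        const Node *n = &nodes[stack[--top]];
        sum += n->value;
        /* Children that would overflow the fixed-size stack are skipped,
           so MAX_DEPTH must be large enough for the input tree. */
        if (n->right >= 0 && top < MAX_DEPTH)
            stack[top++] = n->right;
        if (n->left >= 0 && top < MAX_DEPTH)
            stack[top++] = n->left;
    }
    return sum;
}
```

The same loop carries over to a kernel more or less unchanged, with the stack living in private memory. Keep in mind that a large per-work-item stack can itself add to register pressure or spilling.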