I figured this would be a really basic question with an answer in the IL documentation, but I seem to be wrong.
Are function calls so costly that no one uses them, even when you are sure every thread will take the branch, execute the same code in the function, and return at the same time? Or does everyone just experiment to figure it out? Either way, I'm calling it out here (ha, calling... it out).
Specifically, does the optimizer expect any registers to be preserved across a call, or are all registers considered volatile?
Resource declarations: since there is never a MAIN call, only an ENDMAIN call, you cannot put your declarations outside of main. Do the declarations apply to every function in the kernel, or do I need to declare resources in the function where they will be used?
Breaking a kernel into multiple kernels: I read a post about splitting up a kernel. Can one kernel call a function defined in another kernel, or that kernel's main function, or neither?
Could you also give me a preview and let me know whether any of this will change as the APU becomes more fully integrated?
I come from an x86/64 assembly background, especially when it comes to vectorizing and optimizing, so this GPU stuff is right up my alley! (I'm also more than proficient in C/C++, just not in spelling.) I've also worked a little with CUDA/PTX, so no one needs to call me out and tell me how different things are. However, whenever I see call, func, etc., I start wondering what the caveats are. Worst case, I'll build a system to inline my functions myself, and I'll have to do a little of that anyway, but I'd like to keep some parts of what I'm doing modular and reusable with minimal effort.