Is there any way to retrieve the information usually provided by the KernelAnalyzer without actually using the KernelAnalyzer? I am specifically interested in register and memory usage information.
I have two ways in mind, but haven't had success using either:
- Using clGetKernelWorkGroupInfo.
This OpenCL API function should provide private and local memory usage (CL_KERNEL_PRIVATE_MEM_SIZE and CL_KERNEL_LOCAL_MEM_SIZE). However, it seems to return 0 even if the compiler warns "kernel has register spilling. Lower performance is expected".
Additionally, this method sadly does not allow querying register usage.
- Grepping the compiler-generated files.
On NVIDIA it is possible to grep the resource usage from the temporary files generated by the compiler. I assume something similar is done by the AMD compiler. The information needs to be available somewhere, as the KernelAnalyzer seems to use the regular OpenCL compiler to gather its information, too. However, I couldn't find any documentation on where to look for this information.
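To make the second idea concrete, here is a rough sketch of the kind of grepping I mean, written as two small C helpers. The line formats ("Used <N> registers" and "<N> bytes spill stores") are assumptions modeled on the ptxas-style resource lines seen on NVIDIA; the helper names parse_register_count and parse_spill_store_bytes are made up for illustration, and I do not know whether the AMD compiler emits anything comparable.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts N from a line containing
 * "Used <N> registers" (assumed ptxas-style format).
 * Returns 1 on success, 0 if the pattern is not present. */
int parse_register_count(const char *line, int *registers)
{
    const char *p = strstr(line, "Used ");
    int n = 0;
    char word[16];

    if (p == NULL)
        return 0;
    if (sscanf(p, "Used %d %15s", &n, word) != 2)
        return 0;
    /* Compare only the first 9 characters so "registers," also matches. */
    if (strncmp(word, "registers", 9) != 0)
        return 0;
    *registers = n;
    return 1;
}

/* Hypothetical helper: extracts N from a line containing
 * "<N> bytes spill stores" (assumed format).
 * Returns 1 on success, 0 if the pattern is not present. */
int parse_spill_store_bytes(const char *line, int *bytes)
{
    const char *label = strstr(line, " bytes spill stores");
    const char *start = label;

    if (label == NULL)
        return 0;
    /* Walk backwards over the digits directly preceding the label. */
    while (start > line && isdigit((unsigned char)start[-1]))
        start--;
    if (start == label)
        return 0;
    return sscanf(start, "%d", bytes) == 1;
}
```

Feeding each line of the compiler output through these two functions would be enough for my purposes, provided the AMD toolchain writes such lines somewhere I can read them.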
If any of you knows how to retrieve this information, which is essential for optimization, I would be thankful for any pointers in the right direction.
Copying code into the KernelAnalyzer is sadly rather cumbersome if your code is spread over multiple files and affected by a multitude of defines.