
Marix
Adept II

Getting KernelAnalyzer information without KernelAnalyzer

A solution for linux users and complex applications?

Hi,

Is there any way to retrieve the information usually provided by the KernelAnalyzer without actually using the KernelAnalyzer? I am specifically interested in register and memory usage information.

I have two ways in mind, but haven't had success using either:

  • Using clGetKernelWorkGroupInfo.
    This OpenCL API function should actually provide local and shared memory usage. However, it seems to return 0 even if the compiler warns "kernel has register spilling. Lower performance is expected".
    Additionally, this method sadly does not allow querying register usage.
  • Grepping the compiler generated files.
    On NVIDIA it is possible to grep the resource usage from the temporary files generated by the compiler. I assume something similar is done by the AMD compiler. The information needs to be available somewhere, as the KernelAnalyzer seems to use the regular OpenCL compiler to gather its information, too. However, I couldn't find any documentation on where to look for this information.

If any of you know how to retrieve this information, which is essential for optimization, I would be thankful for any pointers in the right direction.

Copying code into the KernelAnalyzer is sadly rather cumbersome if your code is spread over multiple files and affected by a multitude of defines.

6 Replies
nou
Exemplar

clGetKernelWorkGroupInfo() returns only the amount of statically allocated local memory, for example an array declared in the kernel as __local float a[14].

When you set the size of a local buffer with clSetKernelArg(), that size is added to this number.

Also, spilled registers go to private memory, so they are not counted as local memory.

You can set the environment variable GPU_DUMP_DEVICE_KERNEL=3. It will create an ISA representation of the kernel, and you can find some useful information there.
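To make the accounting above concrete, here is a small arithmetic sketch of what the local memory query would be expected to report. The helper name and the 256-byte argument size are made up for illustration, and a real runtime may round the result up to its allocation granularity.

```python
def reported_local_mem_bytes(static_bytes, dynamic_arg_bytes):
    """Expected local memory figure: static __local declarations plus
    every __local pointer argument sized through clSetKernelArg().
    Spilled registers are private memory and are deliberately not included."""
    return static_bytes + sum(dynamic_arg_bytes)


# __local float a[14] declared in the kernel source: 14 floats of 4 bytes each
static_bytes = 14 * 4

# One __local pointer argument given 256 bytes (hypothetical size)
total = reported_local_mem_bytes(static_bytes, [256])
print(total)
```

Before any clSetKernelArg() call only the static part (56 bytes here) would show up.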


Marix,

Nou is correct. You will not get GPR usage for a kernel from clGetKernelWorkGroupInfo. I prefer to use the AMD Profiler to get all the information that I need for optimization.


Can the profiler print register usage?

BTW: Where I wrote local above I actually meant private, which is also supposed to be available in OpenCL 1.1.

yangyi0239
Journeyman III

You can set the environment variable GPU_DUMP_DEVICE_KERNEL=3 to let the compiler generate ISA code. The register usage is in the ISA file.
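For an application with many source files and defines, one way to apply this is to launch the unmodified program with the variable set. This is only a sketch: the function names are made up, and where the dump files end up and how they are named depends on the runtime version, so look in the working directory after the run.

```python
import os
import subprocess


def env_with_isa_dump(base_env=None, level="3"):
    """Return a copy of the environment with GPU_DUMP_DEVICE_KERNEL set.

    The variable name and the value 3 come from this thread; everything
    else in the environment is passed through unchanged."""
    env = dict(os.environ if base_env is None else base_env)
    env["GPU_DUMP_DEVICE_KERNEL"] = level
    return env


def run_with_isa_dump(command, workdir="."):
    """Run the OpenCL application so the AMD compiler dumps ISA files.

    `command` is the argument list of your own program (hypothetical)."""
    result = subprocess.run(command, cwd=workdir, env=env_with_isa_dump(), check=False)
    return result.returncode
```

Usage would look like run_with_isa_dump(["./my_opencl_app"]), where ./my_opencl_app is a placeholder for your own binary. Exporting the variable in the shell before starting the program has the same effect.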


Hi,

Thanks for the replies. I think with the given pointers I finally managed to work this out. It turns out some of it is actually given in the programming guide, just not in summarized form. Knowing the relevant keywords, one can however find this thread, which shows the interesting bits.

So what I can get from the ISA file is the following:

  • SQ_PGM_RESOURCES:NUM_GPRS – The number of registers actually used
  • MaxScratchRegsNeeded – The number of registers spilled to / emulated by private memory
  • SQ_LDS_ALLOC:SIZE – The statically allocated amount of __local memory. This one is actually kind of confusing, as the programming guide doesn't mention its unit. The number seems to be in units of 4 bytes (i.e. floats), and the allocation granularity seems to be 4 floats, as with registers (tested on Cypress).
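Pulling these three values out of a dump can be scripted. The sketch below assumes each field appears in the ISA text as "NAME = value" with a decimal or 0x-prefixed hex number; that layout and the sample excerpt are assumptions for illustration, not copied from a real dump. The byte conversion relies on the 4-byte unit observed above on Cypress.

```python
import re

# Field names as listed above
FIELDS = (
    "SQ_PGM_RESOURCES:NUM_GPRS",
    "MaxScratchRegsNeeded",
    "SQ_LDS_ALLOC:SIZE",
)


def parse_isa_metrics(text):
    """Return {field: int} for every known field found in the ISA dump text."""
    metrics = {}
    for name in FIELDS:
        match = re.search(re.escape(name) + r"\s*=\s*(0[xX][0-9a-fA-F]+|\d+)", text)
        if match:
            value = match.group(1)
            # Accept both hex (0x...) and plain decimal values
            is_hex = value.lower().startswith("0x")
            metrics[name] = int(value, 16) if is_hex else int(value)
    return metrics


def lds_bytes(metrics):
    """Convert SQ_LDS_ALLOC:SIZE to bytes, assuming units of 4 bytes."""
    return metrics.get("SQ_LDS_ALLOC:SIZE", 0) * 4


# Made-up excerpt in the assumed layout
sample = """
SQ_PGM_RESOURCES:NUM_GPRS     = 12
SQ_PGM_RESOURCES:STACK_SIZE   = 0
;MaxScratchRegsNeeded         = 2
SQ_LDS_ALLOC:SIZE             = 0x00000010
"""

metrics = parse_isa_metrics(sample)
print(metrics, lds_bytes(metrics))
```

Fields missing from the dump are simply left out of the result, so a kernel without spilling or __local usage does not raise an error.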

Do you know of any other interesting metrics I can get from that file? I stumbled over SQ_PGM_RESOURCES:STACK_SIZE, but it is highly unclear to me what it is supposed to be.

 


You can also use the command-line mode of the APP Profiler on Linux to get this information.
