clGetKernelWorkGroupInfo() returns only the amount of statically allocated local memory, for example a __local array a declared inside the kernel.
When you set the size of a __local argument with clSetKernelArg(), that size is added to this number.
Also, spilled registers go to private memory, so they are not counted as local memory.
You can set the environment variable GPU_DUMP_DEVICE_KERNEL=3. It will dump the ISA representation of the kernel, where you can find some useful information.
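To make the accounting concrete, here is a minimal Python sketch of the arithmetic only. It makes no OpenCL calls; the function name and the example sizes are made up for illustration, and a real implementation may round the reported value up to its allocation granularity.

```python
def total_local_mem_bytes(static_bytes, dynamic_arg_bytes):
    """Local memory per work-group once all __local arguments are sized.

    static_bytes      -- statically declared __local storage in the kernel
    dynamic_arg_bytes -- sizes passed to clSetKernelArg() for __local
                         pointer arguments (hypothetical example values)
    """
    return static_bytes + sum(dynamic_arg_bytes)


# Hypothetical kernel: "__local float a[64];" is 64 * 4 = 256 bytes static,
# plus one __local pointer argument sized to 1024 bytes by the host.
static_bytes = 64 * 4
total = total_local_mem_bytes(static_bytes, [1024])
print(total)
```

Before any __local argument has been sized, only the static part would show up, which matches what was described above.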
Nou is correct. You will not get the GPR usage of a kernel from clGetKernelWorkGroupInfo. I prefer to use the AMD Profiler to get all the information I need for optimization.
Can the profiler print register usage?
BTW: When I read local above I actually meant private, which is also supposed to be available in OpenCL 1.1.
You can set the environment variable GPU_DUMP_DEVICE_KERNEL=3 to make the compiler generate ISA code. The register usage is in the ISA file.
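If you would rather not export the variable in your shell, a small launcher can set it for a single run. This is only a sketch in Python, assuming your host program is a separate executable; the helper names and ./my_opencl_app are placeholders, and where the dump files end up depends on the runtime.

```python
import os
import subprocess


def isa_dump_env(base_env=None, level="3"):
    """Return a copy of the environment with GPU_DUMP_DEVICE_KERNEL set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GPU_DUMP_DEVICE_KERNEL"] = level
    return env


def run_with_isa_dump(command):
    """Run the host program (a list of arguments) with the dump enabled."""
    return subprocess.run(command, env=isa_dump_env())


# Usage (placeholder executable name, not run here):
# run_with_isa_dump(["./my_opencl_app"])
```

The variable is set only for the child process, so the rest of your session is unaffected.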
Thanks for the replies. I think with the given pointers I finally managed to work this out. It turns out some of it is actually given in the programming guide, just not in a summarized form. Knowing the relevant keywords, one can however find this thread, which shows the interesting bits.
So what I can get from the ISA file is the following:
- SQ_PGM_RESOURCES:NUM_GPRS – Number of actual registers used
- MaxScratchRegsNeeded – The number of registers spilled to / emulated by private memory
- SQ_LDS_ALLOC:SIZE – The statically allocated amount of __local memory. This one is somewhat confusing, as the programming guide doesn't mention its unit. The number seems to be in units of 4 bytes (one float), and the allocation granularity seems to be 4 floats, as with registers (tested on Cypress).
Do you know of any other interesting metrics I can find from that file? I stumbled over the SQ_PGM_RESOURCES:STACK_SIZE, but it is highly unclear to me what it's supposed to be.
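For anyone who wants to pull the three fields listed above out of the dump automatically, here is a rough Python sketch. It assumes each field appears in the ISA text as "NAME = value" with a decimal or 0x-prefixed value; the sample text is made up to show that assumed layout, and the 4-byte unit for SQ_LDS_ALLOC:SIZE is only my observation on Cypress, not something documented.

```python
import re

FIELDS = (
    "SQ_PGM_RESOURCES:NUM_GPRS",
    "MaxScratchRegsNeeded",
    "SQ_LDS_ALLOC:SIZE",
)


def parse_isa_metrics(isa_text):
    """Return {field name: int} for every known field found in the text."""
    metrics = {}
    for name in FIELDS:
        pattern = re.escape(name) + r"\s*=\s*(0[xX][0-9a-fA-F]+|\d+)"
        match = re.search(pattern, isa_text)
        if match:
            value = match.group(1)
            base = 16 if value.lower().startswith("0x") else 10
            metrics[name] = int(value, base)
    return metrics


def lds_size_to_bytes(size_units):
    """Convert SQ_LDS_ALLOC:SIZE to bytes, assuming 4-byte units."""
    return size_units * 4


# Made-up sample in the assumed "NAME = value" layout.
SAMPLE = """\
SQ_PGM_RESOURCES:NUM_GPRS = 12
; MaxScratchRegsNeeded = 0
SQ_LDS_ALLOC:SIZE = 0x00000040
"""

metrics = parse_isa_metrics(SAMPLE)
lds_bytes = lds_size_to_bytes(metrics["SQ_LDS_ALLOC:SIZE"])
```

Fields that are missing from the text are simply left out of the result, so the same function can be extended with further names such as SQ_PGM_RESOURCES:STACK_SIZE once their meaning is clear.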