OpenCL runtimes for Windows x64, at least since the 15.7.1 drivers, return garbage when clGetKernelWorkGroupInfo is queried with CL_KERNEL_PRIVATE_MEM_SIZE. If there are spilled registers, it returns their size in global memory, which is presumably the intended use of this query (at least Intel OpenCL works this way). But if there are no spills, it can return 160, 144, 320 or some other number. As a result, the query is useless for kernel optimization: you never know whether the number is real or fake.
I saved some OpenCL binaries built with the Adrenalin 20.5.1 drivers to demonstrate this behaviour; the correct reports are in bold:
| Binary | Disassembler output | clGetKernelWorkGroupInfo with CL_KERNEL_PRIVATE_MEM_SIZE (bytes) |
|---|---|---|
| 3p_Angles_Int_2d_DCdir_0-Tahiti.bin | 22312 ISA, 0 scratch, 167/256 VGPR, 71/102 SGPR | 160 (expected 0) |
| 3p_Angles_Int_2d_DCinv_0-Tahiti.bin | 19464 ISA, 0 scratch, 171/256 VGPR, 71/102 SGPR | 144 (expected 0) |
| 3p_Angles_Int_2d_GGPdir_0-Tahiti.bin | 16008 ISA, 80 scratch, 113/256 VGPR, 67/102 SGPR | **320** |
| 3p_Angles_Int_2d_EEBRdir_0-Tahiti.bin | 15248 ISA, 64 scratch, 223/256 VGPR, 63/102 SGPR | **256** |
| 3p_Angles_Int_2d_EEBRinv_0-Tahiti.bin | 26696 ISA, 48 scratch, 222/256 VGPR, 63/102 SGPR | **192** |
| 3p_Angles_Int_2d_GEPinv_0-Tahiti.bin | 16208 ISA, 76 scratch, 224/256 VGPR, 63/102 SGPR | **304** |
The OpenCL source files and all other material are embedded in the binaries.
Thank you for providing the above information.
As per the spec, clGetKernelWorkGroupInfo with CL_KERNEL_PRIVATE_MEM_SIZE returns "the minimum amount of private memory, in bytes, used by each work-item in the kernel. This value may include any private memory needed by an implementation to execute the kernel, including that used by the language built-ins and variable declared inside the kernel with the __private qualifier."
I guess the reported values also include other private memory usage, not just the "spilled registers". Anyway, I'll check with the OpenCL team to find out whether this is the expected behavior and let you know.
Thanks for the reply. If we take this text literally, then all these values are wrong, as they are definitely lower than the size of the VGPRs (and private variables) used by every work-item of these kernels. I look forward to your answer about the intended behaviour.
I think the examples in bold are right, based on this answer by Ben_A_Intel: "The value we return for CL_KERNEL_PRIVATE_MEM_SIZE is additional private memory that we need per work item, above and beyond what we can store in the register file." At first I was happy to find the same behaviour in the AMD runtime, but then I found that when there are no spilled registers, it returns unexpected values.
Thank you for sharing the above information.
From the OpenCL team's feedback, it seems that your understanding of and expectation for this query are correct. In this case, they suspect that it may be broken on SI (GCN1) cards such as Tahiti: SI uses the AMDIL path, we no longer do any development for SI, and support for AMDIL was dropped long ago. Could you please check whether it works on a CI-family (GCN2) or newer card?