Is there a way to limit the compiler's VGPR usage (like -cl-nv-maxrregcount on the NVIDIA platform)? This would be a particularly useful feature for increasing occupancy. Under the right circumstances, reducing the number of VGPRs used by just a few hardly affects single-thread performance, while the increased occupancy may improve overall performance.
Hi romein,
AFAIK, no such feature exists in AMD's OpenCL implementation, but thanks for pointing out this interesting feature. I will raise it as a feature request.
I am facing the same problem.
Are there any updates on this issue? I find that AMD cards often use too many VGPRs, and this sometimes has a terrible impact on performance.
I would like to know the current status, or at least to get feedback from a tool such as CodeXL on where those VGPR counts come from. Simply counting the variables in the code doesn't work.
That only gives the total VGPR count for the whole kernel (which I already knew), not where the VGPRs come from or on which lines they are allocated.
For example, I have a kernel that uses 50 VGPRs. Simply counting variables gives me around 20. So where do the 30 extra VGPRs come from, and how do I limit the kernel to 20? I need to know this so that I can learn how to optimise the code.
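To see what is at stake in this example, assume GCN-style limits (not stated in the thread): 256 VGPRs per SIMD lane, allocation in blocks of 4, and at most 10 wavefronts per SIMD. The number of resident wavefronts per SIMD for a kernel using v VGPRs is then roughly as follows, ignoring SGPR and LDS limits:

```latex
\[
N_{\text{waves}}(v) = \min\!\left(10,\ \left\lfloor \frac{256}{4\lceil v/4 \rceil} \right\rfloor\right)
\]
\[
N_{\text{waves}}(50) = \min\!\left(10,\ \left\lfloor \tfrac{256}{52} \right\rfloor\right) = 4,
\qquad
N_{\text{waves}}(20) = \min\!\left(10,\ \left\lfloor \tfrac{256}{20} \right\rfloor\right) = \min(10, 12) = 10 .
\]
```

Under these assumptions, getting the kernel down to 20 VGPRs would raise occupancy from 4 to the maximum of 10 wavefronts per SIMD.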
+1
I know there is a way to specify a suggested work-group size when compiling a kernel, but it does not allow values larger than 256 threads per work-group. This means that CU occupancy will be limited to 512 threads out of the 2,560 (512 × 5) possible.
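The 512 × 5 figure matches a GCN compute unit's wavefront capacity, assuming 4 SIMDs per CU, at most 10 wavefronts per SIMD and 64 work-items per wavefront. Under the same assumed limits, 512 resident work-items corresponds to 2 wavefronts per SIMD, which is what a kernel using between 85 and 128 VGPRs gets:

```latex
\[
4 \times 10 \times 64 = 2560 = 512 \times 5,
\qquad
4 \times 2 \times 64 = 512,
\qquad
\left\lfloor \tfrac{256}{128} \right\rfloor = 2 .
\]
```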
When do you plan to extend the AMD compiler to allow more than 512 threads of occupancy? (I know about, and am willing to accept, the trade-off of spilling or recomputing values in exchange for occupancy.) Could you please let the suggested work-group size parameter accept values larger than 256? The kernel would still be executed with at most 4 wavefronts per 256-thread work-group, but there would be more resident work-groups per CU. Or could you please implement direct limits on register counts?