romein
Journeyman III

Can I force the compiler to reduce VGPR usage?

Is there a way to limit the compiler's VGPR usage (like -cl-nv-maxrregcount on the NVIDIA platform)?  This would be a particularly useful feature to increase occupancy.  Under the right circumstances, reducing the number of used VGPRs by just a few hardly impacts single-threaded performance, while the increased occupancy may improve overall performance.

1 Solution
himanshu_gautam
Grandmaster

Hi romein,

AFAIK, no such feature exists in AMD's OpenCL implementation. Thanks for pointing out this interesting feature, though. I will raise it as a feature request.

7 Replies
bdoshi
Journeyman III

I am facing the same problem.


Are there any updates on this issue? I find that AMD cards often use too many VGPRs, and this sometimes has a terrible impact on performance.

I would like to know the current status, or at least to get feedback from a tool such as CodeXL on where those VGPR counts come from. Simply counting variables in the code doesn't work.

Hi,

Please see the thread "Is there a way to track VGPR usage?" for more information.

Regards,

That thread covers getting the total VGPR count for the whole kernel (which I already knew), not where the registers come from or on which lines they are created.

For example, I have a kernel that uses 50 VGPRs. Simply counting variables gives me around 20. So where do the 30 extra VGPRs come from, and how do I limit usage to 20? I need to know this so that I can learn how to optimise the code.
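
To put numbers on why the gap between 50 and 20 VGPRs matters, here is a rough sketch of the usual occupancy arithmetic. The limits it uses (256 VGPRs per SIMD lane, at most 10 wavefronts per SIMD, allocation in steps of 4 registers) are assumptions based on GCN-class hardware and are not stated in this thread; the function name is made up for illustration. Check the numbers for your own device.

```python
# Assumed GCN-style limits (not taken from this thread; verify for your GPU).
VGPRS_PER_SIMD_LANE = 256   # VGPR budget shared by all wavefronts on one SIMD
MAX_WAVES_PER_SIMD = 10     # hardware cap on resident wavefronts per SIMD
VGPR_GRANULE = 4            # registers are assumed to be allocated in steps of 4


def waves_per_simd(vgprs_per_work_item):
    """Estimate resident wavefronts per SIMD when VGPRs are the only limit."""
    # Round the request up to the allocation granule (at least one granule).
    granules = max(1, -(-vgprs_per_work_item // VGPR_GRANULE))
    allocated = granules * VGPR_GRANULE
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // allocated)


if __name__ == "__main__":
    for vgprs in (20, 50):
        print(vgprs, "VGPRs ->", waves_per_simd(vgprs), "wavefronts per SIMD")
```

Under these assumptions, the 50-VGPR kernel fits 4 wavefronts per SIMD, while a 20-VGPR version would reach the cap of 10. LDS and SGPR usage can lower the real figure further.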

mrrvlad
Adept I

+1

I know there is a way to specify a suggested workgroup size when compiling a kernel, but it does not accept values larger than 256 threads per workgroup. This means that CU occupancy will be limited to 512 threads out of the 512*5 possible.

When do you plan to extend the AMD compiler to allow occupancy above 512 threads? (I understand the tradeoff and do want to trade spilling or value recomputation for occupancy.) Could you please allow values larger than 256 to be respected by the suggested workgroup size parameter (the kernel would still be executed with a maximum of 4 wavefronts for 256 threads, but there would be more resident blocks per CU)? Or could you please implement direct limits on register counts?
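
As a rough illustration of what a direct register limit would have to be, the sketch below inverts the occupancy arithmetic: given a target number of resident threads per CU, it estimates the largest per-work-item VGPR count that still fits. It assumes GCN-class figures that are not stated in this thread (4 SIMDs per CU, 64 work-items per wavefront, 256 VGPRs per SIMD lane, at most 10 wavefronts per SIMD, allocation in steps of 4), and the function names are hypothetical.

```python
# Assumed GCN-style figures (not from this thread; verify for your GPU).
SIMDS_PER_CU = 4
WAVEFRONT_SIZE = 64
VGPRS_PER_SIMD_LANE = 256
MAX_WAVES_PER_SIMD = 10
VGPR_GRANULE = 4


def resident_threads_per_cu(waves_per_simd):
    """Threads resident on one CU for a given wavefront count per SIMD."""
    return waves_per_simd * SIMDS_PER_CU * WAVEFRONT_SIZE


def max_vgprs_for_waves(waves_per_simd):
    """Largest VGPR count per work-item that still allows this many wavefronts."""
    if not 1 <= waves_per_simd <= MAX_WAVES_PER_SIMD:
        raise ValueError("waves_per_simd must be between 1 and 10")
    budget = VGPRS_PER_SIMD_LANE // waves_per_simd
    # Round down to the allocation granule so the budget is actually usable.
    return budget // VGPR_GRANULE * VGPR_GRANULE


if __name__ == "__main__":
    for waves in (2, 4, 10):
        print(resident_threads_per_cu(waves), "threads per CU needs at most",
              max_vgprs_for_waves(waves), "VGPRs per work-item")
```

With these assumed numbers, 2 wavefronts per SIMD corresponds to the 512 resident threads mentioned above, and the full 512*5 = 2560 threads would require the kernel to stay within 24 VGPRs.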
