

boxerab

work_group_scan_inclusive_add and register count

I thought I would try out some of the new OpenCL 2.0 work-group functions.

Comparing the performance of work_group_scan_inclusive_add against my home-grown prefix scan, I found that work_group_scan_inclusive_add led to less work-item divergence but used 10 more VGPRs.

My own scan, which uses local memory, led to more divergence but no increase in VGPR usage.
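Both approaches compute the same result: work_group_scan_inclusive_add gives each work-item the sum of its own value and the values of all lower-indexed work-items in the group. The sketch below is a host-side C emulation, not kernel code, and it assumes a common Hillis-Steele style local-memory scan (the post does not show the actual home-grown implementation). The work-group size WG_SIZE = 8, the int data type, and the helper names reference_inclusive_add, emulate_local_scan and check_scan are all hypothetical. It illustrates the algorithm only and says nothing about VGPR counts, which are decided by the compiler.

```c
#include <assert.h>

/* Hypothetical work-group size for this host-side emulation. */
#define WG_SIZE 8

/* Semantics of work_group_scan_inclusive_add, computed sequentially:
   work-item i receives x[0] + x[1] + ... + x[i]. */
static void reference_inclusive_add(const int x[WG_SIZE], int out[WG_SIZE]) {
    int sum = 0;
    for (int i = 0; i < WG_SIZE; ++i) {
        sum += x[i];
        out[i] = sum;
    }
}

/* Emulation of a Hillis-Steele inclusive scan over a __local array.
   Each pass of the outer loop is one step of the kernel; the inner
   loops play the role of every work-item (indexed by lid). */
static void emulate_local_scan(int scratch[WG_SIZE]) {
    int t[WG_SIZE]; /* one private value per work-item */
    for (int offset = 1; offset < WG_SIZE; offset <<= 1) {
        /* Read phase: work-items with lid < offset have no partner
           this step, so they take the other side of the condition. */
        for (int lid = 0; lid < WG_SIZE; ++lid)
            t[lid] = (lid >= offset) ? scratch[lid - offset] : 0;
        /* barrier(CLK_LOCAL_MEM_FENCE) would separate the phases here */
        for (int lid = 0; lid < WG_SIZE; ++lid)
            scratch[lid] += t[lid];
        /* ...and another barrier before the next step */
    }
}

/* Runs both versions on the same input and asserts they agree. */
static void check_scan(const int x[WG_SIZE]) {
    int expected[WG_SIZE];
    int scratch[WG_SIZE];
    reference_inclusive_add(x, expected);
    for (int i = 0; i < WG_SIZE; ++i)
        scratch[i] = x[i];
    emulate_local_scan(scratch);
    for (int i = 0; i < WG_SIZE; ++i)
        assert(scratch[i] == expected[i]);
}
```

For the input {3, 1, 7, 0, 4, 1, 6, 3}, both functions produce {3, 4, 11, 11, 15, 16, 22, 25}. In a real kernel, scratch would be a __local array indexed by get_local_id(0), and the two inner loops would be a single body run by all work-items, with barrier(CLK_LOCAL_MEM_FENCE) between the read and write phases.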

Overall, work_group_scan_inclusive_add was faster. But is there a way for this function to use existing registers and not increase register pressure?

Thanks,

Aaron

dipak

Hi Aaron,

If you are asking about performance/optimization hints to the compiler that control register usage, there is no such flag at the moment.

Regards,


Thanks. My question is more about whether this built-in function can use existing registers. It seems to allocate its own set of registers.


I don't think it's possible. Still, I'll check with the compiler team.

Regards,


Thanks for checking.


At this point, there is no way to control either the number of registers this built-in function uses or which registers it uses. Optimizing register usage is a never-ending task for the compiler team, and I hope it will improve over time.

Regards,
