Archives Discussions

boxerab · ‎04-20-2017

I thought I would try out some of the new OpenCL 2.0 workgroup functions.

Comparing the performance of work_group_scan_inclusive_add against my home-grown prefix scan, I found that work_group_scan_inclusive_add led to less work-item divergence but used 10 more VGPRs. My own scan, which uses local memory, led to more divergence but no increase in VGPR usage.

Overall, work_group_scan_inclusive_add was faster. But is there a way for this built-in to reuse existing registers so that it does not increase register pressure?
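For readers unfamiliar with the two approaches, here is a minimal host-side sketch in plain C of what is being compared. It is an assumption that the home-grown scan resembles a Hillis-Steele scan over local memory; the post does not show the actual kernel. The names `scan_reference`, `scan_local_emulated`, and `WG_SIZE` are hypothetical. On a device, each iteration of the inner loop would be one work-item, the two arrays would live in `__local` memory, and the buffer swap would be a `barrier(CLK_LOCAL_MEM_FENCE)`.

```c
#include <stddef.h>
#include <string.h>

#define WG_SIZE 8 /* hypothetical work-group size, chosen for illustration */

/* Reference result: a sequential inclusive add scan. This is the value
   work_group_scan_inclusive_add hands back to each work-item. */
void scan_reference(const int *in, int *out, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += in[i];
        out[i] = sum;
    }
}

/* Host emulation of a Hillis-Steele inclusive scan over "local memory".
   Assumed shape of a home-grown scan, not the poster's actual code. */
void scan_local_emulated(const int *in, int *out)
{
    int buf_a[WG_SIZE], buf_b[WG_SIZE]; /* stand-ins for two __local arrays */
    int *src = buf_a, *dst = buf_b;
    memcpy(src, in, sizeof(buf_a));

    for (size_t offset = 1; offset < WG_SIZE; offset *= 2) {
        for (size_t lid = 0; lid < WG_SIZE; ++lid) { /* one work-item each */
            /* Work-items with lid < offset take a different path here,
               which is one possible source of divergence. */
            if (lid >= offset)
                dst[lid] = src[lid] + src[lid - offset];
            else
                dst[lid] = src[lid];
        }
        /* On a device, a barrier would separate the passes. */
        int *tmp = src;
        src = dst;
        dst = tmp;
    }
    memcpy(out, src, sizeof(buf_a));
}
```

Calling both functions on the same `WG_SIZE`-element input should give identical arrays. In this style of scan the intermediate values sit in local memory between passes, which is consistent with the observation that it did not raise VGPR usage.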

Thanks,

Aaron

dipak · ‎04-26-2017

Hi Aaron,

If you are asking about performance/optimization hints to the compiler that control register usage, there is no such flag at the moment.

Regards,

boxerab · ‎04-26-2017

Thanks. My question is more: is it possible to use existing registers for this built-in function? It seems to allocate its own set of registers.

dipak · ‎04-26-2017

I don't think it's possible. Still, I'll check with the compiler team.

Regards,

boxerab · ‎04-26-2017

Thanks for checking.

dipak · ‎05-02-2017

At this point, there is no control over how many registers, or which registers, this built-in function uses. I think optimizing register usage is a never-ending task for the compiler team, and I hope it will improve over time.

Regards,

work_group_scan_inclusive_add and register count