5 Replies Latest reply on May 2, 2017 2:35 AM by dipak

    work_group_scan_inclusive_add and register count


      I thought I would try out some of the new OpenCL 2.0 workgroup functions.


      Comparing the performance of work_group_scan_inclusive_add against my home-grown prefix scan, I found that

      work_group_scan_inclusive_add led to less work-item divergence, but used 10 more VGPRs.

      My own scan, using local memory, led to more divergence but no increase in VGPR usage.
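
For anyone wanting to reproduce the comparison, here is a minimal host-side sketch in plain C of what both variants should compute. The post does not show the home-grown kernel, so the Hillis-Steele scheme, the double buffering, and the names `WG_SIZE`, `scan_inclusive_ref` and `scan_inclusive_local_emulated` are all assumptions made for illustration. The inner loop stands in for the work-items of one work-group, and the two arrays stand in for local memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WG_SIZE 64  /* hypothetical work-group size, not taken from the post */

/* Sequential reference: out[i] = in[0] + ... + in[i].
   This is the value an inclusive add scan yields for work-item i. */
void scan_inclusive_ref(const int *in, int *out, size_t n) {
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}

/* Host emulation of an assumed Hillis-Steele scan over "local memory".
   Each pass of the outer loop is one step that a kernel would end with
   a work-group barrier; buf_a and buf_b play the role of __local arrays. */
void scan_inclusive_local_emulated(const int *in, int *out, size_t n) {
    int buf_a[WG_SIZE], buf_b[WG_SIZE];
    int *src = buf_a, *dst = buf_b;

    assert(n <= WG_SIZE);
    if (n == 0) return;

    memcpy(src, in, n * sizeof(int));
    for (size_t offset = 1; offset < n; offset <<= 1) {
        for (size_t lid = 0; lid < n; ++lid) {
            /* Only items with lid >= offset add a neighbour; on a GPU this
               per-item condition is one likely source of divergence. */
            dst[lid] = (lid >= offset) ? src[lid] + src[lid - offset]
                                       : src[lid];
        }
        /* Swap buffers so the next step reads this step's results. */
        int *tmp = src; src = dst; dst = tmp;
    }
    memcpy(out, src, n * sizeof(int));
}
```

Running both functions on the same input and comparing the outputs element by element gives a quick correctness check before looking at register counts on the device.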


      Overall, work_group_scan_inclusive_add was faster. But is there a way to make the built-in

      reuse existing registers instead of increasing register pressure?