5 Replies Latest reply on May 2, 2017 2:35 AM by dipak

    work_group_scan_inclusive_add and register count

    boxerab

      I thought I would try out some of the new OpenCL 2.0 workgroup functions.

       

      Comparing the performance of work_group_scan_inclusive_add against my home-grown prefix scan, I found that

      work_group_scan_inclusive_add led to less work-item divergence, but used 10 more VGPRs.

      My own scan, using local memory, led to more divergence but no increase in VGPR usage.
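
For readers unfamiliar with the comparison, here is a minimal sketch of what a local-memory inclusive scan can look like. It is not the actual kernel from this post: it is a host-side C emulation of a Hillis-Steele scan, and the names `WG_SIZE` and `local_scan_inclusive_add` are hypothetical. Each pass of the outer loop stands in for one barrier-separated step of a kernel, and the `lid >= offset` test is the kind of branch that can cause work-item divergence.

```c
#include <string.h>

#define WG_SIZE 8  /* hypothetical work-group size, chosen for illustration */

/* Host-side emulation of a Hillis-Steele inclusive add-scan over one
   work-group. `local_buf` stands in for __local memory; `n` must not
   exceed WG_SIZE. In an OpenCL 2.0 kernel the whole routine could be
   replaced by a single call to work_group_scan_inclusive_add(x). */
static void local_scan_inclusive_add(int *local_buf, int n)
{
    int tmp[WG_SIZE];

    for (int offset = 1; offset < n; offset <<= 1) {
        /* One "step": every emulated work-item reads, then all write back.
           Only work-items with lid >= offset perform the add. */
        for (int lid = 0; lid < n; ++lid) {
            tmp[lid] = (lid >= offset)
                     ? local_buf[lid] + local_buf[lid - offset]
                     : local_buf[lid];
        }
        /* Corresponds to barrier(CLK_LOCAL_MEM_FENCE) plus the write-back. */
        memcpy(local_buf, tmp, (size_t)n * sizeof(int));
    }
}
```

Calling `local_scan_inclusive_add` on a buffer holding 1 through 8 leaves the running sums in place, with the last element equal to the total. The sketch keeps its intermediate values in the local buffer rather than in per-work-item registers, which is consistent with the observation above that the home-grown version did not raise VGPR usage.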

       

      Overall, work_group_scan_inclusive_add was faster. But is there a way for it

      to reuse existing registers so that register pressure does not increase?

       

      Thanks,

      Aaron