Seeking a sum algorithm for the GPU


I have a workgroup of size N and a local array "buffer" of size N.


For each work item in the group, with local id k, I want to calculate the sum S of all array items with index less than k (i.e. an exclusive prefix sum over buffer):
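Written out, the value each work item needs, together with the recurrence that links neighbouring work items, is:

$$S_k = \sum_{i=0}^{k-1} \mathrm{buffer}[i], \qquad S_0 = 0, \qquad S_{k+1} = S_k + \mathrm{buffer}[k]$$

For example, buffer = (3, 1, 7, 0) gives (S_0, S_1, S_2, S_3) = (0, 3, 4, 11). The recurrence is why work can be shared: S_{k+1} only needs S_k plus one more element, instead of a fresh sum from index 0.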



    S = 0;                        // sum for the work item with local id k
    for (int i = 0; i < k; ++i)   // k reads of local memory per work item
        S += buffer[i];

Currently I calculate this naively, as above, which costs k additions for work item k and N(N-1)/2 additions for the whole workgroup. Is there a more efficient way of doing this, for example one where intermediate sums are stored back into buffer?
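One well-known technique that stores intermediate sums back into the array is the work-efficient (Blelloch) scan: an up-sweep builds partial sums in place as a tree, then a down-sweep turns them into exclusive prefix sums. The sketch below is plain C that simulates a single workgroup on the CPU so it can be checked; the int element type, the power-of-two size and the function name exclusive_scan_inplace are assumptions for illustration, not anything from the question. Each iteration of an inner loop stands for one work item, and the `/* barrier */` comments mark where an OpenCL kernel (assuming OpenCL, given the "workgroup"/"local" wording) would need barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <assert.h>

/* Exclusive prefix sum of buf[0..n-1], computed in place (Blelloch scan).
 * Assumption: n is a power of two (e.g. the workgroup size).
 * Each inner "for (t ...)" iteration plays the role of the work item with
 * local id t; on the GPU those run in parallel. Within one step no two work
 * items touch the same element, so running them sequentially here gives the
 * same result as the parallel version. */
static void exclusive_scan_inplace(int *buf, int n)
{
    assert(n > 0 && (n & (n - 1)) == 0);

    int offset = 1;

    /* Up-sweep (reduce): partial sums are written back into buf;
     * afterwards buf[n - 1] holds the total of all elements. */
    for (int active = n >> 1; active > 0; active >>= 1) {
        for (int t = 0; t < active; ++t) {
            int ai = offset * (2 * t + 1) - 1;
            int bi = offset * (2 * t + 2) - 1;
            buf[bi] += buf[ai];
        }
        offset <<= 1;
        /* barrier */
    }

    /* Clear the root so the result is an exclusive (not inclusive) scan. */
    buf[n - 1] = 0;

    /* Down-sweep: push the partial sums back down the tree. */
    for (int active = 1; active < n; active <<= 1) {
        offset >>= 1;
        for (int t = 0; t < active; ++t) {
            int ai = offset * (2 * t + 1) - 1;
            int bi = offset * (2 * t + 2) - 1;
            int tmp = buf[ai];
            buf[ai] = buf[bi];   /* left child gets the parent's prefix */
            buf[bi] += tmp;      /* right child adds the left subtree sum */
        }
        /* barrier */
    }
}
```

With buffer = {3, 1, 7, 0, 4, 1, 6, 3}, buf[k] afterwards holds S for local id k: {0, 3, 4, 11, 11, 15, 16, 22}. The scan does about 2(N-1) additions over 2·log2(N) barrier-separated steps, instead of N(N-1)/2 additions for the naive loop. In a real kernel the inner loop disappears: each work item with local id t < active handles one (ai, bi) pair per step. If the target supports OpenCL C 2.0, the built-in work_group_scan_exclusive_add computes the same exclusive sum without any explicit local-memory code.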