I have a question about the domain of execution for a kernel with scatter output. For example, suppose I have this code:

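float var1[x][y][z];
float var2[z];

for i in 0 .. x-1
  for j in 0 .. y-1
    for k in 0 .. z-1
      var2[k] = var1[i][j + offset][k];         // gather one z column
    for k in 0 .. z-1
      sort var2;                                // sort step over the z elements
    for k in 0 .. z-1
      var1[i][j + other_offset][k] = var2[k];   // scatter the sorted column back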


Can the domain of execution cover just the x and y dimensions, without running over the z dimension, since every thread is going to use the entire z dimension anyway?

In other words, I want the domain to cover only the first two loops, i and j, and not k. I will unroll the k loop inside the kernel.
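To make the intent concrete, here is a minimal CPU-side sketch in plain C of what I mean. It is not actual kernel code: the names DIM_X, DIM_Y, DIM_Z, process_column and run_domain are placeholders I made up, and the insertion sort just stands in for whatever sort is used. The two loops in run_domain play the role of the 2D domain, and each (i, j) position handles its whole z column. It assumes the offsets keep j + offset and j + other_offset inside the y range.

```c
/* Hypothetical sizes, chosen only for illustration. */
enum { DIM_X = 2, DIM_Y = 4, DIM_Z = 4 };

/* Work done by one "thread" at domain position (i, j):
 * it reads, sorts, and writes back an entire z column.
 * Assumes j + offset and j + other_offset are valid y indices. */
static void process_column(float var1[DIM_X][DIM_Y][DIM_Z],
                           int i, int j, int offset, int other_offset)
{
    float var2[DIM_Z];
    int k, m;

    /* Gather: copy the z column into local storage. */
    for (k = 0; k < DIM_Z; ++k)
        var2[k] = var1[i][j + offset][k];

    /* Sort ascending (insertion sort as a stand-in). */
    for (k = 1; k < DIM_Z; ++k) {
        float key = var2[k];
        for (m = k - 1; m >= 0 && var2[m] > key; --m)
            var2[m + 1] = var2[m];
        var2[m + 1] = key;
    }

    /* Scatter: write the sorted column to the target location. */
    for (k = 0; k < DIM_Z; ++k)
        var1[i][j + other_offset][k] = var2[k];
}

/* The domain covers only x and y; there is no loop over z here. */
static void run_domain(float var1[DIM_X][DIM_Y][DIM_Z],
                       int offset, int other_offset)
{
    int i, j;
    for (i = 0; i < DIM_X; ++i)
        for (j = 0; j < DIM_Y; ++j)
            process_column(var1, i, j, offset, other_offset);
}
```

With both offsets set to 0, calling run_domain simply sorts every z column in place, which is the easiest case to check by hand.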