6 Replies Latest reply on Jul 16, 2009 6:20 PM by hagen

    What exactly are Gather and Scatter?

    Peterp

      Hi,

      I'm not sure about the difference between gather and scatter streams. Is scatter only a write operation to a specific address, or also a read operation? In this kernel I write to a scatter output stream:

      kernel void scatter_write(double data<>, out double data1[][])  // "scatter_write" is a placeholder name
      {
          int2 index = instance().xy;
          data1[index.y][index.x] = data;   // write to a computed address
      }

      Is reading from a specific address also a scatter operation?

      kernel void gather_read(double data[][], out double data1<>)  // "gather_read" is a placeholder name
      {
          int2 index = instance().xy;
          data1 = data[index.y][index.x];   // read from a computed address
      }

        • Raistmer
          IMHO:
          random access reading == gather
          random access writing == scatter

          But I don't recall a strict definition of these terms in that _alpha_ version of the documentation.

            • hagen

              Hi Peterp,

              In your first code with "out double data1[][]", you can actually randomly write to AND read from data1.  So you can, for example, use data1 as a storage array.  (But be careful to not have multiple threads write to the same address.)  data1 is called a "scatter" stream.  Brook+ supports a maximum of one scatter stream per kernel.

              In your second code, with "double data[][]", data is read-only.  data is called a "gather" stream.

               

                • Peterp

                  Hi,

                  Yes, it is possible to read the output stream in the first kernel, but I thought it was not allowed and that the result is undefined.

                    • hagen

                      It is allowed, but there are restrictions and some performance degradation.  Write/read to a scatter stream will mess up the stream communication pattern and should probably not be used unless it is absolutely necessary.

                      One place I have had to do it is when I need to create a scratch array inside a kernel.  Brook+ does not support local array creation (they said they would include it in sdk 1.4 but didn't have time to finish it), so I sometimes use a scatter stream as a global buffer.

                      You can also see the code example I posted under the thread "Need help converting to a brook+ kernel", and the associated restrictions.

                        • Raistmer
                          Hm, my experiments with scatter streams showed that it is not possible to read data back from them within the same kernel.
                          I.e., there is no true scratch buffer available to a kernel in Brook+ at all.
                          @hagen
                          Could you please post a minimal working example of such a read-after-write on a scratch 2D stream?

                            • hagen

                              The following is example code I posted on another thread, "Need help converting to brook+ kernel".  For more details, you can check that thread.  (I don't know if it makes a difference, but I am using Catalyst 9.6 on debian amd-64 with sdk 1.4 on a HD 4870.)

                               

                              #include <stdio.h>

                              // Each instance id touches only column id of idmove_mapped,
                              // so no two instances write to the same address.
                              kernel void idmove_mapped_jdex(int jdex_index<>, int idmove<>, out int idmove_mapped[][])
                              {
                                  int id = instance().x;
                                  if (idmove == 1) idmove_mapped[jdex_index][id]++;   // read-modify-write on the scatter stream
                                  if (idmove == 0) idmove_mapped[jdex_index][id]--;
                              }

                              int main()
                              {
                                  int idmove<5>;
                                  int _idmove[5];
                                  int jdex_index<5>;
                                  int _jdex_index[5];
                                  int idmove_mapped<40,5>;
                                  int _idmove_mapped[40][5];
                                  int _idmove_mapped_reduced[40];
                                  int i, j;

                                  _idmove[0]=1; _jdex_index[0]=10;
                                  _idmove[1]=1; _jdex_index[1]=20;
                                  _idmove[2]=0; _jdex_index[2]=30;
                                  _idmove[3]=1; _jdex_index[3]=20;
                                  _idmove[4]=0; _jdex_index[4]=30;

                                  // Zero the host copy of the scatter buffer.
                                  for (i=0; i<40; i++) {
                                      for (j=0; j<5; j++) {
                                          _idmove_mapped[i][j]=0;
                                      }
                                  }

                                  // Host arrays -> streams.
                                  streamRead(idmove_mapped, _idmove_mapped);
                                  streamRead(idmove, _idmove);
                                  streamRead(jdex_index, _jdex_index);

                                  idmove_mapped_jdex(jdex_index, idmove, idmove_mapped);

                                  // Stream -> host array.
                                  streamWrite(idmove_mapped, _idmove_mapped);

                                  // Reduce each row on the host and print it.
                                  for (i=0; i<40; i++) {
                                      _idmove_mapped_reduced[i]=0;
                                      for (j=0; j<5; j++) {
                                          _idmove_mapped_reduced[i]+=_idmove_mapped[i][j];
                                      }
                                      printf ("%10d %10d \n", i, _idmove_mapped_reduced[i]);
                                  }
                                  return 0;
                              }