14 Replies Latest reply on Aug 29, 2008 9:42 PM by Ceq

    Suggestions for gather on flattened 2D array?

    ryta1203
      I was wondering if anyone had suggestions for accessing (via gather) a 1D array (a 2D array flattened to 1D) at an index that is not simply indexof().

      For example:

      normal code:
      for (i = 0 ; i < row ; i++)
      for(j = 0 ; j < col ; j++)
      newValue = array[i+row*j];

      This maps directly onto the stream domain, no problem:

      newValue = array[indexof(out)];

      But what if we want to translate this:
      normal code:
      for (i = 0 ; i < row ; i++)
      for (j = 0 ; j < col ; j++)
      newValue = array[(i+x)+row*(j+y)]; // read at offset (x, y) from (i, j)

      In Brook+, indexof() on a 1D stream returns a single value.
      There are "two" dimensions here, but with a flattened 2D array (a 1D array) it looks like it will be hard to compute the correct index in a gather (or a scatter, for that matter).

      For instance, in CUDA you can simply use the x and y dimensions, just like the indices of a nested for loop, but in Brook+ indexof() returns only one value for a 1D stream.
          Ceq
          Hi Ryta, umh, I'm not sure I understand this correctly, but I'm going to try to give my opinion:

          1. Do you really need to flatten the array? On a 2D stream, indexof returns a float2 with the coordinates, so that problem would go away.

          2. With a 1D flattened output you can still access a 2D array with array[ float2(pos % col, pos / col) ] // pos is the position returned by indexof(out)

          3. With a 2D output you can access a 1D flattened array with array[pos.x + pos.y * col] // float2 pos is the position returned by indexof(out)

          4. Otherwise, if both the output and the data array are flattened, there wouldn't be any problem.
              ryta1203
              1. Yes, I want the array flattened for the moment.
              2. I am looking into this.
              3. indexof(out) will only return 1 value since it's a 1D array
              4. Yes, they are both flattened. The problem is that sometimes you don't want to read the input element that corresponds directly to the current stream element; instead you want to read an element at the current index plus some offset.
              Ceq
              OK, sorry, I think I understand the question now.
              Well, there are three cases:

              1. You know that adding the offset will neither overflow nor underflow the original array:
              newValue = array[ indexof(out) + x + y * col ];

              2. If the offset can go past the edge of the array and you want to read the boundary element instead:
              pos = indexof(out);
              i = clamp(fmod(pos, col) + x, 0, col - 1);
              j = clamp(floor(pos / col) + y, 0, row - 1);
              newValue = array[ i + j * col ];

              3. You just want to check whether adding the offset would access an invalid location:
              pos = indexof(out);
              i = fmod(pos, col) + x;
              j = floor(pos / col) + y;
              test = (i < 0) + (i > col - 1) + (j < 0) + (j > row - 1);
              if(test == 0) newValue = array[ i + j * col ];

              - Index values should be floats; it looks like Brook+ doesn't handle integer elements well.
              - x and y are the offsets; row and col are the original array dimensions (the array is stored row by row, col elements per row).

              I hope this helps; apologies if I've misunderstood the question again.
                  ryta1203
                  No worries, I appreciate any help I can get, thanks a bunch. I'm not worried about overflow/underflow etc. I know this doesn't happen.

                  What I really want is to be able to access the x and y components of the 1D stream's index individually using indexof().

                  I think you answered my question with this:

                  i = fmod(pos, col) + x;
                  j = floor(pos / col) + y;
                      ryta1203
                      Let me explain the situation better. Oddly, the kernel seems to do NOTHING: if I remove the equivalent CPU code and run only the kernel, the results are the same as if the update had never run, even if I get rid of all the "if" statements.

                      Here is the CPU code:
                      for(y=bk;y<=my-bk+1;y++)
                      {
                      F1to4[0+gx*y].w = F1to4[(mx-1)+gx*y].w;
                      F1to4[0+gx*y].x = F1to4[(mx-1)+gx*y].x;
                      F1to4[0+gx*y].y = F1to4[(mx-1)+gx*y].y;
                      F1to4[0+gx*y].z = F1to4[(mx-1)+gx*y].z;
                      F5to8[0+gx*y].w = F5to8[(mx-1)+gx*y].w;
                      F5to8[0+gx*y].x = F5to8[(mx-1)+gx*y].x;
                      F5to8[0+gx*y].y = F5to8[(mx-1)+gx*y].y;
                      F5to8[0+gx*y].z = F5to8[(mx-1)+gx*y].z;
                      F9[0+gx*y].w = F9[(mx-1)+gx*y].w;
                      F1to4[(mx+1)+gx*y].w = F1to4[2+gx*y].w;
                      F1to4[(mx+1)+gx*y].x = F1to4[2+gx*y].x;
                      F1to4[(mx+1)+gx*y].y = F1to4[2+gx*y].y;
                      F1to4[(mx+1)+gx*y].z = F1to4[2+gx*y].z;
                      F5to8[(mx+1)+gx*y].w = F5to8[2+gx*y].w;
                      F5to8[(mx+1)+gx*y].x = F5to8[2+gx*y].x;
                      F5to8[(mx+1)+gx*y].y = F5to8[2+gx*y].y;
                      F5to8[(mx+1)+gx*y].z = F5to8[2+gx*y].z;
                      F9[(mx+1)+gx*y].w = F9[2+gx*y].w;
                      }

                      Now I am trying to do the same thing in a Brook+ kernel:

                      kernel void advection2_s(float4 Fin1to4[], float4 Fin5to8[], float4 Fin9[], int gx, int gy,
                      int mx, int my, int bk, out float4 Fs9<>, out float4 Fs5to8<>, out float4 Fs1to4<>)
                      {
                      int x, y, idx;
                      idx = indexof(Fs1to4);
                      x = (int)fmod((float)idx, (float)gx);
                      y = (int)floor((float)idx/(float)gx);

                      if ((y > (bk-1)) && (y <= (my-bk+1)))
                      {
                      if (idx == gx*y)
                      {
                      Fs1to4 = Fin1to4[(mx-1)+gx*y];
                      //Fs1to4.x = Fin1to4[(mx-1)+gx*y].x;
                      //Fs1to4.y = Fin1to4[(mx-1)+gx*y].y;
                      //Fs1to4.z = Fin1to4[(mx-1)+gx*y].z;
                      Fs5to8 = Fin5to8[(mx-1)+gx*y];
                      //Fs5to8.x = Fin5to8[(mx-1)+gx*y].x;
                      //Fs5to8.y = Fin5to8[(mx-1)+gx*y].y;
                      //Fs5to8.z = Fin5to8[(mx-1)+gx*y].z;
                      Fs9.w = Fin9[(mx-1)+gx*y].w;
                      }
                      if (idx == (mx+1)+gx*y)
                      {
                      Fs1to4 = Fin1to4[2+gx*y];
                      //Fs1to4.x = Fin1to4[2+gx*y].x;
                      //Fs1to4.y = Fin1to4[2+gx*y].y;
                      //Fs1to4.z = Fin1to4[2+gx*y].z;
                      Fs5to8 = Fin5to8[2+gx*y];
                      //Fs5to8.x = Fin5to8[2+gx*y].x;
                      //Fs5to8.y = Fin5to8[2+gx*y].y;
                      //Fs5to8.z = Fin5to8[2+gx*y].z;
                      Fs9.w = Fin9[2+gx*y].w;
                      }
                      }
                      }

                      And the kernel call:
                      advection2_s(Fs1to4, Fs5to8, Fs9, gx, gy, mx, my, bk, Fs9, Fs5to8, Fs1to4);

                      All streams are the same size.
                    Ceq
                    Umh, it's hard to tell just by looking at the code, but I have two hints:

                    - Whenever I've tried to use the int type in kernels I've gotten wrong results; can you try using only the float type?

                    - You said the kernel appears to do nothing. Just before the end of the kernel, assign a known value to the output streams without deleting or commenting out the previous code: do you still get the same output? If so, and you're sure the kernel is being called, it could be a compiler bug.
                        ryta1203
                        1. When I use floats instead of ints, I get garbage output. If I go back to ints, I get some real output. This could be one of two things:

                        a. It doesn't like floats being used as index values
                        b. The code is incorrect and the floats make the code work

                        2. I will try this and more.
                        Ceq
                        About multiple outputs with several kernels chained one after another:
                        I've tested several modified versions of your example from the other post, and they seem to work OK.

                        - You said that using floats instead of ints gives you garbage output... what happens if you use floats and assign a value at the end of the kernel?

                        - You can also try deleting parts of the kernel until you get the values you wrote at the end of the kernel, in case it's some kind of compiler bug.

                        - About chained kernels: for testing purposes you could dump the streams after each call and compare it with a software only equivalent output.
                            ryta1203
                            1. How did you modify them to get them to work?
                            2. I get garbage.
                            3. I'm already doing this and they don't compare.

                            Even if I call another kernel (different from the one posted here), I get the same output as if the kernel was not called. I thought it might be a problem with the way I was passing streams through the chained kernels, but I'm beginning to think this is just another limitation of the SDK.