8 Replies Latest reply on Jul 6, 2009 7:16 AM by riza.guntur

    puzzled by the organization of array in stream

    scutan

      When I read the SimpleMatMult example code in the SDK, I was puzzled by how an array is organized in a stream. After thinking about it for a while, I believe I understand it, and the following is what I worked out. Could you tell me whether I am right? Thanks.

      If I declare a 2-D array such as float arr[2][4];

      I should declare the Stream as:

      unsigned int arr_size[] = {4, 2};

      brook::Stream<float> stream_arr(2, arr_size);

       

      On the CPU, arr[2][4] is organized as:

      arr00 arr01 arr02 arr03

      arr10 arr11 arr12 arr13

      However, on the GPU, stream_arr is organized as:

      arr00 arr10

      arr01 arr11

      arr02 arr12

      arr03 arr13

       

      Also, in the kernel function, instance().y ranges over [0, 1] and instance().x ranges over [0, 3]. That is to say, inside the kernel, instance().y corresponds to the row of the array in the CPU model.

      So, is my description right? Thanks a lot.
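The index ranges described above can be mimicked on the CPU with two nested loops. This is only a sketch: `Instance` and `enumerate_domain` are hypothetical names standing in for what a kernel sees from instance(), and the loop order is an arbitrary choice. It says nothing about how the GPU physically stores or schedules the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the value a kernel reads from instance().
struct Instance {
    std::size_t x;  // column index, 0 .. width-1
    std::size_t y;  // row index,    0 .. height-1
};

// Enumerate every (x, y) pair of a width-by-height domain.
// The row-by-row order is an assumption made for this sketch only.
std::vector<Instance> enumerate_domain(std::size_t width, std::size_t height) {
    std::vector<Instance> out;
    out.reserve(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            out.push_back(Instance{x, y});
        }
    }
    assert(out.size() == width * height);
    return out;
}
```

For the example in this post, enumerate_domain(4, 2) yields eight pairs with x in [0, 3] and y in [0, 1].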

        • puzzled by the organization of array in stream
          Gipsel

           

          Originally posted by: scutan

          On the CPU, arr[2][4] is organized as:

          arr00 arr01 arr02 arr03

          arr10 arr11 arr12 arr13



          No, the stream is organized the same as the C array; only the stream declaration switches the order of width and height compared to the array.

          In your example, you wrote the layout of arr[2][4] as arr[height][width], which you index as arr[y][x], exactly like the stream indices. So when you have:

          float 2d_array[height][width];

          float 1d_array[height*width];

          unsigned int size[] = {width,height};

          Stream 2d_stream(2,size);

          2d_array[y][x] and 2d_stream[y][x] refer to the same positions. When you use a linear array to fill a 2d_stream (or a linear stream representing a two-dimensional array), the same element is 1d_array[y*width + x]. So the indexing works exactly the same.

          arr_0_0 arr_0_1 ... arr_0_width-1

          arr_1_0

          .      ...  arr_y_x

          .

          arr_height-1_0 ... arr_height-1_width-1
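The y*width + x rule above can be checked with plain C++ on the CPU. This is a minimal sketch that does not use the Brook+ API at all; `flat_index` and `layouts_match` are hypothetical helper names, and the 2x4 size is taken from the example in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: offset of element (y, x) in a row-major
// 2-D array that is `width` elements wide.
std::size_t flat_index(std::size_t y, std::size_t x, std::size_t width) {
    assert(x < width);
    return y * width + x;
}

// Fill a 2x4 C array, copy its bytes into a linear buffer, and confirm
// that arr[y][x] and flat[y * width + x] hold the same value everywhere.
bool layouts_match() {
    const std::size_t height = 2, width = 4;
    float arr[2][4];
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            arr[y][x] = static_cast<float>(10 * y + x);

    // A C array is one contiguous block, so a byte copy preserves its order.
    std::vector<float> flat(height * width);
    std::memcpy(flat.data(), arr, sizeof arr);

    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            if (arr[y][x] != flat[flat_index(y, x, width)])
                return false;
    return true;
}
```

Calling layouts_match() returns true, which is the CPU-side half of the claim: the 2-D array and the linear array address the same element through y*width + x.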

            • puzzled by the organization of array in stream
              scutan

               

              So you mean that the array is organized the same way on the GPU as on the CPU, and that, as you put it, `only the stream declaration switches the order of width and height compared to the array`?

              I agree that `2d_array[y][x] and 2d_stream[y][x] refer to the same positions` and `the stream declaration switches the order of width and height compared to the array`.

              The reason I was puzzled by the organization of the array on the GPU is that the Stream_Computing_User_Guide document, on page A-4 of Appendix A (Brook+ Specification), in section A.3.2 (Stream Declarations), says:

              int d<100, 200, 300>: 3D, 100 * 200 * 300 int elements in size. The 100, 200, 300 are in the reverse order of that array on the CPU. So I think the array on the GPU is the transpose of the array on the CPU.

              Thanks a lot.
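A reversed dimension list does not by itself imply a transposed layout. The sketch below assumes, extending the {width, height} explanation earlier in the thread to three dimensions, that a stream declaration lists the fastest-varying dimension first. That extension is an assumption and is not confirmed by the guide text quoted here. `c_offset` and `stream_offset` are hypothetical helpers in plain C++.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Triple = std::array<std::size_t, 3>;

// Offset for a C array declared a[d0][d1][d2] and indexed a[i0][i1][i2]:
// the LAST listed dimension varies fastest.
std::size_t c_offset(const Triple& dims, const Triple& idx) {
    assert(idx[0] < dims[0] && idx[1] < dims[1] && idx[2] < dims[2]);
    return (idx[0] * dims[1] + idx[1]) * dims[2] + idx[2];
}

// Offset when dimensions and indices are listed fastest-first (x, y, z).
// ASSUMPTION: this mirrors the order used in a stream declaration.
std::size_t stream_offset(const Triple& dims, const Triple& idx) {
    assert(idx[0] < dims[0] && idx[1] < dims[1] && idx[2] < dims[2]);
    return (idx[2] * dims[1] + idx[1]) * dims[0] + idx[0];
}
```

Under that assumption, c_offset with dimensions {300, 200, 100} and index {z, y, x} gives the same offset as stream_offset with dimensions {100, 200, 300} and index {x, y, z}, so the two declarations would describe one memory layout written in opposite orders.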