Archives Discussions

scutan · ‎07-05-2009

While reading the SimpleMatMult example code in the SDK, I was puzzled by how arrays are organized in a stream. After thinking about it for a while, I believe I understand it, and the following is what I learned. Please tell me whether I am right. Thanks.

If I declare a 2-D array such as float arr[2][4];

I should declare the Stream as:

unsigned int arr_size[] = {4, 2};

brook::Stream<float> stream_arr(2, arr_size);

On the CPU, arr[2][4] is organized as:

arr00 arr01 arr02 arr03

arr10 arr11 arr12 arr13

However, on the GPU, stream_arr is organized as:

arr00 arr10

arr01 arr11

arr02 arr12

arr03 arr13

Also, in the kernel function, instance().y ranges over [0, 1] and instance().x ranges over [0, 3]. That is to say, in the kernel function, instance().y corresponds to the row of the array in the CPU model.

So, is my description right? Thanks a lot.

riza_guntur · ‎07-05-2009

Yes

Gipsel · ‎07-05-2009

Originally posted by: scutan

On the CPU, arr[2][4] is organized as:
arr00 arr01 arr02 arr03
arr10 arr11 arr12 arr13

No, the stream is organized the same as the C array, only the stream declaration switches the order of width and height compared to the array.

In your example, you write the layout of arr[2][4] as arr[height][width], and you index arr exactly as you would index the stream. So when you have:

float 2d_array[height][width];

float 1d_array[height*width];

unsigned int size[] = {width,height};

Stream 2d_stream(2,size);

2d_array and 2d_stream refer to the same positions. When you use a linear array to fill a 2d_stream (or a linear stream representing a two-dimensional array), the element at column x, row y would be 1d_array[y*width + x]. So the indexing works exactly the same.

arr_0_0           arr_0_1  ...  arr_0_(width-1)
arr_1_0
.                 ...  arr_y_x
.
arr_(height-1)_0  ...           arr_(height-1)_(width-1)
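
A minimal C sketch of the point above, using only plain CPU arrays and no Brook+ calls. The names WIDTH, HEIGHT, array_2d, array_1d and same_layout are made up for illustration. It checks that the element at row y, column x of a [height][width] array sits at linear offset y*width + x, which, according to the explanation above, is the same position a stream declared with size {width, height} would use.

```c
#include <string.h>

enum { WIDTH = 4, HEIGHT = 2 };  /* hypothetical sizes matching arr[2][4] */

/* Returns 1 if array_2d[y][x] and array_1d[y * WIDTH + x] agree everywhere. */
int same_layout(void)
{
    float array_2d[HEIGHT][WIDTH];
    float array_1d[HEIGHT * WIDTH];
    int x, y;

    /* Encode arr_y_x as 10*y + x so every element is distinguishable. */
    for (y = 0; y < HEIGHT; ++y)
        for (x = 0; x < WIDTH; ++x)
            array_2d[y][x] = (float)(10 * y + x);

    /* Copy the 2D array byte for byte into the linear array. */
    memcpy(array_1d, array_2d, sizeof array_2d);

    /* Row-major check: x is the fastest-varying index. */
    for (y = 0; y < HEIGHT; ++y)
        for (x = 0; x < WIDTH; ++x)
            if (array_1d[y * WIDTH + x] != array_2d[y][x])
                return 0;
    return 1;
}
```

Only the order in which the sizes are written differs ({width, height} for the stream, [height][width] for the array); the element positions do not.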

scutan · ‎07-05-2009

You mean that the organization of the array on the GPU is the same as on the CPU? And the only difference is that `only the stream declaration switches the order of width and height compared to the array`?

I agree that `2d_array and 2d_stream refer to the same positions` and `the stream declaration switches the order of width and height compared to the array`.

The reason I am puzzled by the organization of the array on the GPU is that the Stream_Computing_User_Guide document, on page A-4 of Appendix A (Brook+ Specification), in section A.3.2 (Stream Declarations), says:

int d<100, 200, 300>    3D    100 * 200 * 300 int elements in size.

The 100, 200, 300 are in the reverse order of that array on the CPU. So I think the array on the GPU is the transpose of the array on the CPU.

Thanks a lot.

riza_guntur · ‎07-06-2009

Originally posted by: scutan

You mean that the organization of array in GPU is the same as in CPU ? And the only difference is that `only the stream declaration switches the order of width and height compared to the array.`

See my "find min max mean" thread: gaurav.garg only switched the order of the array dimensions, but the reduce kernel code still works along each dimension. For the array, it reduces the y elements 5 by 5 along the x position, while for the stream the dimension becomes the y position and the same reduction is done transposed.

And see this kernel code from the transpose.br sample:

kernel void
transposeGPU(float i[][], out float o<>)
{
    // Get the (x,y) position of o in (index.x, index.y)
    int2 index = instance().xy;

    // Fetch a value from (y,x)
    o = i[index.x][index.y];
}

gaurav_garg · ‎07-06-2009

Gipsel is correct. The order is similar to CPU arrays. You probably have some confusion with indexing.

kernel void
transposeGPU(float i[][], out float o<>)
{
    // Get the (x,y) position of o in (index.x, index.y)
    // instance().x is column and instance().y is row
    int2 index = instance().xy;

    // Fetch a value from (y,x)
    // C-style indexing: index.x is the row and index.y is the column number
    o = i[index.x][index.y];
}

riza_guntur · ‎07-06-2009

But when streamWrite happens from o<> to the output array, the transpose happens, right?

Thanks a lot.

gaurav_garg · ‎07-06-2009

Isn't that what you would expect if the layout is the same as C arrays? As mentioned in the transpose comments, the input(y,x) value is written to output(x,y).

Here stream(x,y) means input in C-style indexing.
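
A CPU-side sketch of what the kernel does, written in plain C rather than Brook+. The double loop stands in for instance() (how the real runtime walks the domain is not shown here, only the index mapping matters), and IN_ROWS, IN_COLS and transpose_cpu are hypothetical names. With both arrays in ordinary C layout, writing the input element at [x][y] to output row y, column x gives the transpose.

```c
enum { IN_ROWS = 2, IN_COLS = 4 };  /* hypothetical input shape */

/* Emulates o = i[index.x][index.y] for every output position (x, y). */
void transpose_cpu(float in[IN_ROWS][IN_COLS], float out[IN_COLS][IN_ROWS])
{
    int x, y;

    /* The output has IN_COLS rows and IN_ROWS columns:
       y plays the role of instance().y (row), x of instance().x (column). */
    for (y = 0; y < IN_COLS; ++y)
        for (x = 0; x < IN_ROWS; ++x)
            out[y][x] = in[x][y];  /* C-style indexing: in[row][column] */
}
```

No layout change between CPU and GPU is needed to explain the result; the swap of index.x and index.y in the fetch is what transposes the data.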

riza_guntur · ‎07-06-2009

Thank you.


puzzled by the organization of arry in stream