When I read the SimpleMatMult example code in the SDK, I was puzzled by how an array is organized in a stream. After thinking about it for a while, I believe I understand it, so below is what I have worked out. Please tell me whether I am right. Thanks.
If I declare a 2-D array such as float arr[2][4];
I should declare the Stream as:
unsigned int arr_size[] = {4, 2};
brook::Stream<float> stream_arr(2, arr_size);
On the CPU, arr[2][4] is organized as:
arr00 arr01 arr02 arr03
arr10 arr11 arr12 arr13
However, on the GPU, stream_arr is organized as:
arr00 arr10
arr01 arr11
arr02 arr12
arr03 arr13
Also, in the kernel function, the range of instance().y is [0, 1] and the range of instance().x is [0, 3]. That is to say, in the kernel function, instance().y corresponds to the row of the array in the CPU model.
So, is my description right? Thanks a lot.
Yes
Originally posted by: Gipsel
Originally posted by: scutan
And in CPU, the arr[2][4] is organized as :
arr00 arr01 arr02 arr03
arr10 arr11 arr12 arr13
No, the stream is organized the same as the C array, only the stream declaration switches the order of width and height compared to the array.
In your example, you see that you write the layout of the arr[2][4] as arr[height][width] in which you index arr exactly like for the stream indices. So when you have:
float 2d_array[height][width];
float 1d_array[height*width];
unsigned int size[] = {width,height};
Stream 2d_stream(2,size);
2d_array and 2d_stream refer to the same positions. When you use a linear array to fill a 2d_stream (or a linear stream representing a two-dimensional array) this would be 1d_array[y*width + x]. So the indexing works exactly the same.
arr_0_0 arr_0_1 ... arr_0_width-1
arr_1_0
. ... arr_y_x
.
arr_height-1_0 ... arr_height-1_width-1
So you mean that the array is organized on the GPU the same way as on the CPU, and that the only difference is that `only the stream declaration switches the order of width and height compared to the array`?
I agree that `2d_array and 2d_stream refer to the same positions`.
The reason I was puzzled by the organization of arrays on the GPU is the Stream_Computing_User_Guide document. On page A-4 of Appendix A (Brook+ Specification), section A.3.2 Stream Declarations lists:
int d<100, 200, 300>: 3D, 100 * 200 * 300 int elements in size.
The 100, 200, 300 are in the reverse order of the dimensions of the corresponding array on the CPU. So I thought the array on the GPU was the transpose of the one on the CPU.
Thanks a lot.
So you mean that the array is organized on the GPU the same way as on the CPU, and that the only difference is that `only the stream declaration switches the order of width and height compared to the array`?
See my find min/max/mean thread: gaurav.garg only switched the order of the array dimensions, but the reduce kernel code still worked along each dimension. For the array, it reduced the y elements 5 by 5 at each x position, while for the stream the dimensions were swapped, so the same reduction ran at each y position, i.e. transposed.
And see this kernel code from samples transpose.br:
kernel void
transposeGPU(float i[][], out float o<>)
{
// Get the (x,y) position of o in (index.x, index.y)
int2 index = instance().xy;
// Fetch a value from (y,x)
o = i[index.x][index.y];
}
Gipsel is correct. The order is similar to CPU arrays. You probably have some confusion with indexing.
kernel void
transposeGPU(float i[][], out float o<>)
{
// Get the (x,y) position of o in (index.x, index.y)
// instance().x is column and instance().y is row
int2 index = instance().xy;
// Fetch a value from (y,x)
// C-style indexing: index.x is the row and index.y is the column
o = i[index.x][index.y];
}
But when streamWrite copies o<> to the output array, that is where the transpose happens, right?
Thanks a lot.
Isn't that the expected result if the layout is the same as for C arrays? As mentioned in the transpose comments, the input(y,x) value is written to output(x,y).
Here stream(x,y) means input
Thank you.