Archives Discussions

ryta1203 · ‎08-22-2008

I was wondering whether anyone has suggestions for accessing (via gather) a 1D array (a 2D array flattened to 1D) with an index that is not simply indexof().

For example:

normal code:
for (i = 0; i < row; i++)
    for (j = 0; j < col; j++)
        newValue = array[i + row*j];

Over the whole domain this works without any problems:

newValue = array[indexof(out)];

But what if we want to translate (shift) the access by an offset (x, y)?
normal code:
for (i = 0; i < row; i++)
    for (j = 0; j < col; j++)
        newValue = array[(i+x) + row*(j+y)];

In Brook+, indexof() on a 1D stream returns a single value.
There are "two" dimensions here, but with a flattened 2D array (a 1D array) it is not obvious how to compute the correct index for a gather (or a scatter, for that matter).

In CUDA, for instance, you can simply use the x and y dimensions, just like the indices of a nested for loop, but in Brook+ indexof() returns only one value for a 1D stream.
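As a CPU-side illustration (plain C, not Brook+), the translated loop can be driven by a single flat index: each output element recovers its own (i, j) and then applies the offset, which is what one 1D kernel instance would have to do. This sketch assumes the layout of the first loop (element (i, j) stored at i + row*j) and writes 0 where the shifted source would fall outside the array; the function names are hypothetical.

```c
#include <assert.h>

/* Assumed layout (from the first loop): element (i, j) is stored at i + row*j,
   with 0 <= i < row and 0 <= j < col. */

/* Recover (i, j) from a flat index, as each 1D kernel instance would. */
static void unflatten(int idx, int row, int *i, int *j)
{
    assert(row > 0 && idx >= 0);
    *i = idx % row;
    *j = idx / row;
}

/* Translated gather driven only by the flat output index.
   Sources outside the array are written as 0 so the sketch stays in bounds. */
static void gather_translated(const float *array, float *out,
                              int row, int col, int x, int y)
{
    for (int idx = 0; idx < row * col; idx++) {
        int i, j;
        unflatten(idx, row, &i, &j);
        int si = i + x, sj = j + y;            /* translated source coordinates */
        out[idx] = (si >= 0 && si < row && sj >= 0 && sj < col)
                       ? array[si + row * sj]
                       : 0.0f;
    }
}
```

Note that (i+x) + row*(j+y) equals idx + x + row*y, so when the offset is known to stay in range the decomposition is not even needed.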

Ceq · ‎08-22-2008

Hi Ryta, umh, I'm not sure I understand this correctly, but here is my opinion:

1. Do you really need to flatten the array? On a 2D stream, indexof returns a float2 with the coordinates, so that problem would go away.

2. With a 1D flattened output you can still access a 2D array with array[ float2(fmod(pos, col), floor(pos / col)) ] // pos is the position returned by indexof(out)

3. With 2D output you can access a 1D flattened array with array[pos.x + pos.y * col] // float2 pos is the position returned by indexof(out)

4. Otherwise, if both the output and the data array are flattened, there wouldn't be any problem.
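Points 2 and 3 are the two directions of the same mapping. A plain-C sketch, assuming row-major storage with width col (element (x, y) at x + y*col, as in point 3); the type and helper names are hypothetical, and in a Brook+ kernel the same arithmetic would be done on floats with fmod/floor.

```c
#include <assert.h>

/* Row-major coordinates (assumed): x is the column (0..col-1), y is the row. */
typedef struct { int x, y; } Coord2;

/* Point 2: flat position -> 2D coordinates. */
static Coord2 pos_to_coord(int pos, int col)
{
    assert(col > 0 && pos >= 0);
    Coord2 c = { pos % col, pos / col };
    return c;
}

/* Point 3: 2D coordinates -> flat position. */
static int coord_to_pos(Coord2 c, int col)
{
    assert(c.x >= 0 && c.x < col);
    return c.x + c.y * col;
}
```

The two functions are inverses, so a round trip through both should return the original position.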

ryta1203 · ‎08-25-2008

1. Yes, I want the array flattened for the moment.
2. I am looking into this.
3. indexof(out) will only return one value, since it's a 1D array.
4. Yes, they are both flattened. The problem is that you may not want to read the input element at the current stream position; you may want to read an element at the current index plus some offset.

Ceq · ‎08-25-2008

OK, sorry, I think I understand the question now.
Well, there are three cases:

1. You know that adding the offset won't overflow or underflow the original array:
newValue = array[ indexof(out) + x + y * row ];

2. If the offset overflows, you want to read the boundary element instead:
pos = indexof(out);
i = clamp(fmod(pos, row) + x, 0, row - 1);
j = clamp(floor(pos / row) + y, 0, col - 1);
newValue = array[ i + j * row ];

3. You just want to check whether adding the offset would access an invalid location:
pos = indexof(out);
i = fmod(pos, row) + x;
j = floor(pos / row) + y;
test = (i < 0) + (i > row - 1) + (j < 0) + (j > col - 1);
if(test == 0) newValue = array[ i + j * row ];

- Index values should be floats; Brook+ doesn't seem to handle integer elements well.
- x and y are the offsets; row and col are the original array dimensions.

I hope this helps; apologies if I've misunderstood the question again.
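The three cases can be checked on the CPU before moving them into a kernel. A plain-C sketch, assuming the layout of the original loop (element (i, j) at i + row*j); it uses int arithmetic for clarity, whereas the Brook+ version above uses float indices with fmod/floor. All function names are hypothetical.

```c
#include <assert.h>

/* Assumed layout: element (i, j) at i + row*j, 0 <= i < row, 0 <= j < col.
   pos is the flat output position; (x, y) is the offset. */

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Case 1: offset known to stay in range, so it folds into the flat index. */
static float gather_unchecked(const float *a, int pos, int row, int x, int y)
{
    assert(pos >= 0 && row > 0);
    return a[pos + x + y * row];
}

/* Case 2: out-of-range offsets read the nearest boundary element. */
static float gather_clamped(const float *a, int pos, int row, int col, int x, int y)
{
    int i = clampi(pos % row + x, 0, row - 1);
    int j = clampi(pos / row + y, 0, col - 1);
    return a[i + j * row];
}

/* Case 3: read only when the shifted location is valid; returns 1 if *value was set. */
static int gather_checked(const float *a, int pos, int row, int col,
                          int x, int y, float *value)
{
    int i = pos % row + x;
    int j = pos / row + y;
    if (i < 0 || i > row - 1 || j < 0 || j > col - 1)
        return 0;                      /* invalid location: nothing is read */
    *value = a[i + j * row];
    return 1;
}
```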

ryta1203 · ‎08-25-2008

No worries; I appreciate any help I can get, thanks a bunch. I'm not worried about overflow/underflow; I know it doesn't happen here.

What I really want is to access the x and y components of the 1D stream's index individually, using indexof().

I think this answers my question:

i = fmod(pos, row) + x;
j = floor(pos / row) + y;

ryta1203 · ‎08-25-2008

Let me explain the situation better. Oddly, the kernel seems to do NOTHING: it produces the same results as when I simply remove the equivalent CPU code, even if I get rid of all the "if" statements.

Here is the CPU code:
for(y=bk;y<=my-bk+1;y++)
{
    F1to4[0+gx*y].w = F1to4[(mx-1)+gx*y].w;
    F1to4[0+gx*y].x = F1to4[(mx-1)+gx*y].x;
    F1to4[0+gx*y].y = F1to4[(mx-1)+gx*y].y;
    F1to4[0+gx*y].z = F1to4[(mx-1)+gx*y].z;
    F5to8[0+gx*y].w = F5to8[(mx-1)+gx*y].w;
    F5to8[0+gx*y].x = F5to8[(mx-1)+gx*y].x;
    F5to8[0+gx*y].y = F5to8[(mx-1)+gx*y].y;
    F5to8[0+gx*y].z = F5to8[(mx-1)+gx*y].z;
    F9[0+gx*y].w = F9[(mx-1)+gx*y].w;
    F1to4[(mx+1)+gx*y].w = F1to4[2+gx*y].w;
    F1to4[(mx+1)+gx*y].x = F1to4[2+gx*y].x;
    F1to4[(mx+1)+gx*y].y = F1to4[2+gx*y].y;
    F1to4[(mx+1)+gx*y].z = F1to4[2+gx*y].z;
    F5to8[(mx+1)+gx*y].w = F5to8[2+gx*y].w;
    F5to8[(mx+1)+gx*y].x = F5to8[2+gx*y].x;
    F5to8[(mx+1)+gx*y].y = F5to8[2+gx*y].y;
    F5to8[(mx+1)+gx*y].z = F5to8[2+gx*y].z;
    F9[(mx+1)+gx*y].w = F9[2+gx*y].w;
}

Now I am trying to do the same thing in a Brook+ kernel:

kernel void advection2_s(float4 Fin1to4[], float4 Fin5to8[], float4 Fin9[], int gx, int gy,
                         int mx, int my, int bk, out float4 Fs9<>, out float4 Fs5to8<>, out float4 Fs1to4<>)
{
    int x, y, idx;
    idx = indexof(Fs1to4);
    x = (int)fmod((float)idx, (float)gx);
    y = (int)floor((float)idx/(float)gx);

    if ((y > (bk-1)) && (y <= (my-bk+1)))
    {
        if (idx == gx*y)
        {
            Fs1to4 = Fin1to4[(mx-1)+gx*y];
            //Fs1to4.x = Fin1to4[(mx-1)+gx*y].x;
            //Fs1to4.y = Fin1to4[(mx-1)+gx*y].y;
            //Fs1to4.z = Fin1to4[(mx-1)+gx*y].z;
            Fs5to8 = Fin5to8[(mx-1)+gx*y];
            //Fs5to8.x = Fin5to8[(mx-1)+gx*y].x;
            //Fs5to8.y = Fin5to8[(mx-1)+gx*y].y;
            //Fs5to8.z = Fin5to8[(mx-1)+gx*y].z;
            Fs9.w = Fin9[(mx-1)+gx*y].w;
        }
        if (idx == (mx+1)+gx*y)
        {
            Fs1to4 = Fin1to4[2+gx*y];
            //Fs1to4.x = Fin1to4[2+gx*y].x;
            //Fs1to4.y = Fin1to4[2+gx*y].y;
            //Fs1to4.z = Fin1to4[2+gx*y].z;
            Fs5to8 = Fin5to8[2+gx*y];
            //Fs5to8.x = Fin5to8[2+gx*y].x;
            //Fs5to8.y = Fin5to8[2+gx*y].y;
            //Fs5to8.z = Fin5to8[2+gx*y].z;
            Fs9.w = Fin9[2+gx*y].w;
        }
    }
}

And the kernel call:
advection2_s(Fs1to4, Fs5to8, Fs9, gx, gy, mx, my, bk, Fs9, Fs5to8, Fs1to4);

All streams are the same size.
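One thing that may be worth ruling out (an assumption, not something established in this thread): the kernel writes its outputs only inside the if blocks, and the call passes the same streams as both gather inputs and outputs. The plain-C sketch below emulates the same per-element logic for one float4 array (modelling Fin1to4 or Fin5to8), with separate input and output buffers and a pass-through default for every element, and it should agree with the in-place CPU loop above. The Float4 type and both function names are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Float4;   /* hypothetical stand-in for float4 */

/* In-place CPU boundary copy for one array, as in the original loop. */
static void boundary_cpu(Float4 *f, int gx, int mx, int my, int bk)
{
    for (int y = bk; y <= my - bk + 1; y++) {
        f[0 + gx * y] = f[(mx - 1) + gx * y];
        f[(mx + 1) + gx * y] = f[2 + gx * y];
    }
}

/* Per-element "kernel" emulation: every output element is written exactly once,
   and all reads come from a separate input buffer. */
static void boundary_gather(const Float4 *in, Float4 *out, int n,
                            int gx, int mx, int my, int bk)
{
    assert(in != out);                          /* input and output must not alias */
    for (int idx = 0; idx < n; idx++) {
        int x = idx % gx, y = idx / gx;
        out[idx] = in[idx];                     /* default: pass the element through */
        if (y >= bk && y <= my - bk + 1) {
            if (x == 0)
                out[idx] = in[(mx - 1) + gx * y];
            else if (x == mx + 1)
                out[idx] = in[2 + gx * y];
        }
    }
}
```

The gather form matches the in-place loop here only because the left and right ghost columns never read each other (mx - 1 and 2 differ from 0 and mx + 1).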

eduardoschardong · ‎08-25-2008

Originally posted by: ryta1203
idx = indexof(Fs1to4);
x = (int)fmod((float)idx, (float)gx);
y = (int)floor((float)idx/(float)gx);

I didn't try, but does execDomain work in this case?
http://forums.amd.com/forum/me...tid=328&threadid=98107

ryta1203 · ‎08-25-2008

I'm not sure how execDomain will help. Can you elaborate on what you mean? It doesn't look like it would help at all.

eduardoschardong · ‎08-25-2008

EDIT: Never mind, it won't work.

Ceq · ‎08-25-2008

Umh, it's hard to tell just by looking at the code, but I have two hints:

- Whenever I tried to use the int type in kernels I got wrong results; can you try using only floats?

- You said the kernel appears to do nothing. Just before the end of the kernel, assign a value to the output streams without deleting or commenting out the previous code: do you still get the same output? If so, and you're sure the kernel is called, it could be a compiler bug.

ryta1203 · ‎08-26-2008

1. When I use floats instead of ints I get garbage output. If I go back to ints I get real output. This could mean one of two things:

a. It doesn't like floats being used as index values
b. The code is incorrect and the floats make the code work

2. I will try this and more.

ryta1203 · ‎08-27-2008

Ceq · ‎08-29-2008

About multiple outputs with multiple kernels chained together one after another:
I've tested several modified versions of your example from the other post, and they seem to work OK.

- You said that using floats instead of ints gives you garbage output... what happens if you use floats and assign a value at the end of the kernel?

- You can also try deleting parts of the kernel until you get the values you wrote at the end of the kernel, in case it is some kind of compiler bug.

- About chained kernels: for testing purposes you could dump the streams after each call and compare them with the output of an equivalent software-only implementation.
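For the dump-and-compare step, a small helper that reports the first element differing from the software-only result makes it easier to see which kernel in the chain goes wrong. A plain-C sketch, assuming each stream has already been read back into a host float array; the function name and tolerance are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Compare a stream read back from the GPU with a CPU reference.
   Returns the index of the first element whose absolute difference exceeds tol,
   or -1 if all n elements match. */
static int first_mismatch(const float *gpu, const float *ref, int n, float tol)
{
    assert(tol >= 0.0f);
    for (int k = 0; k < n; k++) {
        float d = gpu[k] - ref[k];
        if (d < 0.0f) d = -d;
        if (!(d <= tol)) {                      /* also catches NaN garbage */
            fprintf(stderr, "mismatch at %d: gpu=%g ref=%g\n", k, gpu[k], ref[k]);
            return k;
        }
    }
    return -1;
}
```

Calling it after every kernel in the chain, against the CPU state at the same step, narrows the problem down to the first call whose output diverges.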

ryta1203 · ‎08-29-2008

1. How did you modify them to get them to work?
2. I get garbage.
3. I'm already doing this, and they don't match.

Even if I call another kernel (different from the one posted here), I get the same output as if the kernel were not called. I thought it might be a problem in the way I was passing streams through the chained kernels, but I'm beginning to think this is just another limitation of the SDK.

Ceq · ‎08-29-2008

Suggestions for gather on flattened 2D array?