Archives Discussions

yarr · ‎01-12-2009

weird index behavior

Hello everyone!

While getting into Brook+ GPU programming, I ran into a weird error with array indexes. Here's my test kernel:

kernel void testkernel(uint2 ker_in<>, uint ker_key[2], uint ker_s87[256], out uint2 ker_out<>)
{
uint ker_n1, ker_n2, t, z, mhm;
uint2 test;

ker_n1 = (uint)ker_in.x;
ker_n2 = (uint)ker_in.y;

t = (ker_n1+ker_key[0]) & (uint)0xFFFFFFFF;
mhm=t>>(uint)24 &(uint)255;
z = ker_s87[mhm];

test.x = z;
test.y = mhm;
ker_out = (uint2)test;
}

Everything works fine except for the highlighted part (the line that assigns "z").
Somehow "z" just returns zero.

"mhm" itself returns proper values: from zero to 255, and also if I just manually substitute the index in the "z" expression, for example:

z = ker_s87[158];

I get the proper result, with z returning the expected element.

I can't really understand what's wrong here and why it doesn't work properly. 😞

Thanks in advance for any ideas or suggestions!
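For anyone checking the GPU output against known-good values, here is a minimal CPU-side sketch in plain C++ (not Brook+) of what the kernel above is meant to compute. The names Uint2 and reference_testkernel are hypothetical, and it assumes uint is a 32-bit unsigned integer, so the & 0xFFFFFFFF mask changes nothing.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Brook+ uint2 type.
struct Uint2 { std::uint32_t x, y; };

// CPU reference for one element of testkernel (illustrative only).
Uint2 reference_testkernel(Uint2 in,
                           const std::array<std::uint32_t, 2>& key,
                           const std::array<std::uint32_t, 256>& s87)
{
    // 32-bit unsigned addition wraps on its own, so no extra mask is needed.
    std::uint32_t t = in.x + key[0];
    // Top byte of t; the parentheses spell out the intended precedence.
    std::uint32_t mhm = (t >> 24) & 255u;
    assert(mhm < s87.size());
    // x carries the looked-up value (z), y carries the index (mhm).
    return Uint2{ s87[mhm], mhm };
}
```

If the GPU returns the right .y but a zero .x for the same inputs, the difference is in the dynamic lookup itself and not in how the index is computed.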

ryta1203 · ‎01-23-2009

Gaurav,

So, I finally got this working (I think, lol).

All I had to do was switch the instance() calls, for example:

FROM:
x = instance().x;
y = instance().y;

TO:
x = instance().y;
y = instance().x;

Everything else (all the other code, the gather array accesses, etc.) was left the same way, like F. I didn't need to change them to F.

I honestly hope this saves some people time in the future.

Thanks for all the help gaurav, I really appreciate it!

I also understand that this can probably be combed through in the SDK examples but it would be nice to have it documented (if it isn't already) somewhere too. Thanks.

Gipsel · ‎01-22-2009

Originally posted by: gaurav.garg
unsigned int dimc[] = {height,width};
When using C++ style constructor it has to be -
unsigned int dimc[] = {width, height};

Which is actually quite confusing (and not really documented?) as it is indexed and accessed (for instance as a gather input) like a C array which is declared just the other way around.
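To make the mismatch concrete, here is a small CPU-side sketch in plain C++. flat_index is a hypothetical helper and not part of the Brook+ API; it only assumes the {width, height} order from the quote above and ordinary row-major storage.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: dims is given in the order {width, height},
// while the access pattern is the C-style a[row][col].
std::size_t flat_index(const unsigned int dims[2],
                       unsigned int row, unsigned int col)
{
    unsigned int width = dims[0];
    unsigned int height = dims[1];
    assert(row < height && col < width);
    // Same element that a C array declared as a[height][width]
    // reaches through a[row][col].
    return static_cast<std::size_t>(row) * width + col;
}
```

So the dimension list reads {width, height}, but the matching C declaration would be a[height][width], which is exactly the reversal that trips people up.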

Btw., I got my code running on the GPU now as well. It is really fast: it reaches about 120 GFlop/s (double precision) on an HD4870. That is quite close to the theoretical maximum for the instruction mix (not that much MAD_64 in there and Brook does not generate any, presumably for precision reasons?). But it would have been far easier if some standard functions were also supported for doubles. I had to build my own exp and sqrt versions for doubles in IL. The rest of it was written in Brook.

gaurav_garg · ‎01-20-2009

Sorry, I didn't mention that instance() returns a short vector index, similar to indexof().

So, instance().x is column index, instance().y is row index and so on.
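A rough CPU emulation of that convention in plain C++, in case it helps. Index2, emulated_instance and copy_by_gather are hypothetical names, and the sketch assumes elements are visited in row-major order; it is not how the Brook+ runtime is implemented.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the value instance() reports:
// x is the column index, y is the row index.
struct Index2 { int x, y; };

// Index of the element at flat position pos in a domain `width` columns wide
// (row-major order is an assumption of this sketch).
Index2 emulated_instance(int pos, int width)
{
    assert(width > 0 && pos >= 0);
    return Index2{ pos % width, pos / width };
}

// Emulates a kernel that copies its input through a gather access.
std::vector<int> copy_by_gather(const std::vector<std::vector<int>>& img)
{
    int height = static_cast<int>(img.size());
    int width = height ? static_cast<int>(img[0].size()) : 0;
    std::vector<int> out;
    for (int pos = 0; pos < width * height; ++pos) {
        Index2 idx = emulated_instance(pos, width);
        // C-style gather: row (y) first, then column (x).
        out.push_back(img[idx.y][idx.x]);
    }
    return out;
}
```

The point is only that .x ends up as the second subscript and .y as the first when the gather array is indexed like a C array.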

dukeleto · ‎01-23-2009

I can confirm that in the examples you can find lines like the following (this one extracted from ImageFilter.br):

int jj = instance().x;
int ii = instance().y;

// These are the offsets so no looping is needed

o_img = img[ii][jj] * mask[0][0];

Personally I find this unnerving! It works, but seeing reference code with something like i=instance().y; j=instance().x is really disconcerting.

Regards
