
Gipsel
Adept I

Weird behaviour when kernels using constant buffers and normal kernels are declared in the same .br file

brook mixes up the access to constant buffers

I have seen strange behaviour in my kernels since I changed from a long list of input arguments to constant buffers (I have some quite similar kernels that differ only in the number of input parameters). I've looked at the Brook-generated .cpp file and also at the generated IL code, and it appears to me that Brook mixes up the access to the constant buffers.

As an example I have the kernels (quite long definitions already, so you may understand why I want to use the constant arrays):

kernel void test2(double a[2][3], double b[2][3], double c[2], double d1, double d2, double d3, double d4, int n, double2 g1[], double g2[], double2 g3[], double2 g4[][], out double2 out1<>, out double out2<>);

kernel void test3(double a[3][3], double b[3][3], double c[3], double d1, double d2, double d3, double d4, int n, double2 g1[], double g2[], double2 g3[], double2 g4[][], out double2 out1<>, out double2 out2<>);

In the brook generated .cpp file I see the arguments are pushed in the same order they are declared, that means constant_0 is created with PushConstantBuffer from a[2][3] or a[3][3], respectively. constant_1 is the b array, constant_2 is the c array. The other arguments are pushed with PushConstant also in the order they are declared.

Looking at the generated IL code, the constant buffers are used as if array a were cb0[] and b were cb1[], and array c can be identified as cb2[]. So far so good.

But unfortunately the constant buffers in IL are declared either as

dcl_cb cb0[5] // should be cb3
dcl_cb cb1[6] // should be cb0 or okay
dcl_cb cb2[6] // should be cb1 or cb0
dcl_cb cb3[2] // should be cb2

or as

dcl_cb cb0[9] // okay
dcl_cb cb1[5] // should be cb3
dcl_cb cb2[3] // okay
dcl_cb cb3[9] // should be cb1

The 5 normal arguments are used as if they sat in cb0[] or cb1[] (the buffer with 5 elements, indexed in the order of declaration). I don't see a scheme in it; it appears to be random and of course gives wrong results. Is this a known problem, or what am I doing wrong?

Should I try to change the generated IL so it fits the order the arguments are pushed?

5 Replies
Gipsel
Adept I

Originally posted by: Gipsel
But unfortunately the constant buffers in IL are declared either as
dcl_cb cb0[5] // should be cb3
dcl_cb cb1[6] // should be cb0 or okay
dcl_cb cb2[6] // should be cb1 or cb0
dcl_cb cb3[2] // should be cb2

or as

dcl_cb cb0[9] // okay
dcl_cb cb1[5] // should be cb3
dcl_cb cb2[3] // okay
dcl_cb cb3[9] // should be cb1

The 5 normal arguments are used as if they sat in cb0[] or cb1[] (the buffer with 5 elements, indexed in the order of declaration). I don't see a scheme in it; it appears to be random and of course gives wrong results. Is this a known problem, or what am I doing wrong?
Should I try to change the generated IL so it fits the order the arguments are pushed?


I've tried that order (one has also to modify the IL to access the right constant buffers):

dcl_cb cb0[6]
dcl_cb cb1[6]
dcl_cb cb2[2]
dcl_cb cb3[5]

While it definitely helps with the normal arguments of the kernels (they really sit in cb3[5]), the other buffers holding the data of the constant arrays still appear to be mixed up somehow. Is there a rule for deducing the right order of the arguments passed to the kernel in the constant buffers?
Either way, this appears to be a bug of the IL generator in Brook and makes it close to impossible to use constant arrays.
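
For reference, the layout expected in this post can be written down as a small Python sketch. It assumes that every element of a double array occupies one constant-buffer slot, that the arrays map to buffers in declaration order, and that the five scalar arguments share one trailing buffer. The helper and variable names are made up for illustration and are not part of Brook+, and the observed lists are attributed to test2 and test3 only by their sizes.

```python
from math import prod

def expected_cb_sizes(array_dims, n_scalars):
    """Hypothetical helper: one buffer per constant array (in declaration
    order, one slot per element), followed by one buffer for the scalars."""
    sizes = [prod(dims) for dims in array_dims]
    sizes.append(n_scalars)
    return sizes

# test2: a[2][3], b[2][3], c[2] plus the scalars d1..d4 and n
expected_test2 = expected_cb_sizes([(2, 3), (2, 3), (2,)], 5)
# test3: a[3][3], b[3][3], c[3] plus the scalars d1..d4 and n
expected_test3 = expected_cb_sizes([(3, 3), (3, 3), (3,)], 5)

# dcl_cb sizes for cb0..cb3 as they appeared in the generated IL
observed_test2 = [5, 6, 6, 2]
observed_test3 = [9, 5, 3, 9]

for name, exp, obs in [("test2", expected_test2, observed_test2),
                       ("test3", expected_test3, observed_test3)]:
    # Same set of sizes but a different order means the buffers were permuted.
    permuted = sorted(exp) == sorted(obs) and exp != obs
    print(name, "expected", exp, "observed", obs, "permuted:", permuted)
```

In both cases the observed declarations contain the right sizes in a different order, which fits the impression that the buffers are mixed up rather than sized wrongly.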


Originally posted by: Gipsel
Either way, this appears to be a bug of the IL generator in Brook and makes it close to impossible to use constant arrays.

This may help to locate the bug: the ordering of the constant buffer declarations in the generated IL depends on the "history" of compiled kernels. That means if several kernels are declared in one .br file and compiled, the allocation of the constant buffers depends on the ordering of the kernels in the .br file. That adds some randomness to the process.

Edit:
It appears one cannot mix "normal" kernels with kernels using constant arrays in one .br file.

Edit2:
As a workaround, it may be enough to put all kernels using constant arrays first and all the kernels not using constant arrays at the end of the .br file. At least here it looks better that way.
Addendum: it appears to work only for the first declared kernel.

Edit3:
As a sidenote, double constants are truncated to 6 decimal places, for both backends (CPU and CAL). That also needs to be fixed if one does not want to edit the literal constants in the brook generated IL (.h) and .cpp.
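
The effect of such a truncation can be illustrated outside Brook+. The Python sketch below assumes the literals are written with something like C's default `%f` format (six digits after the decimal point); that is a guess at the cause, not something confirmed by the generated code.

```python
x = 3.141592653589793  # an example double-precision constant

# What survives if only six decimal places are written out.
six_places = "%.6f" % x
# 17 significant digits are enough to reproduce any double exactly.
round_trip = "%.17g" % x

lost = float(six_places) != x   # True: precision was thrown away
kept = float(round_trip) == x   # True: the value round-trips

print(six_places, "loses precision:", lost)
print(round_trip, "round-trips:", kept)
```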

Edit4:
I just tried to consolidate some of the constant arrays into a double2 constant array. While this technique works well with float4, it goes completely wrong with doubles. To continue my example from above, passing a constant array

double2 ab[2][3]

leads to the declaration of a constant buffer in IL

dcl_cb cb0[11] // How the hell does brook arrive at eleven? I could understand 12 (padding).

The offset calculation in the code appears to be strange, too. At least it is not trying to access indexes above 11. Without some major rework of the generated IL it is not going to work. How is a double2 constant array with 6 elements pushed to the constant buffer? Does the declaration have to be cb0[6], or are the individual double values padded to 128-bit boundaries (like in a double array), so that it has to be cb0[12]? The latter would render a double2 array somewhat useless compared to double arrays.

I would really appreciate an answer, as I want to get this working and have no problem reworking the IL source in the Brook-generated .h file. I edit the IL anyway to implement some missing functions (sqrt and exp) for doubles.

Btw., the Brook-generated HLSL file looks good (correct offset calculation) and also makes use of the double2 structure.


For anyone experiencing the same problem (even if nobody appeared to care), I want to give my final view on the situation after I tried and tested some things and got the stuff running with some editing of the Brook-generated IL.

If one uses constant array arguments passed to kernels, Brook may (and it will, if you have more than one kernel) get the declaration and referencing of the constant buffers in the IL assembly wrong; they simply get mixed up. One has to manually correct the dcl_cb statements to match the ordering of the parameters in the kernel declaration, and also the references to these buffers. For double2 constant arrays one also has to correct the constant buffer size declaration in the IL assembly (Brook declares the size as 2*n-1, where n is the size actually needed). This is definitely a bug of the IL generator in the Brook+ compiler. Hopefully it gets fixed in the next version.
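
As a worked example of the size rule, here is a minimal Python sketch for the double2 ab[2][3] array from the earlier post. The function names are hypothetical, and the "needed" size follows the view taken in this thread that one double2 element fills one constant-buffer slot.

```python
def needed_slots(n_elements):
    # Assumption from this thread: one double2 element fits one slot.
    return n_elements

def brook_declared_slots(n_elements):
    # Size observed in the generated IL: 2*n - 1.
    return 2 * n_elements - 1

n = 2 * 3  # double2 ab[2][3] has six elements
print("needed:", needed_slots(n))            # the value dcl_cb should carry
print("declared:", brook_declared_slots(n))  # matches the dcl_cb cb0[11] seen
```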


Hi Gipsel,

Thanks for your bug report. Is it possible for you to send a test case to streamdeveloper@amd.com?


Originally posted by: Gipsel

For double2 constant arrays one has also to correct the constant buffer size declaration in the IL assembly (brook declares the size to 2*n-1, if n is the really needed value). This is definitely a bug of the IL generator in the Brook+ compiler.



I was finally trying to port my stuff to Brook1.4. Unfortunately I see that the bug described above (I sent a test case back in February) has only been partially fixed so far. brcc no longer gets confused by several kernels using constant buffers, but the issue with double2 constant arrays is still exactly the same.

To sum it up, the gather array feature is still severely bugged. It is quite inconvenient to be forced to repair the constant buffer declaration as well as the index calculation in the IL of the compiled versions (which can be demanding for applications more complicated than the attached example).

kernel void test(double2 m[2], out double b<>)
{
    b=m[0].x*m[0].y+m[1].x*m[1].y;
}

In the IL generated by brcc it looks like that (shortened):

[..]
dcl_cb cb0[3] // <= should be cb0[2]
[..]
func 37
umul r270.x___, l11.x000, l0.x000
iadd r270.x___, r270.x000, l0.x000
umul r271.x___, l11.x000, l0.x000
iadd r271.x___, r271.x000, l0.x000
dmul r272.xy__, cb0[r270.x+0].xy00, cb0[r271.x+1].xy00 // <= indexing is wrong
umul r273.x___, l11.x000, l1.x000
iadd r273.x___, r273.x000, l0.x000
umul r274.x___, l11.x000, l1.x000
iadd r274.x___, r274.x000, l0.x000
dmul r275.xy__, cb0[r273.x+0].xy00, cb0[r274.x+1].xy00 // <= indexing is wrong
dadd r276.xy__, r272.xy00, r275.xy00
mov r269.xy__, r276.xy00
ret
[..]

The wrong indexing into the array (accesses even elements outside of the declared constant buffer) can also be easily seen in the GPU disassembly:

; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(11) KCACHE0(CB0:0-15)
      0  x: MUL_64 T0.x, KC0[0].y, KC0[1].y // should be KC0[0].xy and KC0[0].zw
         y: MUL_64 T0.y, KC0[0].y, KC0[1].y
         z: MUL_64 ____, KC0[0].y, KC0[1].y
         w: MUL_64 ____, KC0[0].x, KC0[1].x
         t: MOV R0.z, 0.0f
      1  x: MUL_64 ____, KC0[2].y, KC0[3].y // should be KC0[1].xy and KC0[1].zw
         y: MUL_64 ____, KC0[2].y, KC0[3].y
         z: MUL_64 ____, KC0[2].y, KC0[3].y
         w: MUL_64 ____, KC0[2].x, KC0[3].x
      2  x: ADD_64 R0.x, T0.y, PV1.y
         y: ADD_64 R0.y, T0.x, PV1.x
01 EXP_DONE: PIX0, R0.xyzz
END_OF_PROGRAM
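
The mismatch can be restated as a small Python model. It only encodes what the disassembly and its comments show: the generated code reads m[i].x and m[i].y from the .xy halves of two consecutive slots, whereas both doubles were expected in one slot. The function names are made up for illustration.

```python
def generated_access(i):
    """Slots read for (m[i].x, m[i].y) according to the disassembly."""
    return [(2 * i, "xy"), (2 * i + 1, "xy")]

def expected_access(i):
    """Slots expected in the post: both doubles share one 128-bit slot."""
    return [(i, "xy"), (i, "zw")]

N = 2         # double2 m[2]
DECLARED = 3  # dcl_cb cb0[3] in the generated IL

for i in range(N):
    print("m[%d]: generated %s, expected %s"
          % (i, generated_access(i), expected_access(i)))

highest_generated = max(s for i in range(N) for s, _ in generated_access(i))
highest_expected = max(s for i in range(N) for s, _ in expected_access(i))

# Valid indices for cb0[DECLARED] are 0 .. DECLARED-1.
out_of_bounds = highest_generated >= DECLARED
print("highest generated slot:", highest_generated,
      "out of bounds:", out_of_bounds)
```

With the expected layout the highest slot touched is 1, so a declaration of cb0[2] would be sufficient for this kernel.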
