
jean-claude
Journeyman III

Constant buffers naming convention in Brook kernel generated IL

using brook generated IL in CAL

Hi Gaurav,

If my understanding is correct, there are 15 constant buffers,
which can contain up to 1024*4 elements.

The naming goes from cb0 to cb14.

So consider the following kernel

kernel void k_trial(out float A<>, float B0<>, float B1<>, float C0, float C1) {
    A = B0*C0 + B1*C1;
}

To use the generated IL in CAL, I would assume the following naming for binding:

output       A  <=>  o0

inputs       B0 <=>  i0        B1 <=>  i1

constants    C0 <=>  cb0       C1 <=>  cb1

That seems fine for the inputs and outputs, but the constants both appear to be
folded into cb0 through:

dcl_cb cb0[2]



Questions:

(1) So what are the variable names for C0 and C1?

    CALname C0_name = 0;
    calModuleGetName(&C0_name,  ctx, module_k_trial, "???");
    calCtxSetMem(ctx, C0_name, C0_Mem);

    CALname C1_name = 0;
    calModuleGetName(&C1_name,  ctx, module_k_trial, "???");
    ...


(2) Is this to say that for n constants the generated IL would be dcl_cb cb0[n]?
    But then when are cb1, ..., cb14 used?

Thanks for any hints.

Jean-Claude

gaurav_garg
Adept I

Hi Jean-Claude,

All the constants declared are combined into a single constant buffer, so you need to bind this data to a single constant buffer. When you allocate the data, each constant has to be 128-bit aligned (always allocate a *_4 CAL format), as CAL has 128-bit alignment requirements for constants.

calResAlloc*(, , 2, CAL_FORMAT_FLOAT_4, ); // Resource of Width 2 (for 2 constants)

calResMap(&ptr, constRes, );
ptr[0] = firstConstant;
ptr[4] = secondConstant; // Write after the 128 bits assigned to the first constant

You should bind this resource to "cb0". Hope it helps.
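As a host-side illustration of that layout, here is a minimal C sketch, assuming the mapped pointer of a width-n CAL_FORMAT_FLOAT_4 resource behaves like a contiguous array of n * 4 floats. The helper pack_scalar_constants is hypothetical and makes no CAL calls; it only shows where each scalar constant lands (float offset 4*i).

```c
#include <stddef.h>
#include <string.h>

/* One 128-bit constant slot holds 4 floats (a float4). */
#define FLOATS_PER_SLOT 4

/*
 * Hypothetical helper: copy n scalar constants into a buffer laid out as
 * n float4 slots, one constant per slot in the .x component.
 * dst must hold at least n * FLOATS_PER_SLOT floats (assumed to be the
 * mapped pointer of a width-n CAL_FORMAT_FLOAT_4 resource).
 * The unused .y/.z/.w components are zeroed.
 */
void pack_scalar_constants(float *dst, const float *src, size_t n)
{
    memset(dst, 0, n * FLOATS_PER_SLOT * sizeof(float));
    for (size_t i = 0; i < n; ++i) {
        /* constant i starts at float offset 4*i: 0, 4, 8, ... */
        dst[i * FLOATS_PER_SLOT] = src[i];
    }
}
```

With two constants this writes indices 0 and 4, matching ptr[0] and ptr[4] above.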

Hey thanks,

So just to see if my understanding is correct:


kernel void k_trial(out float A<>, float B0<>, float B1<>, float C0, float C1, float C2 ) {
   A = B0*C0 + B1*C1 - C2;
}


// Allocate 4 float constants
// Note: Resource width is set to 1 (equivalent of 4 float constants) (is this safe?)
// ----------------------------------------------------------------------------------
CALresource constants_Res;
calResAllocLocal1D(&constants_Res, device, 1, CAL_FORMAT_FLOAT_4, 0);

// Set constant values
// -------------------
calResMap(&ptr, constants_Res);
ptr[0] = Constant_0;
ptr[1] = Constant_1;
ptr[2] = Constant_2;
ptr[3] = Constant_3; // won't be used
calResUnmap(constants_Res);


// Binding to ctx
// --------------
CALmem constants_Mem = 0;
calCtxGetMem(&constants_Mem, ctx, constants_Res);


// Binding to kernel constant pin
// --------------------------------------
CALname constants_name = 0;
calModuleGetName(&constants_name,  ctx, module_k_trial, "cb0");
calCtxSetMem(ctx, constants_name, constants_Mem);

...

Then execute the kernel.

Right?

The resource width should be the same as the number of constants. Also, note the indices used with the mapped pointer: each constant starts on a 128-bit (4-float) boundary.

CALresource constants_Res;
calResAllocLocal1D(&constants_Res, device, 3, CAL_FORMAT_FLOAT_4, 0);

// Set constant values
// -------------------
calResMap(&ptr, constants_Res);
ptr[0] = Constant_0;
ptr[4] = Constant_1;
ptr[8] = Constant_2;
calResUnmap(constants_Res);

Got it!

So apparently there is no way to "trick" the allocator by allocating a FLOAT_4 resource and then using FLOAT_1 slices in it?

But then, with your second approach, you get FLOAT_1 constant items, don't you? I.e.:

kernel void test(float a[1024], float b[16][16], out float c<>) {...}

For constant array a (i.e. cb0), I would then assume that the up-front CAL resource allocation would be:

calResAllocLocal1D(&constant_a, device, 1024, CAL_FORMAT_FLOAT_1, 0);

or, to ensure alignment, the equivalent:

calResAllocLocal1D(&constant_a, device, 1024/4, CAL_FORMAT_FLOAT_4, 0);

Unfortunately, CAL requires 128-bit allocation for constant buffers regardless of the type of the constant array.

So you need to allocate the resource like this:

calResAllocLocal1D(&constant_a, device, 1024, CAL_FORMAT_FLOAT_4, 0);

if you are using a constant array of type float, float2 or float4 with 1024 elements.
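To make the "128 bits per element regardless of type" rule concrete, here is a hedged C sketch that computes the buffer size for an n-element constant array and packs float, float2 or float4 data into float4 slots. The names const_buffer_bytes and pack_const_array are hypothetical, the host data is assumed to be tightly packed, and unused components are simply zeroed.

```c
#include <stddef.h>
#include <string.h>

/* One constant-buffer element is 128 bits = 4 floats. */
#define SLOT_FLOATS 4

/* Bytes needed for an n-element constant array, whatever its type. */
size_t const_buffer_bytes(size_t n)
{
    return n * SLOT_FLOATS * sizeof(float);
}

/*
 * Hypothetical helper: pack n elements of `components` floats each
 * (1 = float, 2 = float2, 4 = float4) from a tightly packed host array
 * into float4 slots. Returns 0 on success, -1 for an unsupported width.
 */
int pack_const_array(float *dst, const float *src, size_t n, size_t components)
{
    if (components != 1 && components != 2 && components != 4)
        return -1;
    memset(dst, 0, const_buffer_bytes(n));
    for (size_t i = 0; i < n; ++i)
        memcpy(&dst[i * SLOT_FLOATS], &src[i * components],
               components * sizeof(float));
    return 0;
}
```

On a platform with 32-bit float, const_buffer_bytes(1024) is 16384 bytes for any of the three element types, which is why the allocation above is the same 1024 x CAL_FORMAT_FLOAT_4 in every case.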

OK, sorry, but I'm still somewhat confused.

Assume I have 64 float constants from A0 to A63

Assume my kernel should add constant A5 to the input stream.

The resource allocation is done through:
calResAllocLocal1D(&constant_A, device, 64, CAL_FORMAT_FLOAT_4, 0);

which actually contains 64*4 float values


The correct kernel is then (with A being assigned to cb0)

kernel void test ( out float C<>, float A[64], float B<> ) {  C = B + A[5]; }

In other words, the index into A is a 128-bit index...
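A small C sketch of that indexing, assuming a contiguous host copy of the mapped buffer: element i of A starts at float offset 4*i, so A[5] (or A[5].x in a float4 declaration) is float 20. The helper names cb_float_offset and cb_read are hypothetical.

```c
#include <stddef.h>

/*
 * Float offset of component `comp` (0 = .x, 1 = .y, 2 = .z, 3 = .w)
 * of element `index` in a constant buffer made of float4 slots.
 */
size_t cb_float_offset(size_t index, size_t comp)
{
    return index * 4 + comp;
}

/* Read one component from a host copy of the constant buffer. */
float cb_read(const float *buf, size_t index, size_t comp)
{
    return buf[cb_float_offset(index, comp)];
}
```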


So my question is: what would the following kernels mean and do?

(1)   kernel void test ( out float C<>, float4 A[64], float B<> ) { C = B + A[5].x; }

And, moreover, what about the kernel that writes the A constants from the first 64 elements of a stream D, with the CAL output domain being {0,0,64,1}:

(2)   kernel void set_A ( out float A<>, float D[] ) {
          int pos = instance().x;
          A = D[pos];
      }

Or should it be:

(3)   kernel void set_A ( out float4 A<>, float D[] ) {
          int pos = instance().x;
          A.x = D[pos];
      }

By the way, it also seems that the Brook compiler has difficulty with this:

(1) kernel void test1 ( float a[128], float b<>, out float c<> ) { c = a[33] + b; }

compiles properly, no problem,

but if the order of the parameters is changed...


(2) kernel void test2 ( out float c<>, float b<>, float a[128]) { c = a[33] + b;}

NOTICE: Parse error
While processing <buffer>:88
In compiler at zzerror()[parser.y:112]
  message = parse error

ERROR: Parse error. Expected declaration.
While processing <buffer>:88

And this compiles properly too ...

kernel void test2 ( out float c<>, float b<>, float a[128], int d) {
   c = a[33] + b + d;
}

It sounds like the Brook compiler doesn't like a fixed-size constant array being the last parameter...

Thanks for pointing out the compilation issues. I have filed a bug for them. Regarding your previous question:

Both kernels work the same way.

kernel void test ( out float C<>, float A[64], float B<> ) {  C = B + A[5]; }

kernel void test ( out float C<>, float4 A[64], float B<> ) { C = B + A[5].x; }

When you call a kernel with a constant buffer, a pointer to the constant array is passed, not a stream: you pass a float pointer to the first kernel and a float4 pointer to the second. The runtime always allocates a float4 CAL buffer internally and copies the data so that each element keeps a 128-bit stride. Hope it helps in understanding.

gaurav_garg
Adept I

Originally posted by: jean-claude
(2) is this to say that for n constants the generated IL would be dcl_cb cb0?     but then when are cb1, ..., cb14 used?

You can use multiple constant buffers through Brook+ if you declare constants with their size in square brackets.

kernel void test(float a[1024], float b[16][16], out float c<>) {...} // It will allocate two constant buffers of size 1024 and 256 (= 16x16)
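The sizes in that comment follow from one 128-bit element per array entry. A minimal C sketch of the arithmetic, with hypothetical helper names; the row-major flattening of b[i][j] is an assumption, not something confirmed in this thread.

```c
#include <stddef.h>

/* One constant-buffer element is 128 bits = 16 bytes. */
#define SLOT_BYTES 16

/* Number of constant-buffer elements for an array with the given dimensions. */
size_t cb_elements(const size_t *dims, size_t ndims)
{
    size_t n = 1;
    for (size_t d = 0; d < ndims; ++d)
        n *= dims[d];
    return n;
}

/* Assumption: b[i][j] is flattened row-major to element i * cols + j. */
size_t cb_index_2d(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}
```

Under these assumptions a[1024] needs 1024 elements (16384 bytes) and b[16][16] needs 256 elements (4096 bytes), each in its own constant buffer.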
