Archives Discussions

jean-claude · ‎12-23-2008

using brook generated IL in CAL

Hi Gaurav,

If my understanding is correct, there are 15 constant buffers,
which can contain up to 1024*4 elements.

The naming goes from cb0 to cb14.

So consider the following kernel

kernel void k_trial(out float A<>, float B0<>, float B1<>, float C0, float C1) {
    A = B0*C0 + B1*C1;
}

To use the generated IL in CAL, I would assume the following naming for binding:

output         A <=> o0

inputs         B0 <=> i0    B1 <=> i1

constants      C0 <=> cb0      C1 <=> cb1

That sounds fine for inputs & outputs, but for the constants it seems that they
are both folded into cb0 through:

dcl_cb cb0[2]

Questions:

(1) so what are the variable names for C0 and C1 ?

    CALname C0_name = 0;
    calModuleGetName(&C0_name, ctx, module_k_trial, "???");
    calCtxSetMem(ctx, C0_name, C0_Mem);

    CALname C1_name = 0;
    calModuleGetName(&C1_name, ctx, module_k_trial, "???");
    ...

(2) is this to say that for n constants the generated IL would be dcl_cb cb0?
    but then when are cb1, ..., cb14 used?

Thanks for some hints.

Jean-Claude

gaurav_garg · ‎12-23-2008

Hi Jean-Claude,

All the constants declared are combined into a single constant buffer, so you need to bind this data to a single constant buffer. When you allocate the data, each constant has to be 128-bit aligned (always allocate a *_4 CAL format), as CAL has 128-bit alignment requirements for constants.

calResAlloc*(, , 2, CAL_FORMAT_FLOAT_4, ); // Resource of Width 2 (for 2 constants)

calResMap(&ptr, constRes, );
ptr[0] = firstConstant;
ptr[4] = secondConstant; // Write after the 128 bits (4 floats) assigned to the first constant

You should bind this resource to "cb0". Hope it helps.
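
The layout described above can be sketched on the host side without any CAL calls. This is only an illustration under stated assumptions: the mapped pointer is treated as a plain float* over a FLOAT_4 resource, and the helper name pack_scalar_constants is made up for this example (it is not a CAL or Brook+ function).

```c
#include <assert.h>
#include <stddef.h>

/* 128 bits = 4 x 32-bit floats per constant slot. */
enum { FLOATS_PER_SLOT = 4 };

/* Hypothetical helper, not part of CAL or Brook+.
 * Writes `count` scalar constants into a buffer laid out as `count`
 * float4 slots, the way a mapped FLOAT_4 resource of width `count`
 * is assumed to look. Only the first float of each slot carries data. */
void pack_scalar_constants(float *mapped, const float *constants, size_t count)
{
    assert(mapped != NULL && constants != NULL);
    for (size_t i = 0; i < count; ++i) {
        float *slot = mapped + i * FLOATS_PER_SLOT;
        slot[0] = constants[i];              /* the constant itself */
        slot[1] = slot[2] = slot[3] = 0.0f;  /* unused padding */
    }
}
```

With two constants, the first lands at float index 0 and the second at float index 4, so the buffer must hold 8 floats.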

jean-claude · ‎12-23-2008

Hey thanks,

So just to see if my understanding is correct:

kernel void k_trial(out float A<>, float B0<>, float B1<>, float C0, float C1, float C2 ) {
A = B0*C0 + B1*C1 - C2;
}

// Allocate 4 float constants
// Note: Resource width is set to 1 (equivalent of 4 float constants) (is this safe?)
// ----------------------------------------------------------------------------------
CALresource constants_Res;
calResAllocLocal1D(&constants_Res, device, 1, CAL_FORMAT_FLOAT_4, 0);

// Set constant values
// -------------------
float* ptr = NULL;
CALuint pitch = 0;
calResMap((CALvoid**)&ptr, &pitch, constants_Res, 0);
ptr[0] = Constant_0;
ptr[1] = Constant_1;
ptr[2] = Constant_2;
ptr[3] = Constant_3; // won't be used
calResUnmap(constants_Res);

// Binding to ctx
// --------------
CALmem constants_Mem = 0;
calCtxGetMem(&constants_Mem, ctx, constants_Res);

// Binding to kernel constant pin
// --------------------------------------
CALname constants_name = 0;
calModuleGetName(&constants_name, ctx, module_k_trial, "cb0");
calCtxSetMem(ctx, constants_name, constants_Mem);

...

then execute kernel

Right?

gaurav_garg · ‎12-23-2008

The resource width should be the same as the number of constants. Also, note how the mapped pointer is indexed.

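CALresource constants_Res;
calResAllocLocal1D(&constants_Res, device, 3, CAL_FORMAT_FLOAT_4, 0); // width 3: one float4 slot per constant

// Set constant values (each constant starts a new 128-bit slot, i.e. every 4 floats)
// -------------------
float* ptr = NULL;
CALuint pitch = 0;
calResMap((CALvoid**)&ptr, &pitch, constants_Res, 0);
ptr[0] = Constant_0;
ptr[4] = Constant_1;
ptr[8] = Constant_2;
calResUnmap(constants_Res);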

jean-claude · ‎12-23-2008

Got it!

So apparently there is no way to "trick" the allocator by allocating a FLOAT_4 resource and then using FLOAT_1 slices in it?

But then when you use your second approach, you get FLOAT_1 constant items, don't you? I.e.:

kernel void test(float a[1024], float b[16][16], out float c<>) {...}

For the constant vector a (i.e. cb0), I would then assume that the up-front CAL resource allocation would be

calResAllocLocal1D(&constant_a, device, 1024, CAL_FORMAT_FLOAT_1, 0);

or the equivalent to ensure alignment

calResAllocLocal1D(&constant_a, device, 1024/4, CAL_FORMAT_FLOAT_4, 0);

gaurav_garg · ‎12-23-2008

Unfortunately, CAL requires 128-bit allocation for constant buffers irrespective of type of constant array.

So you need to allocate the resource like this:

calResAllocLocal1D(&constant_a, device, 1024, CAL_FORMAT_FLOAT_4, 0);

if you are using a constant array of type float, float2 or float4 with 1024 elements.
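
To make the sizing rule concrete, here is a small host-side sketch. It assumes the host array is tightly packed (1, 2 or 4 floats per element) and that each element must be copied into its own 128-bit slot; the helper names cb_width, cb_bytes and pack_array_constants are hypothetical and not part of CAL.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT_FLOATS = 4, SLOT_BYTES = 16 };  /* one float4 = 128 bits */

/* Width to request for the FLOAT_4 resource: one slot per array element,
 * whatever the element type (float, float2 or float4). */
size_t cb_width(size_t elements) { return elements; }

/* Total bytes occupied by the constant buffer. */
size_t cb_bytes(size_t elements) { return elements * SLOT_BYTES; }

/* Hypothetical helper: copies `elements` items of `components` floats each
 * from a tightly packed host array into float4 slots, zero-filling the rest. */
void pack_array_constants(float *mapped, const float *src,
                          size_t elements, size_t components)
{
    assert(components >= 1 && components <= SLOT_FLOATS);
    for (size_t i = 0; i < elements; ++i)
        for (size_t c = 0; c < SLOT_FLOATS; ++c)
            mapped[i * SLOT_FLOATS + c] =
                (c < components) ? src[i * components + c] : 0.0f;
}
```

For a 1024-element array this gives a width of 1024 and 16384 bytes, whether the elements are float, float2 or float4.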

jean-claude · ‎12-23-2008

OK, sorry, but I'm still getting somewhat confused.

Assume I have 64 float constants from A0 to A63

Assume my kernel should add constant A5 to the input stream.

The resource allocation is done through:
calResAllocLocal1D(&constant_A, device, 64, CAL_FORMAT_FLOAT_4, 0);

which actually contains 64*4 float values

The correct kernel is then (with A being assigned to cb0)

kernel void test ( out float C<>, float A[64], float B<> ) { C = B + A[5]; }

In other words, the index into A is a 128-bit index...

So my question is what would the following kernels mean and do?

(1) kernel void test ( out float C<>, float4 A[64], float B<> ) { C = B + A[5].x; }

And, moreover, what about a kernel that writes the A constants from the first 64 elements of a D stream, the CAL output domain being {0,0,64,1}?

(2) kernel void set_A ( out float A<>, float D[] ) {
    int pos = instance().x;
    A = D[pos];
}

or should it be

(3) kernel void set_A ( out float4 A<>, float D[] ) {
    int pos = instance().x;
    A.x = D[pos];
}

jean-claude · ‎12-23-2008

By the way, it also seems that the Brook compiler has difficulty with this:

(1) kernel void test1 ( float a[128], float b<>, out float c<>) { c = a[33] + b;}

compiles properly, no problem

but if the order of parameter is changed...

(2) kernel void test2 ( out float c<>, float b<>, float a[128]) { c = a[33] + b;}

NOTICE: Parse error
While processing <buffer>:88
In compiler at zzerror()[parser.y:112]
message = parse error

ERROR: Parse error. Expected declaration.
While processing <buffer>:88

jean-claude · ‎12-23-2008

And this compiles properly too ...

kernel void test2 ( out float c<>, float b<>, float a[128], int d) {
c = a[33] + b + d;
}

Sounds like the Brook compiler doesn't like a fixed-size constant array being the last parameter...

gaurav_garg · ‎12-23-2008

Thanks for pointing out the compilation issues. I have filed a bug for it. Regarding your previous question:

Both kernels work the same way.

kernel void test ( out float C<>, float A[64], float B<> ) { C = B + A[5]; }

kernel void test ( out float C<>, float4 A[64], float B<> ) { C = B + A[5].x; }

When you call a kernel with a constant buffer, a pointer to the constant array is passed, not a stream. You need to pass a float pointer and a float4 pointer, respectively, to these kernels. The runtime always internally allocates a float4 CAL buffer and copies the data so that each element keeps a 128-bit stride. Hope it helps in understanding.
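
A rough host-side emulation of why the two kernels read the same value is sketched below. It is not the actual Brook+ runtime code: float4_t, the two copy functions and kernel_test are stand-ins invented for this illustration, assuming the runtime copies each element into its own float4 slot as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the Brook+ float4 type (assumption for this sketch). */
typedef struct { float x, y, z, w; } float4_t;

/* Emulates the copy for `float A[n]`: element i goes to slot i, component x. */
void copy_float_array(float4_t *cb, const float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        cb[i].x = a[i];
        cb[i].y = cb[i].z = cb[i].w = 0.0f;
    }
}

/* Emulates the copy for `float4 A[n]`: element i fills slot i entirely. */
void copy_float4_array(float4_t *cb, const float4_t *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        cb[i] = a[i];
}

/* Emulates the kernel body: C = B + A[5] (float case) or B + A[5].x (float4 case).
 * Either way, the value read is component x of slot 5. */
float kernel_test(const float4_t *cb, float b)
{
    return b + cb[5].x;
}
```

In both cases the index 5 selects the sixth 128-bit slot, so the kernels agree as long as the float4 array stores the same value in its .x component.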

gaurav_garg · ‎12-23-2008

Originally posted by: jean-claude (2) is this to say that for n constants the generated IL would be dcl_cb cb0? but then when are cb1, ..., cb14 used?

You can use multiple constant buffers through Brook+ if you declare the constants with their size in square brackets.

kernel void test(float a[1024], float b[16][16], out float c<>); // It will allocate two constant buffers, of size 1024 and 256 (= 16x16)
