I have some questions about CS.
1. To execute a CS I have to use calCtxRunProgramGrid(). For this I have to fill in the CALprogramGrid structure.
What are gridBlock, gridSize and dcl_num_thread_per_group (used in gridBlock) in this structure?
For example, if I have dcl_num_thread_per_group 64 and my domain of execution is a 256x256 matrix as input, which settings do I have to adjust in CALprogramGrid?
2. If dcl_num_thread_per_group = 64, does that mean that only one SIMD is used? And if it is 1024, are all of them used?
3. Is it correct that g[vaTid0.x] has four elements? Does that mean that every g has x, y, z, w? And is it possible to have a 1D array of single floats?
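To make question 1 concrete, here is the arithmetic I have in mind, as a minimal sketch in plain C. It assumes that gridBlock holds the number of threads per group (matching dcl_num_thread_per_group) and that gridSize holds the number of groups; that reading is my assumption, not something I have confirmed. The types Domain3D and GridSketch and the helpers groups_needed and make_grid are hypothetical stand-ins, not the real CAL declarations.

```c
#include <assert.h>

/* Hypothetical stand-ins for the CAL types; NOT the real CAL headers. */
typedef struct { unsigned width, height, depth; } Domain3D;
typedef struct { Domain3D gridBlock; Domain3D gridSize; } GridSketch;

/* Number of groups needed to cover `total` threads, rounded up. */
static unsigned groups_needed(unsigned total, unsigned per_group)
{
    assert(per_group > 0);
    return (total + per_group - 1) / per_group;
}

/* Assumption: gridBlock = threads per group, gridSize = number of groups,
 * both laid out along the width dimension only. */
static GridSketch make_grid(unsigned width, unsigned height,
                            unsigned threads_per_group)
{
    GridSketch g;
    g.gridBlock.width  = threads_per_group;
    g.gridBlock.height = 1;
    g.gridBlock.depth  = 1;
    g.gridSize.width   = groups_needed(width * height, threads_per_group);
    g.gridSize.height  = 1;
    g.gridSize.depth   = 1;
    return g;
}
```

Under that assumption, make_grid(256, 256, 64) gives a gridBlock of 64x1x1 and a gridSize of 1024x1x1, since 256 * 256 = 65536 threads and 65536 / 64 = 1024 groups.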
1. How do I organize a circular (loop) buffer in CL?
I have an index (= 4) and a buffer size of 16. I need to multiply and accumulate:
for (int i = 0; i < size; i++)
    a += b * c[(i + index) % size]; // simple FIR operation
On any DSP this is possible by loading the buffer length, buffer address and increment into special address registers. How do I do it in CL? In assembler it is the AR register.
2. How do I implement d = a*b - c? In assembler it is mad r0,r1,r2,-r3. How does it work in CL?
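For reference, the two CL questions above can be written out in plain C as follows. This is only a sketch: the helper names (wrap_index, wrap_index_pow2, fir_wrap, mul_sub) are hypothetical, and b is kept as a scalar as in the loop above. The wrapped index is done in software, either with a modulo, with a bit mask when the size is a power of two (16 is), or with a running position that is reset at the end of the buffer. For d = a*b - c the sketch simply negates c and adds; OpenCL C also has the built-ins mad(a, b, c) and fma(a, b, c), which could be called with -c, but whether the compiler maps either form to a single hardware mad is an assumption on my part.

```c
#include <assert.h>

/* Wrapped index using modulo; works for any size > 0. */
static unsigned wrap_index(unsigned i, unsigned index, unsigned size)
{
    assert(size > 0);
    return (i + index) % size;
}

/* Wrapped index using a mask; valid only when size is a power of two. */
static unsigned wrap_index_pow2(unsigned i, unsigned index, unsigned size)
{
    assert(size > 0 && (size & (size - 1)) == 0);
    return (i + index) & (size - 1);
}

/* The loop from the question, with a running position instead of a
 * modulo per iteration. b is a scalar, as in the original loop. */
static float fir_wrap(float b, const float *c, unsigned size, unsigned index)
{
    float a = 0.0f;
    unsigned pos = wrap_index(0, index, size);
    for (unsigned i = 0; i < size; i++) {
        a += b * c[pos];
        if (++pos == size)
            pos = 0;            /* wrap around to the start of the buffer */
    }
    return a;
}

/* d = a*b - c, written as a multiply followed by adding the negated c. */
static float mul_sub(float a, float b, float c)
{
    return a * b + (-c);
}
```

With index = 4 and size = 16, the loop reads c[4], c[5], ..., c[15], c[0], ..., c[3].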