1. How do I request a given number of threads? Is it done by setting the domain size?
Take the cal_idct sample as an example and look at the code below:
// Setup a computation domain
g_calDomain3D.width = Info.Width; // assume 256
g_calDomain3D.height = Info.Height; // assume 256
g_calDomain3D.depth = 1;
Does this request 256 x 256 threads?
2. In CAL we can organize threads into thread groups (also called blocks) and organize thread groups into a grid, as in the following code from the cal_idct sample:
CALevent event = 0;
g_calProgramGrid.func = g_calFunc;
g_calProgramGrid.flags = 0;
g_calProgramGrid.gridBlock.width = 64; // must equal the thread group size declared in the IL kernel
g_calProgramGrid.gridBlock.height = 1;
g_calProgramGrid.gridBlock.depth = 1;
g_calProgramGrid.gridSize.width = (g_calDomain3D.width * g_calDomain3D.height +
g_calProgramGrid.gridBlock.width - 1) /
g_calProgramGrid.gridBlock.width;
g_calProgramGrid.gridSize.height = 1;
g_calProgramGrid.gridSize.depth = 1;
In the IL kernel, we can get the absolute thread ID from vaTid. What confuses me is this: in this example each thread processes an 8x8 block, so to process the entire matrix we need only 256 * 256 / 64 threads, which conflicts with the total number of threads requested above. If each thread processed only one element of the matrix, we would need 256 * 256 threads. How should I understand this? Does the organization of the threads determine the work each thread does, or is it the other way around, or something else?
3. Now consider Brook+. In Brook+ we can set the domain size explicitly through the kernel interface, or by default the domain size is taken from the size of the output stream. Likewise, does the domain size mean the total number of threads?
As far as I can tell, in Brook+ we can only set the domain size through the kernel interface and the thread group size with the Attribute keyword; we cannot organize the thread groups ourselves. We can get a thread's ID within its group with the instanceInGroup() function. But how can we get the absolute thread ID or the thread group ID inside the kernel? If I want a thread to process more than one element, e.g. an 8x8 block, I need that thread ID information.