Hi,
I noticed that the number of threads is always determined by the size of the output stream, even if the output stream is a scatter stream.
For instance, consider the following:
    kernel void foo(float4 a<>, out float4 b[])
    {
        ...
    }

    int main()
    {
        float4 a<10>;
        float4 b<100>;
        foo(a, b);
    }
The kernel foo is called with 100 threads, one for each element of b. I find this rather odd behavior; it would seem much more natural for the number of threads to be determined by the size of the input streams in such cases.
I want to perform operations on a large matrix (say, a 10x10 matrix that is provided as the scatter stream b, if we stick to the above example). Because the matrix elements partially depend on one another, I cannot make use of 100 independent threads; I only want 10 threads, one for every row. My current workaround is to get the thread number from the output stream and return immediately if the number is too large, like so:
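    kernel void foo(float4 a<>, out float4 b[])
    {
        // thread number, taken from the position in the output stream
        int task = indexof(b);
        if (task >= 10)
        {
            // surplus thread: there are only 10 rows to process
            return;
        }
        // ... perform calculations on row 'task' ...
    }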
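To make the cost of this guard explicit, here is a plain C emulation of what I believe happens at launch. It is only a sketch: emulate_launch and process_row are names I made up for illustration, not Brook+ API, and the running sum merely stands in for the real per-row work, where elements within a row depend on one another.

```c
/* Hypothetical stand-in for the kernel body: processes one row of a
   row-major matrix with `cols` columns. Each element depends on the
   previous one in the same row (a simple running count). */
static void process_row(float *matrix, int cols, int row)
{
    float running = 0.0f;
    for (int c = 0; c < cols; ++c) {
        running += 1.0f;
        matrix[row * cols + c] = running;
    }
}

/* Emulates a launch with one "thread" per output element (rows * cols),
   applying the same guard as the kernel above.
   Returns the number of threads that actually did any work. */
int emulate_launch(float *matrix, int rows, int cols)
{
    int active = 0;
    int threads = rows * cols;          /* thread count follows output size */
    for (int task = 0; task < threads; ++task) {
        if (task >= rows) {
            continue;                   /* surplus thread returns at once */
        }
        process_row(matrix, cols, task);
        ++active;
    }
    return active;
}
```

Calling emulate_launch on a 10x10 matrix iterates over 100 tasks but only 10 of them reach process_row; the other 90 hit the guard and do nothing.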
This seems a bit silly, since 90 of the 100 threads are launched only to return immediately. Is there a better way to specify the number of threads?
Ingo