josopait

number of threads in scatter kernels

Discussion created by josopait on Aug 4, 2008
Latest reply on Aug 4, 2008 by ryta1203

Hi,

I noticed that the number of threads is always determined by the size of the output stream, even if the output stream is a scatter stream.

For instance, consider the following:

kernel void foo(float4 a<>, out float4 b[])
{
    ...
}

int main()
{
    float4 a<10>;
    float4 b<100>;

    foo(a, b);
}

The kernel foo is called with 100 threads, one for each element of b. I find this rather odd behavior. It would seem much more natural to me if, in such cases, the number of threads were determined by the size of the input streams (10 threads here, one per element of a).

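To make the behavior concrete, here is a small CPU emulation in plain C++. It is not Brook+ code: `Stream` and `launch_over_output` are hypothetical names, and the sketch only models how many times the kernel body runs, not how the GPU actually schedules work. It launches one "thread" per element of the output stream, so the kernel body runs 100 times even though a has only 10 elements.

```cpp
#include <cassert>  // assert(), used when checking the emulation (see usage below)
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a stream: a flat array of floats.
using Stream = std::vector<float>;

// Emulates the observed rule: the number of "threads" equals the size of the
// OUTPUT stream, not the input stream. Each thread receives its index i,
// which plays the role of indexof() on the output stream.
template <typename Kernel>
std::size_t launch_over_output(const Stream& in, Stream& out, Kernel kernel) {
    const std::size_t threads = out.size();
    for (std::size_t i = 0; i < threads; ++i) {
        kernel(i, in, out);
    }
    return threads;  // number of kernel invocations
}
```

For example, with `Stream a(10), b(100); std::size_t calls = 0;` the check `assert(launch_over_output(a, b, [&](std::size_t, const Stream&, Stream&) { ++calls; }) == 100 && calls == 100);` passes: the input size plays no role in the thread count.
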
I want to perform operations on a large matrix (say, a 10x10 matrix provided as the scatter stream b, to stick with the above example). Because the matrix elements partially depend on one another, I cannot use 100 threads; I want only 10 threads, one per row. My current workaround is to get the thread index from the output stream and return immediately if it is too large, like so:

kernel void foo(float4 a<>, out float4 b[])
{
    int task = indexof(b);
    if (task >= 10)
    {
        return;
    }

    << perform calculations on row 'task' >>

}

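The workaround can be modeled the same way. The sketch below is again a plain C++ CPU emulation, not Brook+; `row_kernel`, `run_guarded`, `kRows`, `kCols`, and the row recurrence are hypothetical stand-ins (the actual row calculation is elided in the kernel above). It launches 100 "threads", lets the 90 with task >= 10 exit immediately, and has each of the remaining 10 fill one row sequentially, because each element depends on the previous one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Stream = std::vector<float>;  // hypothetical CPU stand-in for a stream

constexpr std::size_t kRows = 10;
constexpr std::size_t kCols = 10;

// Body of the guarded kernel for one thread. 'task' plays the role of
// indexof(b). Returns true if this thread did any work.
bool row_kernel(std::size_t task, const Stream& a, Stream& b) {
    if (task >= kRows) {
        return false;  // same early exit as in the kernel above
    }
    // Hypothetical stand-in for the elided row calculation: each element
    // depends on the previous element of the same row, so a row must be
    // processed sequentially by a single thread.
    float* row = &b[task * kCols];
    row[0] = a[task];
    for (std::size_t c = 1; c < kCols; ++c) {
        row[c] = row[c - 1] + a[task];
    }
    return true;
}

// Launches one "thread" per output element (kRows * kCols of them) and
// counts how many actually did work.
std::size_t run_guarded(const Stream& a, Stream& b) {
    assert(a.size() == kRows && b.size() == kRows * kCols);
    std::size_t active = 0;
    for (std::size_t task = 0; task < b.size(); ++task) {
        if (row_kernel(task, a, b)) {
            ++active;
        }
    }
    return active;
}
```

With a = {1, 2, ..., 10}, `run_guarded` reports 10 active threads out of 100, and row r holds (r + 1), 2(r + 1), ..., 10(r + 1). The 90 idle threads are exactly the overhead this question is about.
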
This seems a bit silly. Is there a better way to specify the number of threads?

Ingo
