

FrodoTheGiant
Journeyman III

Thread creation overhead

I have a kernel that is basically only doing a lookup. The total number of lookups is about 32,000.

Question: Is the overhead to start & execute ONE thread PER lookup worth it?

Or would it be faster to put a small loop inside the kernel so that each thread does several lookups, let's say 16? The idea is just to avoid the overhead of creating too many threads.

__constant int lookup_table[256] = {...some values...};

__kernel void some_kernel(__global int* in, __global int* out)
{
    uint tid = get_global_id(0);
    out[tid] = lookup_table[in[tid]];
}

2 Replies
himanshu_gautam
Grandmaster

In my opinion, the best answer is to measure it yourself. AFAIK there is no considerable overhead for managing a large number of threads. But you might get some performance gain from coalesced global memory access. You should also make sure that your compute units are not starved.
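To make the loop variant from the question concrete, here is a minimal sketch in plain C that emulates the work-items on the CPU. It is not code from the thread and says nothing about which version is faster. The names lookup_strided, run_lookup, TABLE_SIZE and PER_ITEM are hypothetical, and gid and gsize stand in for what get_global_id(0) and get_global_size(0) would return inside an OpenCL kernel. Each emulated work-item steps through the input with a stride equal to the total number of work-items, so neighbouring work-items touch neighbouring elements on every iteration, which is the access pattern usually described as coalesced.

```c
#include <stddef.h>

/* Hypothetical constants for illustration: the thread only mentions a
   256-entry table and "let's say 16" lookups per thread. */
enum { TABLE_SIZE = 256, PER_ITEM = 16 };

/* Body of one emulated work-item. In an OpenCL kernel, gid and gsize
   would come from get_global_id(0) and get_global_size(0). */
static void lookup_strided(const int *table, const int *in, int *out,
                           size_t n, size_t gid, size_t gsize)
{
    /* Stride by the global size: on each iteration, work-item gid and
       work-item gid + 1 read adjacent elements of in[]. */
    for (size_t i = gid; i < n; i += gsize)
        out[i] = table[in[i]];
}

/* CPU stand-in for launching the kernel: roughly n / PER_ITEM
   work-items, rounded up so that every element is covered. */
static void run_lookup(const int *table, const int *in, int *out, size_t n)
{
    size_t gsize = (n + PER_ITEM - 1) / PER_ITEM;
    for (size_t gid = 0; gid < gsize; ++gid)
        lookup_strided(table, in, out, n, gid, gsize);
}
```

The alternative of giving each work-item 16 consecutive elements would make neighbouring work-items read addresses 16 elements apart, which is generally a worse fit for coalescing. Whether either loop variant beats one work-item per lookup still has to be measured on the target device.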

nou
Exemplar

Try it and you will see. That is the best way.
