Thread creation overhead

Discussion created by FrodoTheGiant on Dec 23, 2010
Latest reply on Dec 23, 2010 by nou

I have a kernel that basically only does a lookup. The total number of lookups is about 32,000.

Question: Is the overhead of starting and executing ONE thread PER lookup worth it?

Or would it be faster to put a small loop inside the kernel so that each thread does several lookups, let's say 16? Just to avoid the overhead of creating too many threads.

```c
__constant int lookup_table[256] = {...some values...};

__kernel void some_kernel(__global int* in, __global int* out)
{
    uint tid = get_global_id(0);        // one work-item per lookup
    out[tid] = lookup_table[in[tid]];   // in[tid] must be in 0..255
}
```