2 Replies Latest reply on Dec 23, 2010 8:50 AM by nou

    Thread creation overhead


      I have a kernel that basically only does a table lookup. The total number of lookups is about 32,000.

      Question: Is the overhead of starting and executing ONE thread PER lookup worth it?

      Or would it be faster to put a small loop inside the kernel so that each thread does several lookups, let's say 16? The idea is just to avoid the overhead of creating too many threads.

      __constant int lookup_table[256] = {...some values...};

      __kernel void some_kernel(__global int* in, __global int* out)
      {
          uint tid = get_global_id(0);
          out[tid] = lookup_table[in[tid]];
      }
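      For comparison, here is a minimal sketch of the looped variant described above. It is not a measured or recommended implementation. The kernel name some_kernel_looped, the extra argument n (total number of lookups), and the table values are all assumptions made for illustration, since the real table is elided in the post. The loop strides by the global size rather than giving each thread 16 consecutive elements, so that neighbouring threads still touch neighbouring addresses on every iteration. The block at the top and the helper at the bottom are only a host-side shim so the same source also compiles as plain C for checking the index mapping; an OpenCL compiler skips them.

```c
/* Host-side shim, an assumption for illustration only: it lets the kernel
   below compile as plain C so the index mapping can be checked on a CPU.
   A real OpenCL compiler defines __OPENCL_VERSION__ and skips all of this. */
#ifndef __OPENCL_VERSION__
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
#define __constant static const
static size_t emu_gid, emu_gsize;   /* emulated work-item id and global size */
static size_t get_global_id(unsigned int dim)   { (void)dim; return emu_gid; }
static size_t get_global_size(unsigned int dim) { (void)dim; return emu_gsize; }
#endif

/* Placeholder values only; the table in the post is elided. */
__constant int lookup_table[256] = { 7, 11, 13, 17 };

/* Each work-item starts at its own id and advances by the global size.
   n (total number of lookups) is a hypothetical extra argument. */
__kernel void some_kernel_looped(__global int* in, __global int* out, int n)
{
    size_t stride = get_global_size(0);
    for (size_t i = get_global_id(0); i < (size_t)n; i += stride)
        out[i] = lookup_table[in[i]];
}

#ifndef __OPENCL_VERSION__
/* Runs every emulated work-item in turn and returns 1 if the result equals
   a plain one-lookup-per-element pass, 0 otherwise. */
static int emulate_matches_reference(size_t global_size)
{
    enum { N = 64 };
    int in[N], out[N];
    assert(global_size > 0);
    for (int i = 0; i < N; ++i) { in[i] = i % 4; out[i] = -1; }
    emu_gsize = global_size;
    for (emu_gid = 0; emu_gid < global_size; ++emu_gid)
        some_kernel_looped(in, out, N);
    for (int i = 0; i < N; ++i)
        if (out[i] != lookup_table[in[i]]) return 0;
    return 1;
}
#endif
```

      With this form the number of lookups per thread is set from the host: launching n / 16 work-items gives 16 lookups each, and launching n work-items degenerates to the original one-lookup-per-thread kernel. Which setting is faster depends on the device, so it is best timed rather than assumed.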