I have a nicely working OpenCL solution that processes millions of lines of data in what are logically bundles of 512 lines per chunk. It works fine.
Right now I do this in the code:
// this is where my thread's samples start
int my_sample_index = (gid>>1) & 0xFF00;
// calculate our coeffs
int my_index = (gid & 0x007F) * 4;
and it seems to work fine.
But it seems like maybe I am supposed to be making use of get_local_id(), and I am having a hard time getting my head around how to use it.
Is there a good explanation around that makes it clear when you need to make use of that feature of OpenCL?
If I don't make use of local memory, does that mean that get_local_id() will be useless to me?
Originally posted by: kbrafford If I don't make use of local memory, does that mean that get_local_id() will be useless to me?
In your specific example, my_sample_index and my_index correspond to neither get_group_id() nor get_local_id(). They would correspond if my_sample_index = gid & 0xFF00 (the group id times 256) and my_index = gid & 0x00FF (the local id), AND ONLY IF your local_work_size is 256 and your global_work_size is less than 0xFFFF. In that case I would tell you to use get_local_id(), because it is the more generic solution and works for whatever local_work_size you specify. But in your particular example, using get_local_id() would take more effort, since you would have to transform it into such an index. Just because get_local_id() exists does not mean you have to use it. It is up to you to decide how your threads access the data; it depends entirely on your implementation!
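To make the condition above concrete, here is a minimal sketch in plain host-side C that emulates the index arithmetic for a 1-D range with no global offset. The helper names (group_id_of, local_id_of, masks_match_ids) are hypothetical stand-ins for illustration, not OpenCL API calls; in a real kernel the values would come from get_group_id(0) and get_local_id(0).

```c
#include <stddef.h>

/* Hypothetical stand-ins for get_group_id(0) and get_local_id(0),
 * assuming a 1-D NDRange with no global offset. */
static size_t group_id_of(size_t gid, size_t local_size) { return gid / local_size; }
static size_t local_id_of(size_t gid, size_t local_size) { return gid % local_size; }

/* The two bit masks mentioned in the reply. */
static size_t masked_group_base(size_t gid) { return gid & 0xFF00; }
static size_t masked_local(size_t gid)      { return gid & 0x00FF; }

/* Returns 1 if, for every gid in [0, global_size), the masks give the same
 * values as group_id * local_size and local_id; returns 0 otherwise. */
static int masks_match_ids(size_t global_size, size_t local_size)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        size_t group_base = group_id_of(gid, local_size) * local_size;
        if (masked_group_base(gid) != group_base) return 0;
        if (masked_local(gid) != local_id_of(gid, local_size)) return 0;
    }
    return 1;
}
```

With a local size of 256 and a global size that fits in 16 bits the masks and the ids agree; change the local size to 128, or let the global size grow past 16 bits, and they no longer do. That is why the id-based form is the more generic one.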
I am starting to catch on, I think. The reason my calculation looks weird is that my algorithm vectorizes trivially, and I am actually able to process 4 conceptual pieces of data in each thread.
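As a rough illustration of the four-items-per-thread layout, the sketch below reuses the (gid & 0x007F) * 4 arithmetic from the first post and checks it in plain C. It assumes, which the thread does not state outright, that each thread handles four consecutive elements starting at that index and that 128 threads share one 512-line chunk. The constant and function names are hypothetical.

```c
#include <stddef.h>

/* Assumed layout: 128 threads per chunk, 4 consecutive items per thread. */
enum { ITEMS_PER_THREAD = 4, THREADS_PER_CHUNK = 128, CHUNK_LINES = 512 };

/* Same arithmetic as my_index in the original post. */
static int coeff_index(int gid) { return (gid & 0x007F) * ITEMS_PER_THREAD; }

/* Returns 1 if threads 0..127 together touch each of the 512 slots
 * exactly once under the assumed layout, 0 otherwise. */
static int chunk_covered_once(void)
{
    int hits[CHUNK_LINES] = {0};
    for (int gid = 0; gid < THREADS_PER_CHUNK; ++gid) {
        int first = coeff_index(gid);
        for (int k = 0; k < ITEMS_PER_THREAD; ++k)
            hits[first + k]++;
    }
    for (int i = 0; i < CHUNK_LINES; ++i)
        if (hits[i] != 1) return 0;
    return 1;
}
```

Under those assumptions the index wraps back to 0 at thread 128, so the pattern repeats every 128 threads regardless of the work-group size chosen at launch.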
Looks like I need to find a good book that starts with the basics, even though I already have some working code!