I have a kernel that takes a buffer of constant values as an input (each work-item reads the values from the constant buffer and writes a result, based on its own work-item value, to an output buffer).
I would like those constants to be cached, so that the entire work-group uses the cached values instead of each work-item doing its own global memory read. Is this done automatically on the GPU, or do I have to do something to get it? Thanks in advance for your help.
IIRC, there are basically three ways to get reads and writes faster than plain global memory access:
1. use cached buffers
2. use images
3. use the constant cache
Refer to the Constant Memory Bandwidth and Buffer Bandwidth samples for details. Also see chapter 4 of the OpenCL Programming Guide (Memory Transfer Optimizations).
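For the third option, a minimal sketch of what the kernel side might look like, assuming the constants are floats and small enough to fit within the device's CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE. The kernel is embedded as a host-side C string, as is commonly done before passing it to clCreateProgramWithSource. The names scale_by_coeffs, coeffs, n_coeffs and out are hypothetical, and whether `__constant` data actually ends up in a dedicated cache depends on the device and driver.

```c
#include <stddef.h>

/* OpenCL C kernel source, embedded as a string for clCreateProgramWithSource().
 * All kernel and argument names here are hypothetical, for illustration only. */
static const char *kernel_src =
    "__kernel void scale_by_coeffs(__constant float *coeffs,\n"
    "                              const uint n_coeffs,\n"
    "                              __global float *out)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    float acc = 0.0f;\n"
    "    /* All work-items read the same coeffs[i] in each iteration, the\n"
    "       uniform access pattern that constant memory is meant for. */\n"
    "    for (uint i = 0; i < n_coeffs; ++i)\n"
    "        acc += coeffs[i] * (float)gid;\n"
    "    out[gid] = acc;\n"
    "}\n";
```

On the host, the coeffs argument would still be an ordinary buffer created with clCreateBuffer (for example with CL_MEM_READ_ONLY) and set with clSetKernelArg; the `__constant` qualifier in the kernel signature is what asks the implementation to treat it as constant memory.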