I have a kernel that uses a global read-only uint array from which each work-item reads 18 consecutive addresses, i.e. every work-item has its own distinct set of 18 uints.
I also have a global read-only uint array with only 4 elements. These 4 uints are broadcast to all work-items.
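To make the access pattern concrete, here is a minimal sketch of what each work-item does. This is not my real kernel: the `out` buffer, the sum/XOR body, and the names `KEYS_PER_ITEM`, `SALT_WORDS` and `fake_gid` are placeholders made up for illustration. The block at the top is only a host-side shim so the same text also builds as plain C for checking the indexing; an OpenCL compiler skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shim (illustrative assumption): lets this file build as plain C99. */
#include <stddef.h>
typedef unsigned int uint;
#define __kernel
#define __global
static size_t fake_gid;  /* stands in for the work-item's global id */
static size_t get_global_id(uint dim) { (void)dim; return fake_gid; }
#endif

#define KEYS_PER_ITEM 18  /* uints read by each work-item */
#define SALT_WORDS     4  /* uints shared by every work-item */

__kernel void mykernel(__global uint const * restrict key,   /* 18 uints per work-item */
                       __global uint const * restrict salt,  /* 4 uints, broadcast */
                       __global uint * restrict out)         /* hypothetical output */
{
    size_t gid = get_global_id(0);

    /* Each work-item reads its own 18 consecutive uints. */
    __global uint const *my_key = key + gid * KEYS_PER_ITEM;

    uint acc = 0;
    for (int i = 0; i < KEYS_PER_ITEM; ++i)
        acc += my_key[i];

    /* Every work-item reads the same 4 addresses. */
    for (int i = 0; i < SALT_WORDS; ++i)
        acc ^= salt[i];

    out[gid] = acc;
}
```

In the plain C build, setting `fake_gid` before each call emulates one work-item.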
I can't (and also don't want to) use LDS for either of these.
1. Is it possible to use L1 in both cases?
2. I have declared the kernel arguments like this, as per the OpenCL Programming Guide (May 2012), p. 5-13:
__kernel void mykernel( __global uint const * restrict key, //18 uints per work-item
__global uint const * restrict salt , //4 uint for broadcast
. //other args
Is there anything else I need to do in order to cache the data in L1?