I have a simple question that I could not answer by debugging one of my programs.
I have an array of cl_uints, and when I pass it to my kernels I process it as an array of uint4. That part works fine. However, because I do bit operations and threads have to share data via __local variables, I need to know exactly how the data is laid out in memory.
The cl_uint array a looks like this in host memory:
When this is processed in a kernel as an array of uint4, I found it looks like this:
This is strange, because the MersenneTwister and MonteCarloAsian samples contain a vector shift with carry-over between the components, and that shift suggests the vector components are stored the other way around.
I use this layout in a program. When I need to left-shift the entire array (with periodic boundaries), I use a reversed version of lshift128 (because the original lshift128 from the MersenneTwister sample carried bits over incorrectly within a vector), and I handle the boundaries between adjacent vectors manually. However, zero bits tend to creep into the array, and if the simulation runs long enough they flood the whole system.
Could someone clarify how a kernel reads an array when it interprets it as an array of vectors?