Meteorhead · ‎07-31-2010

how is it actually stored?

Hi everyone!

I have a simple question which I could not find the answer to by debugging one of my programs.

I have an array of cl_uint values, and when it is passed to the kernels I process it as an array of uint4. This works without problems. Because I do bit operations, and threads have to share data via __local variables, I need to know exactly how the data is stored.

cl_uint a[8] looks like this in host memory:

a[0]-a[1]-a[2]-a[3]-a[4]-a[5]-a[6]-a[7]

When this is processed in a kernel as an array of uint4, I found it looks like this:

a[0].x-a[0].y-a[0].z-a[0].w-a[1].x-a[1].y-a[1].z-a[1].w

This is strange because the MersenneTwister and MonteCarloAsian samples have a vector shift with carryover between the elements, which suggests that vector components are stored the other way around.

I use this layout in a program, and when I need to left-shift the entire array (with periodic boundaries) I use lshift128, which I reversed (because the original MT lshift128 carried bits over incorrectly inside a vector), and I handle the boundaries between vectors manually. However, zero bits tend to enter the array, and they flood the system if the simulation runs long enough.
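For reference, a minimal host-side sketch of such a shift in plain C. It is not the SDK's lshift128: the names vec4u, shl128 and rotl_array are hypothetical, and it assumes that .x is the least significant 32-bit word of each vector and that the shift count n is between 1 and 31. Every word boundary receives the bits shifted out of the word below it, and the bits leaving the last vector wrap around to the first, so no zero bits are fed in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint4; x sits at the lowest address. */
typedef struct { uint32_t x, y, z, w; } vec4u;

/* Shift the 128-bit value w:z:y:x (x least significant, an assumption)
   left by n bits, 0 < n < 32. carry_in fills the low bits of x; the bits
   pushed out of the top of w are returned for the next vector. */
uint32_t shl128(vec4u *v, unsigned n, uint32_t carry_in)
{
    uint32_t carry_out = v->w >> (32 - n);
    v->w = (v->w << n) | (v->z >> (32 - n));
    v->z = (v->z << n) | (v->y >> (32 - n));
    v->y = (v->y << n) | (v->x >> (32 - n));
    v->x = (v->x << n) | carry_in;
    return carry_out;
}

/* Rotate an array of count >= 1 vectors left by n bits with periodic
   boundaries: the top bits of the last vector re-enter at the bottom
   of the first one. */
void rotl_array(vec4u *a, size_t count, unsigned n)
{
    uint32_t carry = a[count - 1].w >> (32 - n);
    for (size_t i = 0; i < count; ++i)
        carry = shl128(&a[i], n, carry);
}
```

A rotation like this preserves the number of set bits in the array, which makes it easy to check against a kernel version: if the population count drops over time, a carry is being lost at some boundary.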

Could someone clarify for me how an array is read by a kernel when it interprets it as an array of vectors?
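The layout observed above can be mimicked on the host with a small C sketch. The struct vec4u and the function view_as_vec4 are hypothetical stand-ins for uint4 and for the kernel-side reinterpretation, and the sketch assumes that components are laid out x, y, z, w at increasing addresses with no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for OpenCL's uint4 (assumed layout: x, y, z, w
   at increasing addresses, no padding). */
typedef struct { uint32_t x, y, z, w; } vec4u;

/* Reinterpret n_words consecutive 32-bit words as n_words / 4 vectors.
   memcpy keeps the bytes exactly as they are in memory, which is what
   casting the buffer to a vector pointer would do. */
void view_as_vec4(const uint32_t *words, size_t n_words, vec4u *out)
{
    assert(n_words % 4 == 0);
    memcpy(out, words, n_words * sizeof *words);
}
```

With a[i] = i for i = 0..7, this gives v[0] = (0, 1, 2, 3) and v[1] = (4, 5, 6, 7) in (x, y, z, w) order, matching the layout found by debugging.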

Illusio · ‎07-31-2010

Endianness handling differs depending on host and device endianness, as well as on the kernel code. (You can specify the endianness of data using the __attribute__ qualifier.) The goal is that most OpenCL kernel code (that is, code that uses swizzle notation to access individual vector elements, as opposed to pointer magic) can ignore endianness issues completely.

I'd recommend having a look at "Appendix B - Portability" in the OpenCL spec. It has a fairly thorough explanation of how it works as well as practical examples.
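As a rough illustration of the difference between value-level access and pointer magic, here is a plain C sketch (not OpenCL code, and the function names are made up for this example). Extracting a byte with shifts gives the same answer on any machine, while reading it through its memory address depends on the byte order.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one. */
int is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte at the lowest address */
    return first == 1;
}

/* Endianness-independent: byte k of the value, k = 0 is least significant. */
uint8_t byte_by_shift(uint32_t word, unsigned k)
{
    return (uint8_t)(word >> (8 * k));
}

/* Endianness-dependent: byte at memory offset k (0..3) of the stored word. */
uint8_t byte_by_address(uint32_t word, unsigned k)
{
    unsigned char bytes[4];
    memcpy(bytes, &word, sizeof bytes);
    return bytes[k];
}
```

For the word 0x11223344, byte_by_shift(word, 0) is always 0x44, whereas byte_by_address(word, 0) is 0x44 on a little-endian machine and 0x11 on a big-endian one. Shifts and component access on whole 32-bit elements fall in the first category.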

Meteorhead · ‎08-01-2010

Thank you for the tip. I will look into the matter and report my findings. My code relies heavily on the assumption that (to take the example above) the bits spanning a[0].w and a[1].x are mapped not as they are on the host, but the way I imagine.

That is why I was surprised that lshift128 and rshift128 in the samples don't really implement bit shifts in their respective directions with correct carryover: if you put such a vector in an array, it is represented differently than one might think at first.

I do not know why it is zeros that flood the system in my code (with wrong bit operations the result should be random), but I'll look into whether this is the cause.