Endianness handling depends on the endianness of the host and the device, as well as on how the kernel code accesses the data. (You can specify the endianness of data using the __attribute__ qualifier.) The goal is that most OpenCL kernel code, that is, code that uses swizzle notation to access individual vector elements rather than pointer tricks, can ignore endianness issues completely.
I'd recommend having a look at "Appendix B - Portability" in the OpenCL spec. It has a fairly thorough explanation of how this works, as well as practical examples.
Thank you for the tip. I will look into it and post my findings. My code relies heavily (to take the example mentioned above) on the bits a.w through a.x being mapped not the way they are on the host, but the way I imagine them.
That is why I was surprised that lshift128 and rshift128 in the examples don't really implement bit shifts in their respective directions with correct carry-over: if you put the vector in an array, it is laid out differently than one might first expect.
I don't know why it is zeros that flood the system in my code (with wrong bit operations I would expect random values), but I'll check whether this is the cause.