I have a kernel that processes blocks of 16 uchar/ushort values, and since OpenCL has a ushort16 type I thought it would be great to use it. However, the kernel crashes, which I find odd since the same code works with Intel's OpenCL. The crash only happens on the CPU device, and the memory is aligned on 4k boundaries. It also works if I use vstore16/vload16 instead, and using ushort8/uchar8 seems to work as well.
Have I misunderstood something or is this a bug?
Two simple kernels:
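kernel void thisOneCrashes(global const uchar16* data1, global ushort16* data2)
{
    // Direct store through the ushort16 pointer
    *data2 = (ushort16)(1);
}

kernel void thisOneDoesnt(global const uchar* data1, global ushort* data2)
{
    // Same store, but through vstore16 on a scalar ushort pointer
    vstore16((ushort16)(1), 0, data2);
}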
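
As far as I understand the spec, a ushort16 has to be aligned to its own size (32 bytes), while vstore16/vload16 only need the alignment of a single ushort. That is why I mention the 4k alignment above. Below is a minimal host-side sketch in plain C of the kind of check I mean. It is not my actual host code: the helper names are made up for illustration, and it assumes C11 aligned_alloc for the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Size of one OpenCL ushort16 in bytes: 16 lanes of 2 bytes each. */
#define USHORT16_BYTES (16 * sizeof(uint16_t))

/* Hypothetical helper: returns 1 if ptr is a multiple of `alignment` bytes. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Hypothetical helper: allocates one 4 KiB block on a 4 KiB boundary
 * (assumption: C11 aligned_alloc is available) and reports whether that
 * block is suitably aligned for ushort16 access.
 * Returns 1 if aligned, 0 if not, -1 if the allocation failed. */
static int page_buffer_is_vector_aligned(void)
{
    void *buf = aligned_alloc(4096, 4096);
    if (buf == NULL) {
        return -1;
    }
    int ok = is_aligned(buf, USHORT16_BYTES);
    free(buf);
    return ok;
}
```

Since 4096 is a multiple of 32, any buffer that starts on a 4k boundary should pass this check, so I don't think misalignment of the buffer start explains the crash.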