I once saw similarly odd behavior. I had copied a few bytes more into the buffer than I had actually allocated ... this worked on some devices and crashed on others.
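To make that kind of overrun harder to miss, one option is to route every copy through a size check. This is only a sketch in plain C: checked_copy is a hypothetical helper, not part of OpenCL. In a real host program the same comparison would be between the size given to clCreateBuffer and the size given to clEnqueueWriteBuffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL call): copies n_bytes from src into dst
 * only if they fit into dst_capacity.
 * Returns 0 on success, -1 if the copy would overrun the destination. */
static int checked_copy(void *dst, size_t dst_capacity,
                        const void *src, size_t n_bytes)
{
    assert(dst != NULL && src != NULL);
    if (n_bytes > dst_capacity) {
        return -1;              /* would write past the allocation */
    }
    memcpy(dst, src, n_bytes);
    return 0;
}
```

Computing the byte count once and using that same value for both the allocation and the copy avoids the mismatch in the first place.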
Also, what type is your buffer? "__global buffer" is certainly not the full declaration. Do the definitions match what the host writes to the buffer?
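If the buffer holds structs, it may help to pin the host-side layout with compile-time checks. The sketch below assumes a hypothetical kernel-side element "typedef struct { float4 pos; int id; } Particle;" and a C11 host compiler; none of these names come from your code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel-side element, shown only for comparison:
 *   typedef struct { float4 pos; int id; } Particle;
 * float4 is 16 bytes with 16-byte alignment, so the struct is 32 bytes. */

/* Host-side mirror (assumed layout, not taken from the original post). */
typedef struct {
    _Alignas(16) float pos[4];  /* matches float4: 16 bytes, 16-byte aligned */
    int32_t id;                 /* matches int: 4 bytes at offset 16 */
} HostParticle;                 /* tail padding brings the size to 32 bytes */

_Static_assert(sizeof(HostParticle) == 32, "size must match the kernel struct");
_Static_assert(offsetof(HostParticle, id) == 16, "id must follow pos directly");

/* Number of bytes the host must allocate and copy for n elements. */
static size_t particle_bytes(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(HostParticle)); /* guard against overflow */
    return n * sizeof(HostParticle);
}
```

If either static assertion fails, the host and the kernel disagree about the element layout, and the kernel would read garbage even though the copy itself succeeds.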
Alignment is probably not the issue: a CPU can usually access misaligned addresses (it just takes longer), whereas a GPU typically cannot. But you wrote that it works on the GPU and not on the CPU ...
Your declaration is wrong.
It should look like:
kernel(__global <datatype> *buffer)
for example:
kernel(__global int *buffer)
__global int b is invalid too. Kernel arguments declared global, local, or constant must be pointers.