After I changed one line in the kernel (see the attached code), encryption and decryption became correct on NVIDIA GPUs (nForce v195.89; Tesla C1060 and GTX 280). Note that I tested your kernel separately from the whole application, using it to process random bytes: I changed only the part that loads the data to be processed, and nothing directly related to OpenCL. On the CPU (ATI's OpenCL, 2 x AMD Opteron 2435) the results were always correct. However, it still fails on the FireStream: even the sample, when run with the -e option, reports 'Failed' on the GPU, to say nothing of the decryption results.
Btw, the FireStream 9270 I'm using has an 'Engineering sample NOT for Qualification' label on it. Maybe that's the source of the trouble?
unsigned char hiBitSet = (a > 127 ? 128 : 0); /* was: (a & 0x80); */
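
For what it's worth, assuming `a` is an unsigned 8-bit value (its declaration isn't shown here, so that is an assumption), the two expressions should give the same result for every possible byte on a conforming compiler. If so, the fact that swapping them changes the output would point toward a compiler or driver issue rather than a logic bug in the kernel. A minimal host-side sketch in plain C that checks the equivalence; the function names are hypothetical and not taken from the kernel:

```c
#include <stdio.h>

/* Original form: isolate the top bit with a mask. */
unsigned char hi_bit_mask(unsigned char a)
{
    return (unsigned char)(a & 0x80);
}

/* Workaround form: compare against 127 instead of masking. */
unsigned char hi_bit_cmp(unsigned char a)
{
    return (unsigned char)(a > 127 ? 128 : 0);
}

/* Hypothetical helper: count the byte values where the two forms disagree.
   Assumes `a` is unsigned; with a signed char, `a > 127` could never be true. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v < 256; ++v) {
        unsigned char a = (unsigned char)v;
        if (hi_bit_mask(a) != hi_bit_cmp(a)) {
            printf("mismatch at 0x%02X\n", (unsigned)v);
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling `count_mismatches()` on the host should find no differences. Running the same comparison inside a small test kernel on each device might help narrow down where the `& 0x80` form goes wrong.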