I know I have most of this in my other thread, but to specifically call out the bug/make it easier to search for, I'm posting it as its own topic...
vAbsTidFlat does *NOT* take CALprogramGrid.gridSize.height into account, at least not in driver version 8.921.2.0.
I had read somewhere early on that it was better to launch in 2D grids. That may not be true anymore, but it did make for a nice clear separation between the powers of 2 and the number of SIMD engines available (since I generally use those to figure out what to run).
Anyhow, if you've seen my other topic, you know I was launching 64,1,1 and 32,8,1. Initially I tried calculating vAbsTidFlat myself, and everything worked well. Next, I realized the value I was seeing, 2048, was 16384 (64*32*8) / 8, which is 64*32 with the height of 8 dropped. So I tried launching 64,1,1 and 256,1,1 instead, which is the same 16384 threads with a height of 1. Everything works again. Nice, so I don't have to generate new kernels for different grid sizes!
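
To make the arithmetic easier to check, here is a small Python model of the numbers above. It is only a sketch of what I think is going on, not what the driver actually does: I'm assuming 64,1,1 is the group size and the second triple is the grid size in groups, and `distinct_flat_ids` and its `use_height` switch are names I made up for illustration.

```python
def distinct_flat_ids(group_size, grid, use_height=True):
    """Count the distinct flat thread IDs produced by a 2D grid of 1D groups.

    group_size: threads per group (assumed 1D, e.g. 64)
    grid:       (width, height) of the grid, in groups
    use_height: False models the observed behavior, where the grid row
                is ignored when the flat ID is built (an assumption)
    """
    grid_w, grid_h = grid
    ids = set()
    for gy in range(grid_h):
        for gx in range(grid_w):
            for local in range(group_size):
                row = gy if use_height else 0
                # flat ID = flat group index * group size + local thread index
                ids.add((row * grid_w + gx) * group_size + local)
    return len(ids)


if __name__ == "__main__":
    # What I expected from 64,1,1 and 32,8,1
    print(distinct_flat_ids(64, (32, 8)))
    # What it looks like when the height is ignored
    print(distinct_flat_ids(64, (32, 8), use_height=False))
    # The workaround: same thread count, height of 1
    print(distinct_flat_ids(64, (256, 1), use_height=False))
```

With the height ignored, the 32,8 grid only yields 64*32 IDs, while the flattened 256,1 grid yields all 64*256 of them either way, which matches what I saw.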