I often see a similar effect when projecting a 3D volume onto a 2D image where I sum through the volume to get each 2D pixel.
The time variation (at least mine) occurs while rotating through different angles. As you change angles, the step between sequential 3D volume memory addresses slowly changes from small to large (or vice versa). This can affect the GPU's memory cache, the access pattern of the GPU's memory controllers, or both, and so varies the time it takes to access memory.
There is no way around this, but you may be able to optimize a bit. If you consider a 3D volume of dimensions DX, DY, and DZ, the memory step can go from 1 to DX along one rotation axis, or maybe from DX to DX*DY along another rotation axis, and so on. Depending on the problem, you might be able to choose a better set of angles.
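To make the stride argument concrete, here is a rough Python sketch (not GPU code). It assumes the volume is stored with x varying fastest, then y, then z, and that rays are rotated about the z axis; the function names `axis_strides` and `mean_stride` are made up for illustration. The "mean stride" is only a crude proxy for how far apart consecutive samples land in memory, not a model of actual cache behavior.

```python
import math

def axis_strides(dx, dy, dz):
    """Element strides for stepping one voxel along x, y, z.

    Assumes x is the fastest-varying index: idx = x + dx*y + dx*dy*z.
    """
    return (1, dx, dx * dy)

def mean_stride(angle_deg, dx, dy, dz):
    """Rough average address step per unit-length sample along a ray
    lying in the x-y plane and rotated by angle_deg about the z axis."""
    sx, sy, _ = axis_strides(dx, dy, dz)
    t = math.radians(angle_deg)
    # Per sample the ray advances |cos t| voxels in x and |sin t| in y.
    return abs(math.cos(t)) * sx + abs(math.sin(t)) * sy

if __name__ == "__main__":
    # Hypothetical 512^3 volume; print the proxy for a few angles.
    for angle in (0, 30, 60, 90, 120, 150, 180):
        print(angle, round(mean_stride(angle, 512, 512, 512), 1))
```

With this proxy the step grows from 1 at 0 degrees to DX at 90 degrees and then shrinks again, so the pattern repeats every 180 degrees. Rotating about a different axis would swap in the DX and DX*DY strides instead.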
Thanks, that makes sense. I guess the read call has to wait for my kernel to return before it gets access to the bits? So if my memory access times are a function of the angle, I should be able to see it come back up to speed every 180 degrees around the volume?