I know that sequential memory access is faster than random access on any GPU. Are there any established patterns for maximizing memory bandwidth? Would it be useful to do something like CUDA's coalesced global memory access? Is the LDS divided into banks like NVIDIA's shared memory?