I have an OpenCL image that is divided into tiles of 64x64 pixels. I am designing a kernel that runs through all the tiles and processes their pixels. The target is AMD GCN.
Currently, I process the tiles in raster order: left to right, top to bottom.
Is there a better way of ordering the tiles to maximize use of the image cache?
For example, I thought about clockwise strips, starting from the origin:
1  2  9  10
4  3  8  11
5  6  7  12
.  .  .  13
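
To make the ordering concrete, here is a rough sketch of how I would map a linear tile index to tile coordinates for the pattern drawn above. It is plain C so the mapping can be checked on the host. The names `TileCoord`, `isqrt_floor` and `shell_order_tile` are made up for illustration, and it assumes a square grid of tiles and a 0-based index (so tile "1" in the diagram is index 0). Whether this order actually helps the cache on GCN is exactly what I am asking; the code only shows the indexing.

```c
/* Hypothetical helper types/functions, not part of any OpenCL API. */
typedef struct { unsigned x, y; } TileCoord;

/* Integer floor(sqrt(k)) without floating point. */
static unsigned isqrt_floor(unsigned k)
{
    unsigned s = 0;
    while ((s + 1) * (s + 1) <= k)
        ++s;
    return s;
}

/* Map a 0-based linear index k to tile coordinates in the "shell" order
 * from the diagram. Shell s holds the 2s+1 tiles with max(x, y) == s.
 * Assumes a square tile grid; a non-square grid would need clipping. */
static TileCoord shell_order_tile(unsigned k)
{
    unsigned s = isqrt_floor(k);  /* which shell the tile is in */
    unsigned r = k - s * s;       /* position inside the shell, 0..2s */
    TileCoord t;

    if (s & 1u) {
        /* Odd shell: walk down column s, then left along row s. */
        if (r <= s) { t.x = s;         t.y = r; }
        else        { t.x = 2 * s - r; t.y = s; }
    } else {
        /* Even shell: walk right along row s, then up column s. */
        if (r <= s) { t.x = r; t.y = s;         }
        else        { t.x = s; t.y = 2 * s - r; }
    }
    return t;
}
```

In a kernel, each work-group would call something like `shell_order_tile(group_index)` and multiply the result by 64 to get the pixel origin of its tile.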