I have a kernel that accumulates a large number of values with atomics. The values to accumulate lie in a 2D neighborhood, and neighboring threads work on similar regions, each offset by a small random (x, y) shift. The memory access pattern would therefore be more cache-friendly if the buffer were tiled.
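To make the tiling argument concrete, here is a minimal C sketch. It covers only the address arithmetic, not the kernel itself, and compares how many cache lines a small neighborhood touches in a row-major buffer versus a manually tiled one. The tile size (4x4 32-bit values), the 64-byte cache line, and the helper names `linear_index`, `tiled_index` and `lines_touched` are hypothetical choices for illustration. They do not describe how AMD hardware actually tiles images. In a kernel, the chosen index function would feed the address of OpenCL's built-in `atomic_add`, e.g. `atomic_add(&buf[tiled_index(x, y, width)], v)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile: 4x4 32-bit values = 64 bytes, assumed to be one cache line. */
#define TILE_W 4
#define TILE_H 4
#define LINE_ELEMS 16 /* assumed 64-byte line / 4-byte element */

/* Row-major index, i.e. what a plain buffer gives you. */
static size_t linear_index(size_t x, size_t y, size_t width) {
    return y * width + x;
}

/* Tiled index: tiles are stored one after another in row-major tile order,
   and elements inside each tile are row-major. Assumes width % TILE_W == 0. */
static size_t tiled_index(size_t x, size_t y, size_t width) {
    size_t tiles_per_row = width / TILE_W;
    size_t tile_id = (y / TILE_H) * tiles_per_row + (x / TILE_W);
    return tile_id * (TILE_W * TILE_H) + (y % TILE_H) * TILE_W + (x % TILE_W);
}

typedef size_t (*index_fn)(size_t x, size_t y, size_t width);

/* Counts the distinct cache lines touched by an n x n neighborhood whose
   top-left corner is (x0, y0), under the given layout. */
static int lines_touched(index_fn idx, size_t x0, size_t y0, size_t n,
                         size_t width) {
    size_t seen[64];
    int count = 0;
    assert(n * n <= 64); /* keep the dedup table small */
    for (size_t y = y0; y < y0 + n; ++y) {
        for (size_t x = x0; x < x0 + n; ++x) {
            size_t line = idx(x, y, width) / LINE_ELEMS;
            int dup = 0;
            for (int i = 0; i < count; ++i) {
                if (seen[i] == line) { dup = 1; break; }
            }
            if (!dup) seen[count++] = line;
        }
    }
    return count;
}
```

With a 256-wide buffer, an aligned 4x4 neighborhood at (8, 8) touches 4 lines in the linear layout but only 1 in the tiled one, and a shifted neighborhood at (14, 9) touches 8 lines versus 4. Manual tiling like this is possible with buffers today, but it is exactly what images would provide natively.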
Using images rather than buffers would give native tiling support, and I expect the hardware's internal tiling to be well optimized for local 2D operations.
Unfortunately, I found that OpenCL has no support for image atomics. The functionality does seem to be available in OpenGL compute shaders (e.g. `imageAtomicAdd` in GLSL), and according to the ISA the hardware definitely supports it.
It would be great if AMD could add an OpenCL extension that exposes this feature.