So, I am the developer of the game/engine Tesseract (http://tesseract.gg).
The basic background info to understand: it uses a tiled deferred rendering setup on the OpenGL Core (3.0+) profile. The tile shader can batch up to 8 lights at once, plus a sunlight with indirect light sampled from a cascaded 3D texture.
Now, up to Catalyst 13.4 or so (the last version I had on my laptop with a 7340), everything was working fine. All other platforms, including Nvidia and Intel on both Windows and Linux, were working fine and still are.
Then a user running Catalyst 13.9 on a 7770 reported massive slow-downs. So I upgraded to 13.9 on my 7340 laptop, and the same thing occurred there as well. I tracked this down to a significant per-frame latency entirely on the CPU (20-30ms!). I've boiled it down to the following minimal conditions needed to trigger it:
1. Create a 3D texture (format seems irrelevant, but RGBA8 for the moment, default size 32^3).
2. Bind it to a framebuffer. No rendering to that texture even seems to be necessary.
3. Use a fragment shader that samples that texture; it doesn't need to do anything else.
4. Issue a draw call.
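For reference, the steps above look roughly like this (a sketch, not my actual engine code; it assumes a valid GL 3.x context is already current, and `compile_program()` is a placeholder for the usual shader compile/link boilerplate):

```c
/* Sketch of the minimal repro. Assumes a current GL 3.x context;
 * compile_program() is a placeholder, not a real helper in the engine. */

/* 1. Create the 3D texture (RGBA8, 32^3). */
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_3D, tex);
glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA8, 32, 32, 32, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

/* 2. Attach it to a framebuffer. No rendering to it is even necessary. */
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture3D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_3D, tex, 0, 0);
glBindFramebuffer(GL_FRAMEBUFFER, 0);

/* 3. Use a fragment shader that samples the texture and nothing else. */
const char *fsrc =
    "#version 130\n"
    "uniform sampler3D tex;\n"
    "out vec4 fragcolor;\n"
    "void main() { fragcolor = texture(tex, vec3(0.5)); }\n";
GLuint prog = compile_program(fsrc); /* placeholder */
glUseProgram(prog);
glBindTexture(GL_TEXTURE_3D, tex);

/* 4. Issue a draw call: the CPU-side stall shows up here. */
glDrawArrays(GL_TRIANGLES, 0, 3);
```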
This causes a CPU stall whose duration depends on the size of the texture: a larger texture means a larger stall. But the stall is entirely on the CPU, as GPU timers show no increase regardless of texture size. Just a few shader switches per frame are enough to tank the framerate to unplayable.
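This is roughly how I separated the CPU cost from the GPU cost (again a sketch, not engine code; it assumes a GL 3.3+/ARB_timer_query context for GL_TIMESTAMP, and `now_ns()` stands in for any wall clock):

```c
/* Sketch: measure CPU stall vs. GPU execution time around the draw call.
 * Assumes GL_TIMESTAMP queries are available; now_ns() is a placeholder clock. */
GLuint q[2];
glGenQueries(2, q);

long long cpu_start = now_ns();           /* placeholder CPU wall clock */
glQueryCounter(q[0], GL_TIMESTAMP);
glDrawArrays(GL_TRIANGLES, 0, 3);         /* the draw that triggers the stall */
glQueryCounter(q[1], GL_TIMESTAMP);
long long cpu_end = now_ns();

GLuint64 t0, t1;
glGetQueryObjectui64v(q[0], GL_QUERY_RESULT, &t0); /* blocks until result is ready */
glGetQueryObjectui64v(q[1], GL_QUERY_RESULT, &t1);

/* On the affected Catalysts: cpu_end - cpu_start balloons with texture size,
 * while the GPU time t1 - t0 stays flat. */
```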
This has destroyed the usefulness of my tiled rendering setup on AMD GPUs, and I have found no viable workaround other than not using tiled rendering at all. If I push the access to this 3D texture into a single fullscreen pass, it kind of works, but it means more superfluous accesses in the g-buffer/lighting passes and pushes rendering times up even more than I would like. On further inspection, while that mitigates most of the cost, there is still some significant latency (not quite 30ms, but up to maybe 5-7ms on the CPU) just from changing the tile shader a few tens of times per frame at most. This did not seem to exist on prior Catalysts and is not present on any other GPU vendor I have tried.
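The workaround amounts to hoisting the sampler3D access out of the per-tile shaders into one fullscreen pass, something like the following (illustrative GLSL only; the world-position reconstruction and blend setup are elided, and the names are placeholders):

```glsl
// Fullscreen workaround pass: the sole shader that touches the cascaded
// 3D texture, so the per-tile light shaders never bind a sampler3D.
#version 130
uniform sampler3D lightgrid;  // cascaded indirect light volume (placeholder name)
in vec3 worldpos;             // reconstructed from the depth buffer (elided)
out vec4 fragcolor;
void main()
{
    // Indirect light gets accumulated into the light buffer here instead of
    // being sampled inside every tile shader variant.
    fragcolor = vec4(texture(lightgrid, worldpos).rgb, 1.0);
}
```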
It would be nice if the 3D texture stall could be fixed at least, as 30ms of latency for switching a shader a few times does not seem right. It would be even nicer if the overall latency of changing shaders that access render target textures could be fixed in general...