I'm not an expert on how the hardware works, but if the slowdown is significantly worse than the 2-3x you get from spinlocking or k-buffers, then maybe not. It could potentially be faster than, or close to the same speed as, native atomics, though, which would be a huge win. Either way, it would make life a hell of a lot easier for developers. If I'm right that the cost only applies when there are concurrent writes to the same pixel, I doubt you'd even notice it, especially in the context of an emulator: it would just be a memory fence for that pixel quad or wavefront. I think programmable blending for emulators, which I know nothing about, is also a case where everybody gets it or nobody gets it. It would be nice to have the option.