I would like to suggest that IHVs, including AMD, spend some time better documenting cross-IHV intra-wavefront/warp gotchas, especially given how long ballot() and similar intrinsics have been around. There have been quite a few cases where I have tripped myself up because lanes were unexpectedly masked out due to mistaken assumptions on my part (unfortunately easy to do). I think it's a pain that others could avoid in future if better guidelines were explicitly stated, along with examples of where things go wrong. Even though I *think* I now have a better handle on it all, I would still appreciate such a resource to help confirm my understanding. A whole worthwhile presentation/paper/PDF/PPT could be made of this for the good of all.
The only resource I can find that attempts to highlight at least part of the problem more thoroughly is a non-IHV one: "A Digression on Divergence" on the Tangent Vector blog.
Key points that I think should be stated very strongly (unless I have them wrong!):
1) You can only assume the results of a ballot() are valid up until the next instruction that may cause divergence.
2) You can't assume you know where things will re-converge. You can only assume things are converged up until the first possibly divergent instruction; after that, all bets are off.
3) As an extension of (2), certain loop structures with conditional statements nested inside may very easily lead you to assume that things have converged when, due to compiler and hardware behaviour, they may well not have, leading to lanes being masked out that you don't expect. For example, a loop may be converged at the top on entry but have diverged by the time it reaches the bottom, so assuming it is still converged at the top of the next iteration may well be wrong.
4) The behaviour of atomics and memory accesses with regard to divergence has seemingly never been made clear. For example, I am not currently certain whether an atomic operation in one lane may cause other lanes in the same wavefront/warp to diverge and be masked out. I am also not sure whether the current behaviour here is something that could change one day.
5) In library code, it's best to assume that any lanes could be masked out on entry to the library function.
6) I don't know whether it would be possible some day to have some sort of assert_converged() in the shader languages that developers could put in, and which would cause a compile error if that wasn't the case.
I would really love some worked examples of the form 'this is safe', 'this is not safe', and 'you may think this is safe but it is not, and you can fix it like this'. There seems to be a lot of CUDA code that comes up via Google that could very easily mislead people going forward: I have seen examples that rely on behaviour that I believe is not safe even for CUDA, let alone for cross-IHV GL/Vulkan!
It's definitely a case where even some of the 'obvious' things may not sink in fully unless they are really hammered into developers' minds.
There is a danger of lots of games/apps shipping in future with code that has exactly these problems: code that may seem to work but then breaks in a driver/compiler update. So I think better educational resources here would save a lot of headaches for both the IHVs and developers. Otherwise this could be one area where all those nice new 'slim' Vulkan drivers get patched into a game-specific mess to 'fix' games that their latest updates 'broke' 😞