Well, I tried updating the driver...figured I'd go all the way to the so-called 12.3 preview...not sure if that's just mislabeled or malicious, but it did recognize all four 7970s in the machine for CrossFire (not that that matters a lick, but it seemed like an improvement, so I ran with it). Trouble is, the machine locks up when I run any code on any of the cards...uninstalling and reinstalling hasn't helped, and I still have no clue as to why this is happening...
Well, I screwed up what I saw in my output up there...like I said, last Friday was rough...it is only counting to 0x7ff though, i.e. 07 ff (not 7f ff)...I know, confusing...unfortunately I can't copy and paste, so I have to do the data moves in my head, and am quite prone to transcription errors! 7f ff would have been 32767, which would also have been odd since I only requested 16384 total threads...Looks like 12.2 also locks up.
Back to 8.921.2.0 and finally not locking up the moment I call calCtxIsEventDone, but again, I get 0x0000 to 0x07ff
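For anyone checking the hex against the thread count, here is a quick sanity check in Python. It only uses the numbers quoted above (16384 requested threads, 0x07ff observed, 0x7fff as the mistyped value); the variable names are made up for illustration.

```python
# Sanity check on the hex values quoted in the posts above.
requested_threads = 16384      # total threads requested in the launch
observed_max = 0x07FF          # highest ID actually seen in the output
mistyped_max = 0x7FFF          # the value first written down by mistake

# With 16384 threads, flat IDs should run from 0 to 16383.
expected_max = requested_threads - 1
print(hex(expected_max))                         # 0x3fff

# How many distinct IDs the observed range covers.
observed_count = observed_max + 1
print(observed_count)                            # 2048

# The observed range is a fraction of what was requested.
print(requested_threads // observed_count)       # 8

# The mistyped value would have been more IDs than threads requested.
print(mistyped_max + 1 > requested_threads)      # True
```

So the expected top value is 0x3fff, and the observed 0x07ff covers only one eighth of the requested range.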
Well, I got to the bottom of this on my own....it *IS* an AMD Driver/Hardware/etc bug
If I substitute in the following code (pardon the transcription errors) for vAbsTidFlat.x...
imul r1000.x, vThreadGrpId.y, (32)
iadd r1000.x, r1000.x, vThreadGrpId.x
imul r1000.x, r1000.x, (64)
imul r1001.x, vTidInGrp.y, (64)
iadd r1001.x, r1001.x, vTidInGrp.x
iadd r1000.x, r1000.x, r1001.x
where each (##) is a literal value set to the grid/work group sizes I want to launch (this is an implementation of what the IL Spec says vAbsTidFlat is supposed to be), everything works fine.
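As a host-side cross-check, the same arithmetic as the IL snippet above can be sketched in Python. This is only a rough reference, not anything from the CAL/IL toolchain: the function name and parameters are hypothetical, and it assumes the literals mean a grid 32 groups wide with 64x1 threads per group, and 8 rows of groups to reach 16384 threads.

```python
def abs_tid_flat(grp_x, grp_y, tid_x, tid_y,
                 groups_per_row=32,      # assumed: the (32) literal
                 threads_per_group=64,   # assumed: the first (64) literal
                 group_width=64):        # assumed: the second (64) literal
    """Hypothetical CPU reference for the manual vAbsTidFlat.x replacement."""
    # imul r1000.x, vThreadGrpId.y, (32) ; iadd r1000.x, r1000.x, vThreadGrpId.x
    group_flat = grp_y * groups_per_row + grp_x
    # imul r1000.x, r1000.x, (64)
    base = group_flat * threads_per_group
    # imul r1001.x, vTidInGrp.y, (64) ; iadd r1001.x, r1001.x, vTidInGrp.x
    local = tid_y * group_width + tid_x
    # iadd r1000.x, r1000.x, r1001.x
    return base + local


# Assumed launch shape: 32 x 8 groups of 64 x 1 threads = 16384 threads.
GROUPS_X, GROUPS_Y = 32, 8
THREADS_X, THREADS_Y = 64, 1

ids = sorted(
    abs_tid_flat(gx, gy, tx, ty)
    for gy in range(GROUPS_Y)
    for gx in range(GROUPS_X)
    for ty in range(THREADS_Y)
    for tx in range(THREADS_X)
)

# Every thread should get a unique ID covering 0 .. 16383 with no gaps.
print(len(ids), hex(ids[0]), hex(ids[-1]))
```

Under those assumptions every thread gets a unique ID from 0x0 to 0x3fff. One observation, not a diagnosis: 32 * 64 = 2048, which is the number of IDs in the 0x0000 to 0x07ff range seen with the built-in vAbsTidFlat, i.e. one row of groups if the launch really is shaped as assumed here.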
So yeah, vAbsTidFlat is broken in 8.921.2.0. None of the newer drivers will even run the kernel with vAbsTidFlat, so who knows if it works there...
I sure hope a fix works its way into 12.2, along with being able to run compute shaders...