OK, so after today's earlier glitch (still not sure if it was the computer or the brain), I'm trying to be extremely careful in testing. However, I can't seem to get Tahiti to run more than 2048 threads. It's likely there is more I don't understand...
I'm running a program block of 64,1,1 and a grid size of 32,8,1. Yet, a simple program that looks something like
il_cs_2_0
dcl_num_threads_per_group 64
dcl_uav_id(0)
dcl_literal l0, 0x00000010, 0, 0, 0
imul r1000.x, vAbsTidFlat.x, l0.x
uav_raw_store_id(0) mem, r1000.x, vAbsTidFlat
endmain
end
gets me results showing
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
.
.
.
ff 7f 00 00 ff 7f 00 00 ff 7f 00 00 ff 7f 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
.
.
.
I made sure to set my uav size to something ludicrous like 1MB just to make sure it was going to read all of it. No matter what I do though, I get 0-2047.
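For reference, here is a rough Python sketch of what a complete run should leave in the buffer. It assumes each thread writes one 16-byte row holding its flat id four times as little-endian 32-bit values, which is what the dump above suggests; the helper names are made up for illustration.

```python
import struct

# Launch configuration from the post: 64x1x1 threads per group, 32x8x1 groups.
GROUP_SIZE = (64, 1, 1)
GRID_SIZE = (32, 8, 1)
BYTES_PER_THREAD = 16  # the kernel multiplies the thread id by 0x10


def total_threads(group, grid):
    """Number of threads launched for a given group and grid shape."""
    count = 1
    for dim in group + grid:
        count *= dim
    return count


def expected_row(tid):
    """16-byte row a thread is assumed to write: its id repeated four times."""
    return struct.pack("<4I", tid, tid, tid, tid)


def format_row(row):
    """Render a row the same way as the hex dump in the post."""
    return " ".join(f"{b:02x}" for b in row)


threads = total_threads(GROUP_SIZE, GRID_SIZE)
print(threads, "threads,", threads * BYTES_PER_THREAD, "bytes written")
print(format_row(expected_row(threads - 1)))
```

Under those assumptions the kernel should fill 256 KB, so a 1MB UAV is more than enough and the zero rows are not a buffer-size problem.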
I assume I'm doing something wrong, but given that simple of a test case, I don't know what it could be. I do know that setting it to just run 2048 works, but is abysmally slow. Running 64,1,1 32,8 gets me some insane performance, but incorrect results. Yeah, I'm at a loss here.
Help?
Well, I tried updating the driver...figured I'd go all the way to the so-called 12.3 preview. Not sure if that's just mislabeled or malicious, but it did recognize all 4 7970s in the machine for CrossFire (not that that matters a lick, but it seemed like an improvement, so I ran with it). Trouble is, the machine locks up when I run any code on any of the cards. I tried uninstalling and reinstalling...still no clue as to why this is happening...
Well, I screwed up what I saw in my output up there...like I said, last Friday was rough. It is only counting to 0x7ff, so the last non-zero row should read ff 07, not ff 7f...I know, confusing. Unfortunately I can't copy and paste; I have to do the data moves in my head, and am quite prone to transcription errors! 7f ff would have been 32767, which would also have been odd since I only requested 16384 total threads. Looks like 12.2 also locks up.
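To double-check the byte order, here is a small Python sketch that decodes one 4-byte group from the dump as a little-endian 32-bit value. The function name is just for illustration.

```python
def decode_le32(hex_bytes):
    """Decode four space-separated hex bytes as a little-endian 32-bit value."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 4:
        raise ValueError("expected exactly four bytes")
    return int.from_bytes(raw, "little")


# Bytes as first transcribed vs. bytes as corrected.
print(decode_le32("ff 7f 00 00"))  # 0x7fff
print(decode_le32("ff 07 00 00"))  # 0x07ff
```

Read this way, ff 07 00 00 is 2047, the last id that actually gets written, while ff 7f 00 00 would be 32767, more than the 16384 threads requested.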
Back to 8.921.2.0, and it is finally not locking up the moment I call calCtxIsEventDone, but again, I get 0x0000 to 0x07ff.
Well, I got to the bottom of this on my own...it *IS* an AMD driver/hardware/etc. bug.
If I substitute in the following code (pardon the transcription errors) for vAbsTidFlat.x...
imul r1000.x, vThreadGrpId.y, (32)
iadd r1000.x, r1000.x, vThreadGrpId.x
imul r1000.x, r1000.x, (64)
imul r1001.x, vTidInGrp.y, (64)
iadd r1001.x, r1001.x, vTidInGrp.x
iadd r1000.x, r1000.x, r1001.x
Here each (##) is a literal value giving the grid or work group size I want to launch with. This is an implementation of what the IL Spec says vAbsTidFlat is supposed to be, and with it everything works fine.
So yeah, vAbsTidFlat is broken in 8.921.2.0. None of the newer drivers will even run the kernel with vAbsTidFlat, so who knows if that works...
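The same arithmetic can be sanity-checked on the host. This is only a Python sketch of the substitute above, with the literals filled in for the 64,1,1 group and 32,8 grid from this thread; it is not taken from the IL Spec, and the function name is made up.

```python
# Literals used in the IL workaround for this launch configuration.
GROUP_SIZE_X = 64  # threads per group in x
GRID_SIZE_X = 32   # groups per grid row
GRID_SIZE_Y = 8    # grid rows


def flat_thread_id(grp_x, grp_y, tid_x, tid_y=0):
    """Mirror the IL sequence: flatten the group id, scale it, add the local id."""
    flat_group = grp_y * GRID_SIZE_X + grp_x   # imul + iadd on vThreadGrpId
    base = flat_group * GROUP_SIZE_X           # imul by the group size
    local = tid_y * GROUP_SIZE_X + tid_x       # imul + iadd on vTidInGrp
    return base + local


# Enumerate every thread in launch order and collect its flat id.
ids = [
    flat_thread_id(gx, gy, tx)
    for gy in range(GRID_SIZE_Y)
    for gx in range(GRID_SIZE_X)
    for tx in range(GROUP_SIZE_X)
]
print(len(ids), min(ids), max(ids))
print(flat_thread_id(0, 1, 0))
```

With these literals the ids cover 0 through 16383 exactly once. Note that id 2048 is the first thread whose vThreadGrpId.y is non-zero (32 groups of 64 threads fill the first grid row), which lines up with the built-in stopping at 2047, though that is only an observation and not a confirmed cause.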
I sure hope a fix works its way into 12.2, along with being able to run compute shaders...