corry
Adept III

Tahiti 2048 threads max?

OK, so after today's earlier glitch (still not sure if it was the computer or the brain), I'm trying to be extremely careful in testing... however, I can't seem to get Tahiti to run more than 2048 threads. It's likely there is more I don't understand...

I'm running a program block of 64,1,1 and a grid size of 32,8,1.  Yet, a simple program that looks something like

il_cs_2_0

dcl_num_threads_per_group 64

dcl_uav_id(0)

dcl_literal l0, 0x00000010, 0, 0, 0

imul r1000.x, vAbsTidFlat.x, l0.x

uav_raw_store_id(0) mem, r1000.x, vAbsTidFlat

endmain

end

gets me results showing

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00

.

.

.

ff 7f 00 00 ff 7f 00 00 ff 7f 00 00 ff 7f 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

.

.

.

I made sure to set my uav size to something ludicrous like 1MB just to make sure it was going to read all of it.  No matter what I do though, I get 0-2047. 
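For reference, a quick Python sketch of the sizes involved in this launch. It assumes each thread writes one 16-byte row, which is what the 0x10 stride in l0 and the four dwords per row in the dump suggest; the variable names are only illustrative.

```python
# Values taken from the post; the 16-byte row per thread is an assumption
# based on the 0x10 stride in l0 and the four dwords per row in the dump.
THREADS_PER_GROUP = 64 * 1 * 1   # program block of 64,1,1
GROUPS = 32 * 8 * 1              # grid size of 32,8,1
BYTES_PER_THREAD = 0x10

total_threads = THREADS_PER_GROUP * GROUPS
bytes_needed = total_threads * BYTES_PER_THREAD
uav_bytes = 1024 * 1024          # the 1 MB buffer mentioned above

print(total_threads)              # 16384
print(bytes_needed)               # 262144
print(bytes_needed <= uav_bytes)  # True
```

Under that assumption a 1 MB UAV is four times larger than the launch needs, so the buffer size should not be what cuts the output off.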

I assume I'm doing something wrong, but given such a simple test case, I don't know what it could be. I do know that setting it to just run 2048 works, but is abysmally slow. Running 64,1,1 32,8 gets me some insane performance, but incorrect results. Yeah, I'm at a loss here.

Help?

3 Replies
corry
Adept III

Well, I tried updating the driver... figured I'd go all the way to the so-called 12.3 preview... not sure if that's just mislabeled or malicious, but it did recognize all 4 7970s in the machine for CrossFire (not that that matters a lick, but it seemed like an improvement, so I ran with it). Trouble is, the machine locks up when I run any code on any of the cards... uninstalling and reinstalling... still no clue as to why this is happening...

Well, I screwed up what I saw in my output up there... like I said, last Friday was rough... it is only counting to 0x7ff though, or 07 ff... I know, confusing... unfortunately I can't copy and paste... I have to do the data moves in my head, and am quite prone to transcription errors! 7f ff would have been 32767, which would also have been odd since I only requested 16384 total threads... Looks like 12.2 also locks up.

Back to 8.921.2.0 and finally not locking up the moment I call calCtxIsEventDone, but again, I get 0x0000 to 0x07ff
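To make the byte-order confusion concrete, here is a small Python sketch that decodes one dword from a dump row. It assumes the dump is little-endian, which matches the `01 00 00 00` rows shown in the first post; `dword_from_dump` is a hypothetical helper, not part of any AMD tool.

```python
import struct


def dword_from_dump(hex_bytes):
    """Decode four space-separated hex bytes as one little-endian 32-bit value."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    return struct.unpack("<I", raw)[0]


print(dword_from_dump("ff 07 00 00"))  # 2047, i.e. 0x7ff (the corrected reading)
print(dword_from_dump("ff 7f 00 00"))  # 32767, i.e. 0x7fff (the original typo)
```

So a last row of `ff 07 00 00` corresponds to thread 2047, and `ff 7f 00 00` would have meant 32767, more than the 16384 threads requested.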

Well, I got to the bottom of this on my own....it *IS* an AMD Driver/Hardware/etc bug

If I substitute in the following code (pardon the transcription errors) for vAbsTidFlat.x...

imul r1000.x, vThreadGrpId.y, (32)

iadd r1000.x, r1000.x, vThreadGrpId.x

imul r1000.x, r1000.x, (64)

imul r1001.x, vTidInGrp.y, (64)

iadd r1001.x, r1001.x, vTidInGrp.x

iadd r1000.x, r1000.x, r1001.x

Here each (##) is a literal holding the grid or work-group dimension I launch with; the sequence is an implementation of what the IL spec says vAbsTidFlat is supposed to be. With that substitution, everything works fine.
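The same arithmetic can be checked off the GPU. The following is a minimal Python sketch of the IL sequence above for the 64,1,1 by 32,8,1 launch; `flat_thread_id` and its parameters are illustrative names, not anything from the IL spec or CAL.

```python
# Launch dimensions from the thread: 64x1x1 threads per group, 32x8x1 groups.
GROUP_SIZE = (64, 1, 1)
GRID_SIZE = (32, 8, 1)


def flat_thread_id(group_id, tid_in_group,
                   group_size=GROUP_SIZE, grid_size=GRID_SIZE):
    """Mirror the imul/iadd sequence using 2D group and thread coordinates."""
    # flat group index = vThreadGrpId.y * grid width + vThreadGrpId.x
    flat_group = group_id[1] * grid_size[0] + group_id[0]
    # flat index inside the group = vTidInGrp.y * group width + vTidInGrp.x
    flat_in_group = tid_in_group[1] * group_size[0] + tid_in_group[0]
    # scale the group index by threads per group (64 here) and combine
    threads_per_group = group_size[0] * group_size[1] * group_size[2]
    return flat_group * threads_per_group + flat_in_group


# Enumerate every thread of the launch and confirm the IDs are 0..16383.
ids = sorted(
    flat_thread_id((gx, gy), (tx, 0))
    for gy in range(GRID_SIZE[1])
    for gx in range(GRID_SIZE[0])
    for tx in range(GROUP_SIZE[0])
)
assert ids == list(range(16384))

print(flat_thread_id((0, 1), (0, 0)))  # 2048
```

Note that the first group in the second row of the grid (vThreadGrpId.y = 1) gets ID 2048, so IDs 0 to 2047 are exactly the first row of groups. That is at least consistent with the observed cutoff, though nothing in this thread confirms what the driver actually does wrong.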

So yeah, vAbsTidFlat is broken in 8.921.2.0. None of the newer drivers will even run the kernel with vAbsTidFlat, so who knows if that works...

I sure hope a fix works its way into 12.2, along with being able to run compute shaders...
