We recently fixed a hang issue internally, and the fix should show up in one of the upcoming Catalyst releases. If you have a test case we can try, we can confirm whether it fixes your issue. If it doesn't, we can work on getting it fixed.
I'm trying to narrow down the problem now, but it takes a reboot every time I fail to find it. All I've done so far is set my loop counter to 1 and set my grid/block sizes to 64,1,1 and 1,1,1. I did the second part because we found that one of the kernels was generating an excessive number of loads, and I mean excessive. Rather than look for a big block of v_load_dword, or whatever the actual instruction name is, I figured that running a minimal number of threads might finish in a reasonable amount of time if that were the problem. It should take tenths of a second or less with a loop counter of 1; I let it sit for 2 minutes before rebooting.
I'm going to try removing the loop to see if it runs. I'll post here if I find a simple test case, but I get the feeling this is one of those "well, if a and b and !d ^ g+5+the middle initial of your father's brother's nephew's cousin's former roommate | 42, then there will be a problem" cases.
Can you see if these two instructions follow each other in the ISA?
v_readlane_b32 s17, v4, 0
s_load_dwordx4 s[28:31], s[2:3], s17
Where the result of v_readlane_b32 is the third argument of s_load_dwordx4?
Not exactly. We have the v_readlane followed by the s_load_dword, but it's like this...
v_readlane_b32 s0, v3, 0
v_readlane_b32 s1, v3, 1
s_load_dwordx4 s[0:3], s[0:1], 0x38
So it seems s0 and s1 are being used in the s_load_dwordx4, but as the base address pair (the second operand), not as the 3rd parameter.
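For anyone else who wants to check their own ISA dump for the pattern asked about above, here is a rough sketch of a scanner. It assumes the disassembly is plain text with one instruction per line in the same syntax as the snippets in this thread; the function name and regexes are made up for illustration and are not part of any AMD tool. It only flags an s_load_dwordx4 whose third operand is the destination of a v_readlane_b32 on the line directly before it.

```python
import re

# Matches e.g. "v_readlane_b32 s17, v4, 0" and captures the destination SGPR.
READLANE = re.compile(r"^\s*v_readlane_b32\s+(s\d+)\s*,")
# Matches e.g. "s_load_dwordx4 s[28:31], s[2:3], s17" and captures the third operand.
SLOAD = re.compile(r"^\s*s_load_dwordx4\s+[^,]+,\s*[^,]+,\s*(\S+)")


def find_readlane_offset_pairs(isa_text):
    """Return (line_no, sgpr) for each s_load_dwordx4 whose third operand is
    the destination of the v_readlane_b32 on the immediately preceding line."""
    hits = []
    prev_dst = None  # destination SGPR of the previous line, if it was a readlane
    for line_no, line in enumerate(isa_text.splitlines(), start=1):
        load = SLOAD.match(line)
        if load and prev_dst is not None and load.group(1) == prev_dst:
            hits.append((line_no, prev_dst))
        lane = READLANE.match(line)
        prev_dst = lane.group(1) if lane else None
    return hits


# The sequence that was asked about.
suspect = """v_readlane_b32 s17, v4, 0
s_load_dwordx4 s[28:31], s[2:3], s17"""

# The sequence actually found in the kernel.
observed = """v_readlane_b32 s0, v3, 0
v_readlane_b32 s1, v3, 1
s_load_dwordx4 s[0:3], s[0:1], 0x38"""

print(find_readlane_offset_pairs(suspect))
print(find_readlane_offset_pairs(observed))
```

The first sequence is flagged and the second is not, since there the readlane results feed the base address rather than the offset.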
Also of note: on the 12.2 drivers I stripped my program down to just writing eight 32-bit 0xFFFFFFFF values to the first 32 bytes of RAM. That alone locks it up.
Disassembly is literally this:
v_mov_b32 v0, -1
v_mov_b32 v1, -1
v_mov_b32 v2, -1
v_mov_b32 v3, -1
tbuffer_store_format_xyzw v[0:3], v0, s[4:7], 0 format:[BUF_DATA_FORMAT_32_32_32_32,BUF_NUM_FORMAT_FLOAT]
tbuffer_store_format_xyzw v[0:3], v0, s[4:7], 0 offset:16 format:[BUF_DATA_FORMAT_32_32_32_32,BUF_NUM_FORMAT_FLOAT]
So even if the loop issue was fixed in 12.2, we still can't seem to run literally anything on the 12.2 drivers. I hope it doesn't try to interpret the values when it stores them, since for some reason the integers are going out as floats... but I don't think that should hang the card, right?
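For reference, here is a quick host-side check of what that bit pattern looks like if anything did read it as a float. This is only a sketch using Python's struct module; it says nothing about whether the hardware converts anything on this store, which nobody in this thread has confirmed. The helper name is made up for illustration.

```python
import math
import struct


def bits_as_float32(bits):
    """Reinterpret a 32-bit unsigned integer's bit pattern as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# v_mov_b32 vN, -1 leaves 0xFFFFFFFF in the register.
value = bits_as_float32(0xFFFFFFFF)
print(math.isnan(value))
```

All exponent bits set with a nonzero mantissa is a NaN encoding, so the stored words would read back as NaN under a float interpretation.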