realhet
Miniboss

uav_raw_store broken(?) since catalyst 11.12 on the HD7970

Hi!

This very simple kernel reproduces the error. Please help and tell me what I'm doing wrong!

il_cs_2_0

dcl_num_thread_per_group 64,1,1

dcl_cb cb0[2]

dcl_raw_uav_id(0)

mov r0.xy, cb0[0].xy

uav_raw_store_id(0) mem.x, r0.x, r0.y

endmain

end

The above works fine with

- Catalyst 11.12 on Win7/64bit (the first driver for the 7970),

- Catalyst 12.1 on Ubuntu/32bit (it shows a transparent icon saying that this is unsupported hardware, but even a complex kernel works perfectly)

But on newer drivers, running this kernel fails.

The error happens when I use the UAV: every setup call runs without an error, but after runprogramgrid() the kernel freezes and the watchdog resets the GPU.

I've tried several combinations for the UAV: pinned/local/remote, 1D or linear, reverting to componentsize=4 instead of 1, a different UAV index, but none of them helped.

Here's the ISA disassembly; maybe it contains useful info:

shader main

  asic(SI_ASIC)

  type(CS)

  s_buffer_load_dwordx2  s[0:1], s[8:11], 0x00              // 00000000: C2400900

  s_waitcnt     lgkmcnt(0)                                  // 00000004: BF8C007F

  v_mov_b32     v0, s0                                      // 00000008: 7E000200

  v_mov_b32     v1, s1                                      // 0000000C: 7E020201

  tbuffer_store_format_x  v1, v0, s[4:7], 0 offen format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_FLOAT] // 00000010: EBA41000 80010100

  s_endpgm                                                  // 00000018: BF810000

end

; ----------------- CS Data ------------------------

codeLenInByte        = 28;Bytes

userElementCount     = 2;

;  userElements[0]    = IMM_UAV, 0, s[4:7]

;  userElements[1]    = IMM_CONST_BUFFER, 0, s[8:11]

extUserElementCount  = 0;

NumVgprs             = 3;

NumSgprs             = 13;
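
For reference, the intended effect of the kernel can be sketched on the CPU. This is only a minimal model in Python, not CAL code: `run_kernel`, the byte buffers, and the little-endian dword layout are assumptions made for illustration, and so is the choice to silently drop out-of-range stores.

```python
import struct

def run_kernel(cb0, uav, num_threads=64):
    """Hypothetical CPU model: every thread stores cb0[0].y at byte offset cb0[0].x of the UAV."""
    # s_buffer_load_dwordx2 s[0:1], s[8:11], 0x00 : read cb0[0].x and cb0[0].y
    s0, s1 = struct.unpack_from("<II", cb0, 0)
    for _ in range(num_threads):
        # v_mov_b32 v0, s0 / v_mov_b32 v1, s1
        v0, v1 = s0, s1
        # tbuffer_store_format_x v1, v0, s[4:7], 0 offen : one dword at byte offset v0
        if v0 + 4 <= len(uav):  # assumption: stores outside the buffer are dropped
            struct.pack_into("<I", uav, v0, v1)

# dcl_cb cb0[2] is two 4-component vectors, i.e. 8 dwords
cb0 = struct.pack("<8I", 8, 0xDEADBEEF, 0, 0, 0, 0, 0, 0)
uav = bytearray(16)
run_kernel(cb0, uav)
print(uav.hex())
```

All 64 threads write the same value to the same address, so a working driver should leave exactly one dword changed in the UAV.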

Please help me solve this weird thing. The same UAV code works on all other cards; only the 79xx with the newest drivers has this problem.

Thanks for your answers!

5 Replies

If you have any problems with IL/CAL, they most likely will not be fixed as CAL support has been deprecated. The only suggestion is to use the IL in the same manner as the IL that OpenCL generates.


I just hope somebody already knows the trick, so I don't have to debug how OpenCL does it.


Investigating the problem further, I've found this:

The kernel freezes exactly at the location where the UAV is accessed. ( tbuffer_store_format_x  v1, v0, s[4:7]... )

When I give the GPU code without this instruction, the kernel finishes without any errors. A single s_endpgm on its own also runs fine.

Then I got an insane idea and tried to write into the constant buffer instead of the UAV. ( tbuffer_store_format_x  v1, v0, s[8:11]... )

This kernel did not freeze. I then looked at the values of the constant buffer on the CPU side and saw that the tbuffer_store operation had succeeded.

So now we have this funny situation (I LOLd painfully when I found it): the UAV (s[4:7]) is corrupt and freezes the GPU when accessed, but the Constant Buffer is now effectively a Read/Write buffer and can be used to replace the functionality of the broken UAV.

And this 'feature' was introduced right after the first released drivers for the 7xxx. (Win 11.12 and Linux 12.1 are OK.)

I never thought that some day I'd have to write into a read-only buffer on purpose.

(I'm not asking for a fix, I understand this is deprecated like hell, but please no more funny weirdness in the next drivers)


realhet,

The problem is that raw UAVs are not the preferred approach on SI; the preferred approach is typeless UAVs. Not only are they more flexible, but you can have up to 256 of them, and you can have read_only and private ones for a performance benefit.


I just tried it as you suggested, using the new dcl_typeless_uav_id(0) and uav_store_id(0). It still compiles to the tbuffer_store ISA instruction, but this time it became a 4-component _format_xyzw write. Unfortunately the GPU still freezes when touching the UAV's resource constant s[4:7]. This constant has changed since Catalyst 11.12; I guess it's something related to CAL's resource management (which is frozen, I know). Well, it really seems like I'll have to switch to OpenCL sooner or later.

Thanks for the help anyway!
