I have a very simple kernel that triggers this error. Please help and tell me what I am doing wrong!
mov r0.xy, cb0.xy
uav_raw_store_id(0) mem.x, r0.x, r0.y
The above works fine with
- Catalyst 11.12 on Win7/64-bit (the first driver for the 7970),
- Catalyst 12.1 on Ubuntu/32-bit (it shows a transparent icon saying the hardware is unsupported, but even a complex kernel works perfectly).
On newer drivers, however, running it causes an error.
The error happens when I use the UAV: every setup call runs without an error, but after runprogramgrid() the kernel freezes and the watchdog resets the GPU.
I've tried several combinations for the UAV: pinned/local/remote, 1D or linear, reverting to componentsize=4 instead of 1, and different UAV indices, but none of them helped.
Here's the ISA disassembly; maybe it contains useful info:
s_buffer_load_dwordx2 s[0:1], s[8:11], 0x00 // 00000000: C2400900
s_waitcnt lgkmcnt(0) // 00000004: BF8C007F
v_mov_b32 v0, s0 // 00000008: 7E000200
v_mov_b32 v1, s1 // 0000000C: 7E020201
tbuffer_store_format_x v1, v0, s[4:7], 0 offen format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_FLOAT] // 00000010: EBA41000 80010100
s_endpgm // 00000018: BF810000
; ----------------- CS Data ------------------------
codeLenInByte = 28;Bytes
userElementCount = 2;
; userElements = IMM_UAV, 0, s[4:7]
; userElements = IMM_CONST_BUFFER, 0, s[8:11]
extUserElementCount = 0;
NumVgprs = 3;
NumSgprs = 13;
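For anyone who wants to check the store instruction by hand, here is a minimal sketch that splits its two dwords (EBA41000 80010100) into fields. The bit layout is an assumption based on my reading of the Southern Islands MTBUF encoding, and decode_mtbuf is a hypothetical helper, not part of CAL or any AMD tool.

```python
# Assumed SI MTBUF field layout (not taken from the thread itself).
def decode_mtbuf(lo, hi):
    """Split the two dwords of an MTBUF instruction into named fields."""
    return {
        "offset":   lo & 0xFFF,           # bits 11:0, immediate byte offset
        "offen":    (lo >> 12) & 0x1,     # VGPR supplies the offset
        "idxen":    (lo >> 13) & 0x1,     # VGPR supplies the index
        "glc":      (lo >> 14) & 0x1,
        "op":       (lo >> 16) & 0x7,     # 4 = tbuffer_store_format_x
        "dfmt":     (lo >> 19) & 0xF,     # 4 = BUF_DATA_FORMAT_32
        "nfmt":     (lo >> 23) & 0x7,     # 7 = BUF_NUM_FORMAT_FLOAT
        "encoding": (lo >> 26) & 0x3F,    # 0x3A marks MTBUF
        "vaddr":    hi & 0xFF,            # address VGPR
        "vdata":    (hi >> 8) & 0xFF,     # data VGPR
        "srsrc":    (hi >> 16) & 0x1F,    # resource constant, in units of 4 SGPRs
        "soffset":  (hi >> 24) & 0xFF,    # 0x80 encodes the inline constant 0
    }

fields = decode_mtbuf(0xEBA41000, 0x80010100)
first = fields["srsrc"] * 4
print(f"v{fields['vdata']} -> [v{fields['vaddr']}] via s[{first}:{first + 3}]")
```

Under that assumed layout the fields agree with the disassembly text: data in v1, address in v0, offen set, and the resource constant in s[4:7], which is the IMM_UAV slot listed in the CS data.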
Please help me solve this weird thing. This UAV stuff works on all cards; only the 79xx with the newest driver has this problem.
Thanks for your answers!
If you have any problems with IL/CAL, they most likely will not be fixed as CAL support has been deprecated. The only suggestion is to use the IL in the same manner as the IL that OpenCL generates.
Investigating the problem further, I found this:
The kernel freezes exactly at the location where the UAV is accessed ( tbuffer_store_format_x v1, v0, s[4:7]... ).
When I give the GPU code without this instruction, the kernel finishes without any errors. Even a single s_endpgm will do.
Then I got an insane idea and tried to write into the constant buffer instead of the UAV ( tbuffer_store_format_x v1, v0, s[8:11]... ).
This kernel did not freeze. I then looked at the values of the constant buffer on the CPU side and saw that the tbuffer_store operation had succeeded.
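At the encoding level, redirecting the store from s[4:7] to s[8:11] amounts to changing one field in the second dword of the instruction. The sketch below shows that change under the assumption that SRSRC occupies bits 20:16 of that dword and counts in units of four SGPRs. patch_srsrc is a hypothetical helper, and the thread does not say how the modified kernel was actually produced.

```python
# Assumed position of the SRSRC field in the second MTBUF dword.
SRSRC_SHIFT = 16
SRSRC_MASK = 0x1F << SRSRC_SHIFT

def patch_srsrc(hi, first_sgpr):
    """Return the second dword with its resource constant moved to s[first_sgpr:first_sgpr+3]."""
    if first_sgpr % 4 != 0:
        raise ValueError("resource constants start on a 4-SGPR boundary")
    cleared = hi & ~SRSRC_MASK & 0xFFFFFFFF      # drop the old SRSRC value
    return cleared | ((first_sgpr // 4) << SRSRC_SHIFT)

original_hi = 0x80010100                  # store through s[4:7], the IMM_UAV slot
patched_hi = patch_srsrc(original_hi, 8)  # store through s[8:11], the IMM_CONST_BUFFER slot
print(f"{original_hi:08X} -> {patched_hi:08X}")
```

Only the SRSRC bits differ between the two words; the data register, address register and offset fields are left untouched.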
So now we have this funny situation (I LOLd painfully when I found it): the UAV (s[4:7]) is corrupt and freezes the GPU when accessed, while the constant buffer has become a read/write buffer that can replace the functionality of the broken UAV.
And this 'feature' was introduced right after the first released drivers for the 7xxx (Win 11.12 and Linux 12.1 are OK).
I never thought that some day I'd have to write into a read-only buffer on purpose.
(I'm not asking for a fix, I understand this is deprecated like hell, but please no more funny weirdness in the next drivers)
The problem is that raw UAVs are not the preferred approach on SI; the preferred approach is typeless UAVs. Not only are they more flexible, but you can have up to 256 of them, and you can have read_only and private ones for a performance benefit.
I just tried it as you suggested, using the new dcl_typeless_uav_id(0) and uav_store_id(0). It still compiles to the tbuffer_store ISA instruction, but this time it became a 4-component _format_xyzw write. Unfortunately the GPU still freezes when touching the UAV's resource constant s[4:7]. This constant has changed since Catalyst 11.12; I guess it's something related to CAL's resource management (which is frozen, I know). Well, it really seems like I'll have to switch to OpenCL sooner or later.
Thanks for the help anyway!