Hi,
I've made a small kernel in GCN ASM and decided to launch it with 167million workitems. WorkGroupSize is 256. I calculate the 0 base global_id value as simple as GID = USER_SGPR*256+V0. And just noticed that the value of the USER_SGPR reg (S12 at the moment) can not exceed 16384, and it is AND-ed, so it basically repeats every GID values from 0 to 4million 40 times. I counted the distinct gid values with buffer_atomic_add.
My question is this: Is there a 14 bit limit on the USER_SGPR value? And if so, what's the proper way to get the higher bits?
I don't think I ran this problem in the past, so maybe it's a new/altered functionality. Or maybe it is related to the largest texture size in 1 dimension.
I'm trying to use the simplest header as possible:
__kernel __attribute__((reqd_work_group_size(256,1,1))) void main(__global uint* a) { a[0]=a[0]%10; }
It gives only 2 user elements: IMM_UAV 12, s[4:7]; IMM_CONST_BUFFER 1, s[8:11]; With USER_SGPR = 12
I initialize the kernel like this:
s_buffer_load_dword s0, s[8:11], 0x00 //buffer base offset
s_or_b32 s5, s5, $40000 //set resource record size
s_mul_i32 s1, s12, 256 //!!!!!!!!!!!!!!!!!!!!!!! s12 is ANDed with 16383, but don't know why
s_waitcnt lgkmcnt(0)
s_add_u32 s4, s4, s0 \ s_addc_u32 s5 ,s5, 0 //s[4:7] = buffer, add base offset to resource
v_add_i32 v1, vcc, s1, v0 //!!!!!!!!!!!!!!!!!!!!!!! So, the calculated 0based GID can't exceed 4194304 (=WorkGroupSize*16384)
alias UAV=s[4:7], LID=v0, GID=v1
I'm using: HD7770 Cape Verde, Cat 14.12
Thank you!