cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

realhet
Miniboss

USER_SGPR value restricted to 14bits?

Hi,

I've made a small kernel in GCN ASM and decided to launch it with 167million workitems. WorkGroupSize is 256. I calculate the 0 base global_id value as simple as GID = USER_SGPR*256+V0. And just noticed that the value of the USER_SGPR reg (S12 at the moment) can not exceed 16384, and it is AND-ed, so it basically repeats every GID values from 0 to 4million 40 times. I counted the distinct gid values with buffer_atomic_add.

My question is this: Is there a 14 bit limit on the USER_SGPR value? And if so, what's the proper way to get the higher bits?

I don't think I ran this problem in the past, so maybe it's a new/altered functionality. Or maybe it is related to the largest texture size in 1 dimension.

I'm trying to use the simplest header as possible:

__kernel __attribute__((reqd_work_group_size(256,1,1))) void main(__global uint* a) { a[0]=a[0]%10; }

It gives only 2 user elements: IMM_UAV 12, s[4:7];  IMM_CONST_BUFFER 1, s[8:11];    With USER_SGPR = 12

I initialize the kernel like this:

  s_buffer_load_dword  s0, s[8:11], 0x00    //buffer base offset

  s_or_b32      s5, s5, $40000                     //set resource record size

  s_mul_i32     s1, s12, 256                       //!!!!!!!!!!!!!!!!!!!!!!!  s12 is ANDed with 16383, but don't know why

  s_waitcnt     lgkmcnt(0)

  s_add_u32     s4, s4, s0 \ s_addc_u32 s5 ,s5, 0  //s[4:7] = buffer, add base offset to resource

  v_add_i32     v1, vcc, s1, v0                   //!!!!!!!!!!!!!!!!!!!!!!! So, the calculated 0based GID can't exceed 4194304 (=WorkGroupSize*16384)

  alias UAV=s[4:7], LID=v0, GID=v1

I'm using: HD7770 Cape Verde, Cat 14.12

Thank you!

0 Likes
1 Solution
realhet
Miniboss

I investigated the problem further and got to the solution, that I better involve global_get_id(0) in the ocl kernel:

__kernel __attribute__((reqd_work_group_size(256,1,1))) void main(__global uint* a) { a[get_global_id(0)]=999; }

This gives me 3 user elements: IMM_UAV 12, s[4:7];  IMM_CONST_BUFFER 1, s[8:11];  MM_CONST_BUFFER 1, s[12:15]; 

And valid values in USER_SGPR.

Anyways, it's completely OK that USER_SGPR is undefinied when it isn't used in the kernel at all. I just wrongly got used to old habits when it wasn't undefinied in previous drivers.

View solution in original post

0 Likes
1 Reply
realhet
Miniboss

I investigated the problem further and got to the solution, that I better involve global_get_id(0) in the ocl kernel:

__kernel __attribute__((reqd_work_group_size(256,1,1))) void main(__global uint* a) { a[get_global_id(0)]=999; }

This gives me 3 user elements: IMM_UAV 12, s[4:7];  IMM_CONST_BUFFER 1, s[8:11];  MM_CONST_BUFFER 1, s[12:15]; 

And valid values in USER_SGPR.

Anyways, it's completely OK that USER_SGPR is undefinied when it isn't used in the kernel at all. I just wrongly got used to old habits when it wasn't undefinied in previous drivers.

0 Likes