1 Reply Latest reply on Jul 13, 2015 9:11 AM by realhet

    USER_SGPR value restricted to 14bits?

    realhet

      Hi,

       

      I've made a small kernel in GCN ASM and decided to launch it with 167million workitems. WorkGroupSize is 256. I calculate the 0 base global_id value as simple as GID = USER_SGPR*256+V0. And just noticed that the value of the USER_SGPR reg (S12 at the moment) can not exceed 16384, and it is AND-ed, so it basically repeats every GID values from 0 to 4million 40 times. I counted the distinct gid values with buffer_atomic_add.

       

      My question is this: Is there a 14 bit limit on the USER_SGPR value? And if so, what's the proper way to get the higher bits?

       

      I don't think I ran this problem in the past, so maybe it's a new/altered functionality. Or maybe it is related to the largest texture size in 1 dimension.

      I'm trying to use the simplest header as possible:

      __kernel __attribute__((reqd_work_group_size(256,1,1))) void main(__global uint* a) { a[0]=a[0]%10; }

      It gives only 2 user elements: IMM_UAV 12, s[4:7];  IMM_CONST_BUFFER 1, s[8:11];    With USER_SGPR = 12

       

      I initialize the kernel like this:

        s_buffer_load_dword  s0, s[8:11], 0x00    //buffer base offset

        s_or_b32      s5, s5, $40000                     //set resource record size

        s_mul_i32     s1, s12, 256                       //!!!!!!!!!!!!!!!!!!!!!!!  s12 is ANDed with 16383, but don't know why

        s_waitcnt     lgkmcnt(0)

        s_add_u32     s4, s4, s0 \ s_addc_u32 s5 ,s5, 0  //s[4:7] = buffer, add base offset to resource

        v_add_i32     v1, vcc, s1, v0                   //!!!!!!!!!!!!!!!!!!!!!!! So, the calculated 0based GID can't exceed 4194304 (=WorkGroupSize*16384)

        alias UAV=s[4:7], LID=v0, GID=v1

       

      I'm using: HD7770 Cape Verde, Cat 14.12

       

      Thank you!

        • Re: USER_SGPR value restricted to 14bits?
          realhet

          I investigated the problem further and got to the solution, that I better involve global_get_id(0) in the ocl kernel:

           

          __kernel __attribute__((reqd_work_group_size(256,1,1))) void main(__global uint* a) { a[get_global_id(0)]=999; }

          This gives me 3 user elements: IMM_UAV 12, s[4:7];  IMM_CONST_BUFFER 1, s[8:11];  MM_CONST_BUFFER 1, s[12:15]; 

          And valid values in USER_SGPR.

           

          Anyways, it's completely OK that USER_SGPR is undefinied when it isn't used in the kernel at all. I just wrongly got used to old habits when it wasn't undefinied in previous drivers.