cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

drallan
Challenger

Gcn compiler bug, lds pointer arithmetic?

The following seems to be a bug in the gcn compiler, starting with catalyst 12.9, and does not occur on other architectures like Cayman and Barts. The test program fills an array[64] with some data and then reads it back in reverse order. To do this, the compiler computes the offset (63 - gx) to pointer pB where gx >= 0 && < 64. The problem occurs because the new compiler uses the constant value 63 as an offset in the ds_read_b32 instruction, which then causes the register address of ds_read_b32 to be negative!. The old compiler does not use the ds_read_b32 offset option. Several similar expressions cause the same problem.  Unfortunately this is a pretty common 'C' expression.....

                 result = *(pB + (63 - gx));

Just a comment, gcn is still relatively new and being optimized so bugs can happen however, the compiler is often beyond amazing in the code it produces and getting faster. Go compiler team.

__kernel void localpointer_rw(global int *in, global int *out)

{

    int i, gx = get_global_id(0);   //gx goes from 0 to 63

    local int buf[64],*pB;

     buf[gx] = gx;            //assign 64 numbers to all buf locations

     pB = &buf[0];            //make a pointer

     i = *(pB+(63-gx));       //ERROR! reads only when gx == 0 !!

     out[gx] = i;

}

Dissassembly from (good) old compiler

  v_sub_i32      v1, vcc, 0, v0   <------  start:gx in v0; gx = -gx;

  v_lshlrev_b32  v1, 2, v1                 mmultiply by 4

  v_add_i32      v1, vcc, 0x000000fc, v1   add constant offset 252

  v_lshlrev_b32  v2, 2, v0

  ds_write_b32   v2, v0

  s_waitcnt      lgkmcnt(0)

  ds_read_b32    v0, v1                    read

Dissassembly of newer code

   v_lshlrev_b32  v1, 2, v0

  v_sub_i32      v2, vcc, 0, v0   <------   start:gx in v0; gx = -gx;

  ds_write_b32   v1, v0

  v_lshlrev_b32  v0, 2, v2                  multiply by 4

  s_waitcnt     lgmkcnt(0)                 

  ds_read_b32    v0, v0 offset:252          read from offset 252 with negative address register

0 Likes
1 Solution

I can reproduce this on Win7 with Tahiti and 12.11beta. Just let you know I am forwarding your findings to the right  people.

View solution in original post

0 Likes
6 Replies
binying
Challenger

Have you try it on the latest? It might be fixed!

0 Likes

Hi Binying,

Thanks for the reply.

I have seen the failure on catalysts 12.8, 12.9, 12.10, 12.11beta8, and 12.11beta11.

However, the problem can also depend on how or what you select when you install drivers. The combination of installing the video drivers and/or not the SDK runtime can make a difference, (at least as of 12.9), including the SDK, as in the default installation, will produce the error.

I did confirm the error occurs with all amdocl(64).dll files up through 12.11beta11.

Where is beta13 ?

drallan

0 Likes
realhet
Miniboss

Do you think ds_read will fail if it gets a negative address? Isa docs didn't mention it. Does it checks negative addr before adding the 16 bit offset?

I think the same data and address vreg (v0, v0) could be wrong. At least with memory operations it is bad. But It's a weak argument since it reads only 1 dword...

side note: Just accidentally found the src_lds_direct broadcasted vector parameter. In case we run out of S reg local constants If I only knew it earlier...

0 Likes

realhet wrote:

Do you think ds_read will fail if it gets a negative address? Isa docs didn't mention it. Does it checks negative addr before adding the 16 bit offset?

I think the same data and address vreg (v0, v0) could be wrong. At least with memory operations it is bad. But It's a weak argument since it reads only 1 dword...

side note: Just accidentally found the src_lds_direct broadcasted vector parameter. In case we run out of S reg local constants If I only knew it earlier...

Hi Realhet,

Good question and not clear. I know the negative address produces the wrong answer = 0. The manual says OOR lds addresses will return the value 0, I don't know how the hardware makes the final address. I assume a negative address is interpreted as a large positive address and goes OOR.

re: src_lds_direct broadcast. Yes, yet another well thought out gcn feature. I wonder how that compares to all threads reading the same address, which would be a bank conflict?

0 Likes

I can reproduce this on Win7 with Tahiti and 12.11beta. Just let you know I am forwarding your findings to the right  people.

0 Likes

binying wrote:

I can reproduce this on Win7 with Tahiti and 12.11beta. Just let you know I am forwarding your findings to the right  people.

Very good! Thanks. I think this can be fixed fairly easily.

drallan

0 Likes