The following seems to be a bug in the gcn compiler, starting with catalyst 12.9, and does not occur on other architectures like Cayman and Barts. The test program fills an array[64] with some data and then reads it back in reverse order. To do this, the compiler computes the offset (63 - gx) to pointer pB where gx >= 0 && < 64. The problem occurs because the new compiler uses the constant value 63 as an offset in the ds_read_b32 instruction, which then causes the register address of ds_read_b32 to be negative!. The old compiler does not use the ds_read_b32 offset option. Several similar expressions cause the same problem. Unfortunately this is a pretty common 'C' expression.....
result = *(pB + (63 - gx));
Just a comment, gcn is still relatively new and being optimized so bugs can happen however, the compiler is often beyond amazing in the code it produces and getting faster. Go compiler team.
__kernel void localpointer_rw(global int *in, global int *out)
{
int i, gx = get_global_id(0); //gx goes from 0 to 63
local int buf[64],*pB;
buf[gx] = gx; //assign 64 numbers to all buf
pB = &buf[0]; //make a pointer
i = *(pB+(63-gx)); //ERROR! reads only when gx == 0 !!
out[gx] = i;
}
Dissassembly from (good) old compiler
v_sub_i32 v1, vcc, 0, v0 <------ start:gx in v0; gx = -gx;
v_lshlrev_b32 v1, 2, v1 mmultiply by 4
v_add_i32 v1, vcc, 0x000000fc, v1 add constant offset 252
v_lshlrev_b32 v2, 2, v0
ds_write_b32 v2, v0
s_waitcnt lgkmcnt(0)
ds_read_b32 v0, v1 read
Dissassembly of newer code
v_lshlrev_b32 v1, 2, v0
v_sub_i32 v2, vcc, 0, v0 <------ start:gx in v0; gx = -gx;
ds_write_b32 v1, v0
v_lshlrev_b32 v0, 2, v2 multiply by 4
s_waitcnt lgmkcnt(0)
ds_read_b32 v0, v0 offset:252 read from offset 252 with negative address register
Solved! Go to Solution.
I can reproduce this on Win7 with Tahiti and 12.11beta. Just let you know I am forwarding your findings to the right people.
Have you try it on the latest? It might be fixed!
Hi Binying,
Thanks for the reply.
I have seen the failure on catalysts 12.8, 12.9, 12.10, 12.11beta8, and 12.11beta11.
However, the problem can also depend on how or what you select when you install drivers. The combination of installing the video drivers and/or not the SDK runtime can make a difference, (at least as of 12.9), including the SDK, as in the default installation, will produce the error.
I did confirm the error occurs with all amdocl(64).dll files up through 12.11beta11.
Where is beta13 ?
drallan
Do you think ds_read will fail if it gets a negative address? Isa docs didn't mention it. Does it checks negative addr before adding the 16 bit offset?
I think the same data and address vreg (v0, v0) could be wrong. At least with memory operations it is bad. But It's a weak argument since it reads only 1 dword...
side note: Just accidentally found the src_lds_direct broadcasted vector parameter. In case we run out of S reg local constants If I only knew it earlier...
realhet wrote:
Do you think ds_read will fail if it gets a negative address? Isa docs didn't mention it. Does it checks negative addr before adding the 16 bit offset?
I think the same data and address vreg (v0, v0) could be wrong. At least with memory operations it is bad. But It's a weak argument since it reads only 1 dword...
side note: Just accidentally found the src_lds_direct broadcasted vector parameter. In case we run out of S reg local constants If I only knew it earlier...
Hi Realhet,
Good question and not clear. I know the negative address produces the wrong answer = 0. The manual says OOR lds addresses will return the value 0, I don't know how the hardware makes the final address. I assume a negative address is interpreted as a large positive address and goes OOR.
re: src_lds_direct broadcast. Yes, yet another well thought out gcn feature. I wonder how that compares to all threads reading the same address, which would be a bank conflict?
I can reproduce this on Win7 with Tahiti and 12.11beta. Just let you know I am forwarding your findings to the right people.
binying wrote:
I can reproduce this on Win7 with Tahiti and 12.11beta. Just let you know I am forwarding your findings to the right people.
Very good! Thanks. I think this can be fixed fairly easily.
drallan