

vadimdi
Journeyman III

global buffer bug?

The following kernel should write the value of l1 to the global buffer. Take, for example, a 1x1 starting domain (or select a single thread inside the kernel with "if_logicalz vaTid.x" ... "endif"). We expect the value of l1 to land in global_buffer_ptr[1040387], but, as the disassembly below shows, global_buffer_ptr[3] is written instead.

il_cs_2_0
dcl_num_thread_per_group 16
dcl_literal l1, 1040387, 0, 0, 0
mov r1.x,l1.x
mov g[r1.x],l1
endmain
end

; --------  Disassembly --------------------
00 ALU: ADDR(32) CNT(5)
      0  x: MOV         R0.x,  (0x000FE003, 1.457892705e-39f).x     
         y: MOV         R0.y,  0.0f     
         z: MOV         R0.z,  0.0f     
         w: MOV         R0.w,  0.0f     
01 MEM_EXPORT_WRITE: DWORD_PTR[12], R0, ELEM_SIZE(3)
END_OF_PROGRAM

Also, whenever the address is greater than 2047, only the low 11 bits of the address are used.

Relative addressing through any register works correctly.

3 Replies
rick_weber
Adept II

The global buffer uses 16-byte addressing. This means that if you're trying to read your data back on the CPU, l1 should appear at element 1,040,387*4 if you're indexing the buffer as an array of ints.
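
To make the index conversion concrete, here is a minimal Python sketch. The helper name `int_index` is hypothetical, and it assumes 4-byte ints on the CPU side; it only restates the arithmetic above and is not part of any AMD API.

```python
# Sketch only: converts a 16-byte global-buffer element index into an
# index into the same memory viewed as an array of 4-byte ints.
BYTES_PER_ELEMENT = 16  # one addressable global-buffer element
BYTES_PER_INT = 4       # assumption: 32-bit ints on the CPU side


def int_index(element_index, component=0):
    """Index of component 0..3 of a 16-byte element in an int array."""
    ints_per_element = BYTES_PER_ELEMENT // BYTES_PER_INT
    return element_index * ints_per_element + component


print(int_index(1040387))  # x component of element 1040387
```

So the x component of element 1040387 would be expected at int index 4161548, with y, z and w in the three ints that follow.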


Of course, each addressable element of the global buffer is 16 bytes long.

You can copy the kernel listed above into ShaderAnalyzer and look at the disassembly yourself.

Let us change the l1 literal to
dcl_literal l1, 2047, 0, 0, 0

and we get the following code:

1 MEM_EXPORT_WRITE: DWORD_PTR[8188], R0, ELEM_SIZE(3)

Now change the literal to:
dcl_literal l1, 2048, 0, 0, 0

and the code is:

01 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)

And so on:
dcl_literal l1, 2049, 0, 0, 0
01 MEM_EXPORT_WRITE: DWORD_PTR[4], R0, ELEM_SIZE(3)

dcl_literal l1, 4095, 0, 0, 0
01 MEM_EXPORT_WRITE: DWORD_PTR[8188], R0, ELEM_SIZE(3)

dcl_literal l1, 4096, 0, 0, 0
01 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)

etc.
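
The pattern in these listings can be summarised with a small Python model. This is only a sketch of the observed behaviour, not of the compiler itself: the function names are hypothetical, and the 11-bit mask is inferred from the disassembly outputs quoted above.

```python
# Sketch only: models the DWORD_PTR values seen in the disassembly.
INDEX_MASK = 0x7FF      # low 11 bits, inferred from the wrap at 2048
DWORDS_PER_ELEMENT = 4  # one 16-byte element is 4 DWORDs


def expected_dword_ptr(index):
    """DWORD offset one would expect for a given element index."""
    return index * DWORDS_PER_ELEMENT


def observed_dword_ptr(index):
    """DWORD offset the compiler actually emitted (index truncated)."""
    return (index & INDEX_MASK) * DWORDS_PER_ELEMENT


for index in (2047, 2048, 2049, 4095, 4096, 1040387):
    print(index, expected_dword_ptr(index), observed_dword_ptr(index))
```

For 1040387 (0x000FE003) the model gives DWORD_PTR[12], which matches the first listing and corresponds to element 3.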


In the R600 ISA documentation, the microcode format of the MEM_EXPORT instruction contains an INDEX_GPR field:
"The address in the INDEX_GPR is a DWORD address, no matter how much data exported.

SP supplies a 32-bit integer address offset per pixel (assume zero if no EA export).

Per_pixel DWORD address=
{BASE reg, 6'h0} + clamp({ARRAY_SIZE,6'h0}, (BC increment counter *elemsize + INDEX_GPR + ARRAY_BASE))

So the problem seems to be in the IL compiler?


Thanks for reporting this issue. This has been fixed and should be in the next release.