3 Replies Latest reply on Feb 6, 2009 2:22 PM by MicahVillmow

    global buffer bug?


      The following kernel should write the literal l1 to the global buffer. For example, launch it over a 1x1 domain (or select a single thread inside the kernel with "if_logicalz vaTid.x" ... "endif"). We expect the value of l1 to appear in global_buffer_ptr[1040387], but, as the disassembly below shows, global_buffer_ptr[3] is written instead.

      dcl_num_thread_per_group 16
      dcl_literal l1, 1040387, 0, 0, 0
      mov r1.x,l1.x
      mov g[r1.x],l1

      ; --------  Disassembly --------------------
      00 ALU: ADDR(32) CNT(5)
            0  x: MOV         R0.x,  (0x000FE003, 1.457892705e-39f).x     
               y: MOV         R0.y,  0.0f     
               z: MOV         R0.z,  0.0f     
               w: MOV         R0.w,  0.0f     

      Also, whenever the address is greater than 2047, only the low 11 bits of the address are used.

      Relative addressing with any register works correctly.
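As a quick sanity check of the numbers in this post, the following Python sketch confirms that the literal shown in the disassembly (0x000FE003, printed as 1.457892705e-39f) is the requested index 1040387, and that keeping only its low 11 bits gives 3. The 11-bit mask is an assumption inferred from the observed behavior, not something taken from documentation.

```python
import struct

INDEX = 1040387        # literal l1.x from the kernel
MASK_11_BITS = 0x7FF   # assumption: mask inferred from the observed wrap at 2048

# The disassembly prints the literal as hex and as the float with the same bit pattern.
as_hex = hex(INDEX)
as_float = struct.unpack("<f", struct.pack("<I", INDEX))[0]

# Keeping only the low 11 bits reproduces the element that actually gets written.
low_bits = INDEX & MASK_11_BITS

print(as_hex)     # 0xfe003
print(as_float)   # a denormal float close to 1.4578927e-39
print(low_bits)   # 3
```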

        • global buffer bug?

          The global buffer uses 16-byte addressing. This means that if you're reading your data back on the CPU and indexing it as an array of ints, l1 should appear at element 1,040,387*4.
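The index conversion described in this reply can be sketched as follows. This is only an illustration: `int_index` is a hypothetical helper, and it assumes the host reads the buffer as an array of 32-bit ints.

```python
BYTES_PER_ELEMENT = 16   # global buffer element size stated in the reply
BYTES_PER_INT = 4        # assumption: 32-bit ints on the host side

def int_index(element_index, component=0):
    """Index into a host-side int array for component 0..3 of a global buffer element."""
    ints_per_element = BYTES_PER_ELEMENT // BYTES_PER_INT
    return element_index * ints_per_element + component

print(int_index(1040387))             # 4161548, first component of the expected element
print(int_index(3))                   # 12, first component of the element actually written
print(1040387 * BYTES_PER_ELEMENT)    # 16646192, byte offset of the expected element
```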

            • global buffer bug?

              Of course, each addressable element of the global buffer is 16 bytes long.

              You can simply copy the kernel listed above into ShaderAnalyzer and look at the resulting disassembly.

              Let us change l1 to
              dcl_literal l1, 2047, 0, 0, 0

              and we will see the following code

              01 MEM_EXPORT_WRITE: DWORD_PTR[8188], R0, ELEM_SIZE(3)

              Now change the literal to:
              dcl_literal l1, 2048, 0, 0, 0

              and the code is:

              01 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)

              And so on:
              dcl_literal l1, 2049, 0, 0, 0
              01 MEM_EXPORT_WRITE: DWORD_PTR[4], R0, ELEM_SIZE(3)

              dcl_literal l1, 4095, 0, 0, 0
              01 MEM_EXPORT_WRITE: DWORD_PTR[8188], R0, ELEM_SIZE(3)

              dcl_literal l1, 4096, 0, 0, 0
              01 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)
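The five disassembly results listed here fit a simple model: the compiler appears to keep only the low 11 bits of the element index before scaling it to a DWORD offset. The Python sketch below checks that model against the observed values. The function names are hypothetical, and the model is a guess at the behavior, not a description of the compiler's actual code.

```python
def observed_dword_ptr(index):
    """Model of the buggy output: keep the low 11 bits, then scale to DWORDs."""
    return (index & 0x7FF) * 4   # 4 DWORDs per 16-byte element

def expected_dword_ptr(index):
    """What the DWORD offset would be without the truncation."""
    return index * 4

# Element index -> DWORD_PTR value reported by the disassembly in this post.
observations = {2047: 8188, 2048: 0, 2049: 4, 4095: 8188, 4096: 0}

for index, dword_ptr in observations.items():
    assert observed_dword_ptr(index) == dword_ptr
    print(index, observed_dword_ptr(index), expected_dword_ptr(index))
```

Under the same model, index 1040387 maps to DWORD offset 12, which is element 3 of the global buffer.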


              In the R600 ISA documentation, the microcode format of the MEM_EXPORT instruction contains an INDEX_GPR field:
              "The address in the INDEX_GPR is a DWORD address, no matter how much data exported.

              SP supplies a 32-bit integer address offset per pixel (assume zero if no EA export).

              Per_pixel DWORD address=
              {BASE reg, 6'h0} + clamp({ARRAY_SIZE,6'h0}, (BC increment counter *elemsize + INDEX_GPR + ARRAY_BASE))"
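To make the quoted formula easier to read, here is a rough Python transcription. It rests on two assumptions that the quote does not spell out: that `{X, 6'h0}` is Verilog-style concatenation (X shifted left by 6 bits), and that `clamp(limit, value)` caps the value at the limit. The function name, parameter names, and all example values are hypothetical.

```python
def export_dword_address(base, array_size, array_base, index_gpr,
                         bc_counter=0, elem_size=1):
    """Per-pixel DWORD address following the quoted formula, under the stated assumptions."""
    offset = bc_counter * elem_size + index_gpr + array_base
    limit = array_size << 6          # {ARRAY_SIZE, 6'h0}, assumed to mean a 6-bit left shift
    clamped = min(offset, limit)     # assumption: clamp caps the offset at the limit
    return (base << 6) + clamped     # {BASE reg, 6'h0} + clamped offset

# Hypothetical values: a large array, and the DWORD offset of element 1040387.
print(export_dword_address(base=0, array_size=1 << 20, array_base=0,
                           index_gpr=1040387 * 4))   # 4161548

# Hypothetical values: a small array, where the offset is clamped to 64 DWORDs.
print(export_dword_address(base=2, array_size=1, array_base=0,
                           index_gpr=100))           # 192
```

Nothing in this formula limits INDEX_GPR to 11 bits, which is consistent with the suspicion that the truncation happens in the compiler rather than in the hardware.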

              So the problem seems to be in the IL compiler?

            • global buffer bug?
              Thanks for reporting this issue. This has been fixed and should be in the next release.