3 Replies Latest reply on Feb 6, 2009 2:22 PM by MicahVillmow

    global buffer bug?

    vadimdi

      The following kernel should write the value of l1 to the global buffer. For example, take a 1x1 starting domain (or select a single thread inside the kernel with "if_logicalz vaTid.x" ... "endif"). We expect the value of l1 to land in global_buffer_ptr[1040387], but, as the disassembly below shows, global_buffer_ptr[3] is written instead.

      il_cs_2_0
      dcl_num_thread_per_group 16
      dcl_literal l1, 1040387, 0, 0, 0
      mov r1.x,l1.x
      mov g[r1.x],l1
      endmain
      end

      ; --------  Disassembly --------------------
      00 ALU: ADDR(32) CNT(5)
            0  x: MOV         R0.x,  (0x000FE003, 1.457892705e-39f).x     
               y: MOV         R0.y,  0.0f     
               z: MOV         R0.z,  0.0f     
               w: MOV         R0.w,  0.0f     
      01 MEM_EXPORT_WRITE: DWORD_PTR[12], R0, ELEM_SIZE(3)
      END_OF_PROGRAM
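
As a quick check on the disassembly above, the following Python sketch decodes the literal that the compiler loads into R0.x and the DWORD_PTR value of the export. It uses only the numbers printed in the listing; the variable names are illustrative and not part of any AMD tool, and the 4-DWORD element size is taken from the 16-byte addressing discussed in this thread.

```python
import struct

# Literal shown in the disassembly for R0.x.
raw = 0x000FE003

# Integer view: this is the address declared in "dcl_literal l1".
as_int = raw

# The disassembler also prints the same 32 bits reinterpreted as an
# IEEE-754 single-precision float, which is a denormal here.
as_float = struct.unpack("<f", struct.pack("<I", raw))[0]

# The export uses DWORD_PTR[12]. With 16-byte (4-DWORD) elements,
# that corresponds to element 3 of the global buffer.
DWORDS_PER_ELEMENT = 4
element_from_dword_ptr = 12 // DWORDS_PER_ELEMENT

print(as_int, as_float, element_from_dword_ptr)
```

So the data register holds the intended value, while the export address points at element 3 rather than element 1040387.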

      Also, if the address is greater than 2047, only the low 11 bits of the address are used.

      Relative addressing through any register works correctly.

        • global buffer bug?
          rick.weber

          The global buffer uses 16-byte addressing. This would mean that if you're trying to get your data back on the CPU, l1 should appear at element 1,040,387*4 if you're indexing it as an array of ints.
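
The index conversion described above can be sketched in Python as follows. This is only an illustration of the arithmetic, assuming 4-byte ints on the host; the helper names are hypothetical.

```python
ELEMENT_BYTES = 16  # one global buffer element, per the 16-byte addressing
INT_BYTES = 4       # assumption: the host reads the buffer as 32-bit ints


def int_index(element_index):
    """Index of the first int of a global buffer element on the host side."""
    return element_index * (ELEMENT_BYTES // INT_BYTES)


def byte_offset(element_index):
    """Byte offset of a global buffer element from the start of the buffer."""
    return element_index * ELEMENT_BYTES


print(int_index(1040387))    # 4161548
print(byte_offset(1040387))  # 16646192
```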

            • global buffer bug?
              vadimdi

              Of course, each addressable element of the global buffer is 16 bytes long.

              You can simply copy the kernel listed above into ShaderAnalyzer and look at the resulting disassembly.

              Let us change l1 to
              dcl_literal l1, 2047, 0, 0, 0

              and we get the following code:

              1 MEM_EXPORT_WRITE: DWORD_PTR[8188], R0, ELEM_SIZE(3)

              Now change the literal to:
              dcl_literal l1, 2048, 0, 0, 0

              and the code is:

              01 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)

              And so on,
              dcl_literal l1, 2049, 0, 0, 0
              01 MEM_EXPORT_WRITE: DWORD_PTR[4], R0, ELEM_SIZE(3)

              dcl_literal l1, 4095, 0, 0, 0
              01 MEM_EXPORT_WRITE: DWORD_PTR[8188], R0, ELEM_SIZE(3)

              dcl_literal l1, 4096, 0, 0, 0
              01 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)

              etc.
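
The pattern in these listings can be summarised with a small Python sketch. It models the observed DWORD_PTR as the low 11 bits of the address multiplied by 4, and compares it with the value one would expect if the full address were used. Both functions are a reading of the listings in this thread, not a description of the actual compiler internals.

```python
# (literal address, DWORD_PTR shown by the disassembly) pairs from this thread.
OBSERVED = {
    1040387: 12,
    2047: 8188,
    2048: 0,
    2049: 4,
    4095: 8188,
    4096: 0,
}


def observed_dword_ptr(address):
    """Model of the observed output: only the low 11 bits survive."""
    return (address & 0x7FF) * 4


def expected_dword_ptr(address):
    """Assumed correct output: full element address times 4 DWORDs."""
    return address * 4


for address, dword_ptr in OBSERVED.items():
    matches_model = observed_dword_ptr(address) == dword_ptr
    is_correct = expected_dword_ptr(address) == dword_ptr
    print(address, dword_ptr, matches_model, is_correct)
```

Every pair matches the 11-bit model, and only the address 2047 (the largest value that fits in 11 bits) also matches the expected result.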


              In the R600 ISA documentation, the microcode format of the MEM_EXPORT instruction contains an INDEX_GPR field:
              "The address in the INDEX_GPR is a DWORD address, no matter how much data exported.

              SP supplies a 32-bit integer address offset per pixel (assume zero if no EA export).

              Per_pixel DWORD address=
              {BASE reg, 6'h0} + clamp({ARRAY_SIZE,6'h0}, (BC increment counter *elemsize + INDEX_GPR + ARRAY_BASE))"
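
The quoted address formula can be sketched in Python to make the bit manipulation explicit. In Verilog-style notation, {X, 6'h0} appends six zero bits to X, which is a left shift by 6. Everything else here is an assumption made for illustration: clamp is read as an upper bound (min), and the parameter names and units are hypothetical rather than taken from the ISA documentation.

```python
def per_pixel_dword_address(base, array_size, burst_counter,
                            elem_size, index_gpr, array_base):
    """Sketch of the quoted formula; clamp semantics are assumed, not confirmed."""
    # {BASE reg, 6'h0}: BASE followed by six zero bits.
    base_address = base << 6
    # {ARRAY_SIZE, 6'h0}: upper limit for the offset (assumed meaning of clamp).
    limit = array_size << 6
    offset = burst_counter * elem_size + index_gpr + array_base
    return base_address + min(limit, offset)


# Element 1040387 expressed as a DWORD address in INDEX_GPR.
print(per_pixel_dword_address(base=0, array_size=1 << 20, burst_counter=0,
                              elem_size=4, index_gpr=1040387 * 4, array_base=0))
```

Under these assumptions, a 32-bit INDEX_GPR value passes through the formula without being cut to 11 bits, as long as it stays below the array size limit.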

              So the problem seems to be in the IL compiler?

            • global buffer bug?
              MicahVillmow
              Thanks for reporting this issue. This has been fixed and should be in the next release.