2 Replies Latest reply on Aug 10, 2010 11:58 PM by malcolm3141

    Is lds memory byte-addressable?



      I wrote the IL kernel below, which scales an input in uav0 by 2 and writes the result back to uav0. It works fine. However, when I uncomment the two lines that use LDS, I get wrong results. What could be wrong? Is LDS memory byte-addressable? Could this be a consistency/coherence problem?

      Thank you.


      "dcl_max_thread_per_group 256\n"
      "dcl_lds_id(1) 32768\n"
      "dcl_cb cb0[2]\n"                          
      "dcl_literal l0, 4, 4, 4, 4\n"
      "dcl_literal l1, 2, 2, 2, 2\n"
      "mov r0, vTidInGrp.x\n"                    
      "mov r1, r0\n"                             
      "imul r2, r1, l0\n"                        
      "mov r3, cb0[0]\n"                         
      "iadd r4, r3, r2\n"                        
      "uav_raw_load_id(0) r5, r4\n"              
      "imul r6, r5, l1\n"                        
      //"lds_store_id(1) r2, r6\n"
      //"lds_load_id(1) r6, r2\n"
      "mov r7, cb0[1]\n"                         
      "iadd r8, r7, r2\n"                        
      "uav_raw_store_id(0) mem.xyzw, r8, r6\n"   

        • Is lds memory byte-addressable?

          LDS addresses are byte addresses where the lowest 2 bits are always 0, i.e., valid LDS addresses are 0, 4, 8, 12, 16, etc.

          So if you want to store a float (which is 32 bits), you would store it at address 4, for example. But if you want to store a byte, you have to find the 4-byte bucket in LDS that contains the destination, load it, "merge" the byte you want to store into the 4 bytes you've fetched, and then store the 4 bytes back.
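The merge described above can be sketched on the host in plain C. This is only a model under stated assumptions, not IL and not an AMD API: the `lds` array stands in for the 32768-byte LDS declared in the kernel, the helper names (`lds_load_dword`, `lds_store_dword`, `lds_store_byte`, `lds_load_byte`) are hypothetical, and byte lanes are assumed to be little-endian within each dword.

```c
#include <assert.h>
#include <stdint.h>

#define LDS_BYTES 32768u                 /* same size as dcl_lds_id(1) 32768 */
static uint32_t lds[LDS_BYTES / 4];      /* host-side stand-in for LDS, one entry per dword */

/* Dword-aligned load: the address must have its lowest 2 bits clear. */
static uint32_t lds_load_dword(uint32_t byte_addr)
{
    assert(byte_addr < LDS_BYTES && (byte_addr & 3u) == 0);
    return lds[byte_addr >> 2];
}

/* Dword-aligned store: same alignment requirement as the load. */
static void lds_store_dword(uint32_t byte_addr, uint32_t value)
{
    assert(byte_addr < LDS_BYTES && (byte_addr & 3u) == 0);
    lds[byte_addr >> 2] = value;
}

/* Store one byte by read-modify-write of the enclosing 4-byte bucket. */
static void lds_store_byte(uint32_t byte_addr, uint8_t value)
{
    uint32_t aligned = byte_addr & ~3u;          /* start of the 4-byte bucket */
    uint32_t shift   = (byte_addr & 3u) * 8u;    /* bit offset of the byte (little-endian assumed) */
    uint32_t word    = lds_load_dword(aligned);  /* fetch the existing 4 bytes */

    word = (word & ~(0xFFu << shift)) | ((uint32_t)value << shift);  /* merge the new byte */
    lds_store_dword(aligned, word);              /* write the bucket back */
}

/* Load one byte by fetching the enclosing bucket and shifting it down. */
static uint8_t lds_load_byte(uint32_t byte_addr)
{
    uint32_t shift = (byte_addr & 3u) * 8u;
    return (uint8_t)(lds_load_dword(byte_addr & ~3u) >> shift);
}
```

For example, after `lds_store_dword(4, 0x11223344u)`, calling `lds_store_byte(5, 0xAB)` leaves the dword at address 4 as `0x1122AB44`. Note that on a real GPU this load/merge/store sequence is not atomic, so two threads merging different bytes into the same dword can overwrite each other's update unless they are synchronized.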


            • Is lds memory byte-addressable?

              Looking at the Evergreen ISA doc, I can see that the hardware does have support for bytewise access to LDS memory. However, it is not exposed in IL yet, as far as I know. At present, LDS access must be dword-aligned, much like UAV access. The current exception to the rule seems to be an arena UAV, which allows byte and short access. You can only define one of them, and the access is rather slow from what I am told.