10 Replies Latest reply on Jul 25, 2013 12:00 AM by drallan

    DS_WRITE GCN instruction

    balidani

      Hello!

       

      I've been trying to write GCN ISA assembly code by hand and I just can't get the "DS_" instructions to work.

      The docs said that the address shouldn't be the same in all threads, because it causes conflicts.

      I tried to load the global id into the address register so there is no conflict, but it didn't work.

      I also tried to initialize m0, as the doc says.

       

      Here is what I tried:

       

      ; Initially v2 contains the global id

      v_mov_b32       v6, v2

      v_mul_i32_i24   v6, 4, v6

      v_mov_b32       v7, 99

      v_mov_b32       v8, 0

       

      s_mov_b32       m0, 0xFFFFFFFF


      ; ds_write's operands: (vdst) (addr) (data0) (data1)

      ; v5 is just a placeholder, it shouldn't be used I think

      v_mov_b32       v5, 0

      ds_write_b32    v5, v6, v7, v7

      ds_read_b32     v8, v6, v5, v5

       

      I tried many variations of the above code, but in the end v8 always remains 0.

      Does anybody know what I'm doing wrong?

       

      Thanks in advance!

        • Re: DS_WRITE GCN instruction
          realhet

          Hi,

          Same ds address in all threads -> it's a bank conflict. It's not an error, just slow. There are 32 banks, selected by the lowest bits of the dword offset.

          After a ds_write you have to s_waitcnt expcnt to ensure that the registers it used are free again. If ds_write can't complete immediately, it still holds the data in the regs.

          After a ds_read, use s_waitcnt lgkmcnt! It will wait until the requested data is in the dst register.

          I'm not sure about the params; maybe you should disassemble the binary and check whether they are OK. (ds_read/ds_write need only 2 params.)
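To make the bank remark concrete, here is a minimal Python sketch of the mapping described above. It assumes only what the reply states: 32 banks, chosen by the low bits of the dword offset. The helper names `lds_bank` and `worst_case_conflict` are made up for illustration, and the model simply counts lanes per bank; it ignores any special handling real hardware may have for identical addresses.

```python
NUM_BANKS = 32    # bank count quoted in the reply above
DWORD_BYTES = 4   # banks are selected by dword offset, so byte addresses are divided by 4


def lds_bank(byte_addr):
    """Bank index for a byte address, assuming bank = dword offset mod 32."""
    return (byte_addr // DWORD_BYTES) % NUM_BANKS


def worst_case_conflict(byte_addrs):
    """Largest number of lanes that hit the same bank in one access."""
    lanes_per_bank = {}
    for addr in byte_addrs:
        bank = lds_bank(addr)
        lanes_per_bank[bank] = lanes_per_bank.get(bank, 0) + 1
    return max(lanes_per_bank.values())


lanes = range(32)
same_addr = [0 for _ in lanes]                   # every lane uses address 0
stride4 = [4 * lane for lane in lanes]           # addr = id * 4, as in the question
stride32 = [1024 + 32 * lane for lane in lanes]  # a 32-byte stride with a 1024-byte base

print(worst_case_conflict(same_addr))  # all lanes in bank 0
print(worst_case_conflict(stride4))    # one lane per bank
print(worst_case_conflict(stride32))   # lanes pile up on banks 0, 8, 16, 24
```

Under this simplified model a 4-byte stride spreads 32 lanes over all 32 banks, while a 32-byte stride uses only 4 banks.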

            • Re: Re: DS_WRITE GCN instruction
              balidani

              Hello!

               

              Thanks for your reply! After I posted I realized I need to wait after the read, but I didn't know I also have to wait after write.

              The params are different because the assembler I use requires all operands that the DS format takes (vdst, addr, data0, data1). It's not that flexible yet.

               

              Unfortunately it still doesn't work. Here is the changed assembly:

               

              ; Initially v2 contains the global id

              v_mov_b32      v6, v2

              v_mul_i32_i24  v6, 32, v6

              v_add_i32      v6, vcc, 1024, v6

              v_mov_b32      v7, 99

              v_mov_b32      v4, 1

               

              s_mov_b32      m0, 0xFFFFFFFF

               

              ; ds_write's operands: (vdst) (addr) (data0) (data1)

              ; v5 is just a placeholder, it shouldn't be used I think

              v_mov_b32      v5, 0

               

              ds_write_b32    v5, v6, v7, v5

              s_waitcnt      expcnt(0)

              ds_read_b32    v4, v6, v5, v5

              s_waitcnt      lgkmcnt(0)

               

              v_mov_b32      v0, v4

               

              I attached the full ISA file too. Thanks for the help!

               

              Regards,

              Daniel

                • Re: DS_WRITE GCN instruction
                  realhet

                  Kick your code around until you see this parameter order in the disassembled isa:

                   

                  //puts 'value' into LDS, then reads it back into 'result'. 'addr' contains get_local_id*4

                    v_lshlrev_b32 addr, 2, lid 

                    s_mov_b32 m0, $FFFF

                    ds_write_b32  addr, value

                    ds_read_b32   result, addr offset:4 //read from a different location

                    s_waitcnt     lgkmcnt(0) 

                   

                  This works well.

                  If the 'value' vector contains (0,1,2,3,4,5,....)

                  Then the corresponding 'result' vector will be (1,2,3,4,5,6,...)

                  (The last lane of each wavefront, lane 63, will read garbage.)
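The expected result can be checked with a small Python model of a single 64-lane wavefront. This is only a sketch: the dictionary stands in for LDS, `GARBAGE` stands in for whatever an unwritten dword happens to hold, and `write_then_read` is a made-up helper name, not part of any tool.

```python
WAVE_SIZE = 64   # lanes in one wavefront
GARBAGE = None   # placeholder for the contents of an LDS dword nobody wrote


def write_then_read(values, read_offset=4):
    """Each lane writes values[lane] at byte address lane*4,
    then reads back from lane*4 + read_offset."""
    lds = {}
    for lane, value in enumerate(values):
        addr = lane << 2            # v_lshlrev_b32 addr, 2, lid
        lds[addr] = value           # ds_write_b32  addr, value
    result = []
    for lane in range(len(values)):
        addr = (lane << 2) + read_offset          # ds_read_b32 result, addr offset:4
        result.append(lds.get(addr, GARBAGE))
    return result


result = write_then_read(list(range(WAVE_SIZE)))
print(result[:6])    # lanes 0..5 see their neighbour's value
print(result[63])    # lane 63 reads byte address 256, which no lane in this wavefront wrote
```

In this model lane 63 reads one dword past the last address written, which is where the garbage comes from.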

                   

                  And don't forget to declare LDS size!

                    • Re: Re: DS_WRITE GCN instruction
                      balidani

                      Thanks!

                       

                      I think the last issue I have is not setting the LDS size. Here is the assembly now:

                       

                      ; User code starts here

                      ; v2 == global_id

                       

                      ; Set value

                      #define value v3

                      v_mov_b32 value, 99

                       

                      ; Set address

                      #define addr v4

                      #define lid v5

                      v_mov_b32 lid, v2

                      v_lshlrev_b32 addr, 2, lid

                       

                      ; Set m0

                      s_mov_b32 m0, 0xFFFF

                       

                      ; LDS write/read

                      #define NULL v6

                      #define result v7

                      v_mov_b32 v6, 0

                       

                      ds_write_b32 NULL, addr, value, NULL

                      ds_read_b32 result, addr, NULL, NULL offset0:4

                      s_waitcnt lgkmcnt(0)

                       

                      v_mov_b32       v0, result

                       

                      About the LDS size: is it 0 by default? I can't find which byte corresponds to it in the ATI CAL comment section of the ELF. I tried looking in your code too, but I couldn't understand this part:

                       

                      //set prog3 notes:

                        with AOptions do begin

                          SetCalNote($80001041,numvgprs);

                          SetCalNote($80001042,numsgprs);

                      {    SetCalNote($8000001C,NumThreadPerGroup.x);

                          SetCalNote($8000001D,NumThreadPerGroup.y);

                          SetCalNote($8000001E,NumThreadPerGroup.z);  not needed because of __attribute__((reqd_work_group_size}

                          SetCalNote($80000082,ldsSizeBytes);

                          //compute_pgm_rsrc2

                          SetCalNote($00002e13,(ldsSizeBytes+255)shr 8 shl 15,$FFF07FFF{and mask}); //lds size {256byte granularity}

                          SetCalNote($00002e13,1 shl 7,$FFFFFF7F);   //tgid_x_en=1

                        end;
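For anyone puzzling over the same lines, the arithmetic in the two `$00002e13` calls can be spelled out in Python. This is a sketch under a loud assumption: that `SetCalNote(id, value, mask)` computes `(old and mask) or value`, which is only a guess from the `{and mask}` comment. The function names below are invented for illustration, and the sketch mirrors the listing's numbers without checking them against any hardware documentation.

```python
LDS_SIZE_SHIFT = 15
LDS_SIZE_KEEP_MASK = 0xFFF07FFF    # the "and mask" in the listing: clears bits 15..19
TGID_X_EN_BIT = 7
TGID_X_EN_KEEP_MASK = 0xFFFFFF7F   # clears bit 7


def patch(old, value, keep_mask):
    """ASSUMED SetCalNote behaviour: keep the masked bits of old, then OR in value."""
    return ((old & keep_mask) | value) & 0xFFFFFFFF


def lds_size_field(lds_size_bytes):
    """(ldsSizeBytes+255) shr 8 shl 15: round up to 256-byte units, shift to bit 15."""
    return ((lds_size_bytes + 255) >> 8) << LDS_SIZE_SHIFT


def patch_rsrc2(old, lds_size_bytes):
    """Apply both edits from the listing to a 32-bit register value."""
    reg = patch(old, lds_size_field(lds_size_bytes), LDS_SIZE_KEEP_MASK)
    return patch(reg, 1 << TGID_X_EN_BIT, TGID_X_EN_KEEP_MASK)


print(hex(lds_size_field(1024)))   # 4 units of 256 bytes, placed at bit 15
print(hex(patch_rsrc2(0, 1024)))   # LDS size field plus the tgid_x_en bit
```

So in that listing the LDS size appears twice: once as a plain byte count in note `$80000082`, and once as a count of 256-byte units packed into the value of note `$00002e13`.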

                       

                      Could you tell me where the LDS size is declared? I found the VGPR and SGPR numbers, but the LDS is harder to spot just by looking at values.

                       

                      Thanks again for your help, I really appreciate it. Also, I'm really sad that there seems to be no documentation about this.

                      (There was this thread: http://devgurus.amd.com/thread/166955, which contains a PDF, but ctrl+f-ing for "LDS" doesn't yield any results)

                       

                      Regards,

                      Daniel

                    • Re: DS_WRITE GCN instruction
                      jeff_golds

                      You don't need any waits for LDS writes if your workgroup size is <= wavefront size. If your workgroup size is > wavefront size, then you need to wait for the LDS op to complete, and you also need a barrier to make sure no wavefronts in the workgroup move forward until all wavefronts in the workgroup have completed their LDS writes. But, yes, you need to specify the LDS size, otherwise your operations will be clamped/discarded.

                       

                      Also, you don't need to wait on exports for LDS writes; that only applies to global memory. Just use "lgkmcnt(desired_cnt)".

                        • Re: DS_WRITE GCN instruction
                          realhet

                          Thank you for clarifying!

                           

                          I did a test that proved that the ds_write (LDS) instruction picks out the address and data values immediately from the regs, and it's no problem if I overwrite them.

                          But with gds_write it is a must that you don't touch the registers until expcnt.

                           

                            v_lshlrev_b32 addr, 2, gid

                            s_mov_b32 m0, $FFFF

                            ds_write_b32  addr, value gds

                            s_waitcnt expcnt(0)                     //<--------- if this is not here

                            v_mov_b32 value,1234                 //then this will alter the gds_write

                            ds_read_b32   result, addr gds 

                            s_waitcnt     lgkmcnt(0)

                            uavWrite(1,gid,result)

                           

                          How come LDS and GDS work differently? I thought they were almost the same hardware elements.

                            • Re: DS_WRITE GCN instruction
                              drallan

                              > I did a test that proved that the ds_write (LDS) instruction picks out the address and data values immediately from the regs, and it's no problem if I overwrite them.

                              > But with gds_write it is a must that you don't touch the registers until expcnt.

                              This is a good point, especially regarding the address registers, because it was not obvious (to me) that the address is being 'exported'.

                              But it is.

                              > How come LDS and GDS work differently? I thought they were almost the same hardware elements.

                              I think anything that goes out of the compute unit must go through the export unit, which does not immediately read the instruction's data and address registers.