4 Replies Latest reply on Jan 17, 2011 2:42 PM by MicahVillmow

    burst writing no longer working?

    sgratton
      Seems to be a problem from Catalyst 10.7 onwards

       

      Hi there,

       

      After trying out AMD Stream some time ago, I thought I'd give it another go with the release of the 6900 cards.  One obstacle to good memory performance with CAL back then was the absence of burst reading (see link here).  Having bought a new card and installed the latest SDK (2.3) and drivers (10.12), I was surprised to see that not even burst writing seems to occur now, on either Linux or Vista (64-bit).  If one runs the export_burst_perf sample and prints out the IL (export_burst_perf -p) and then the ISA (export_burst_perf -a), the IL appears to be written to give burst writes, but the ISA doesn't contain them.  For example...

       

      il_cs_2_0
      dcl_cb cb0[1]
      dcl_num_thread_per_group 64
      itof r0.z, vaTid0.x
      div r0.y, r0.z, cb0[0].x
      mod r0.x, r0.z, cb0[0].x
      flr r0, r0
      mul r0.x, r0.x, cb0[0].z
      dcl_resource_id(0)_type(2d,unnorm)_fmtx(unknown)_fmty(unknown)_fmtz(unknown)_fmtw(unknown)
      imul r0.w, vaTid0.x, cb0[0].w
      sample_resource(0)_sampler(0) r1, r0.xy
      add r0.x, r0.x, r0.1
      sample_resource(0)_sampler(0) r2, r0.xy
      add r0.x, r0.x, r0.1
      sample_resource(0)_sampler(0) r3, r0.xy
      add r0.x, r0.x, r0.1
      sample_resource(0)_sampler(0) r4, r0.xy
      add r0.x, r0.x, r0.1
      mov g[r0.w + 0], r1
      mov g[r0.w + 1], r2
      mov g[r0.w + 2], r3
      mov g[r0.w + 3], r4
      end

      compiles to give

      ...

      04 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R1.x], R0, ELEM_SIZE(3)  VPM
      05 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R2.x], R5, ELEM_SIZE(3)  VPM
      06 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R3.x], R6, ELEM_SIZE(3)  VPM
      07 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R7, ELEM_SIZE(3)  VPM

      ...

       

      Investigating further, I played with the Stream KernelAnalyzer (SKA 1.7) on Vista, set to compile code for a 4870.

       

      The above kernel gives

      03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R5, ELEM_SIZE(3)   BRSTCNT(3)

       

      for Catalyst versions set to 10.6 and earlier in the options, but

       

      02 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R0, ELEM_SIZE(3)
      03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R5.x], R1, ELEM_SIZE(3)
      04 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R6.x], R2, ELEM_SIZE(3)
      05 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R7.x], R3, ELEM_SIZE(3)

       

      for more recent Catalyst versions, including the latest one.
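
      The comparison above can be automated when checking several driver settings. The following is a minimal Python sketch, not part of the SDK: it assumes the ISA has been saved as plain text in the format shown in the listings above, and summarize_exports is a hypothetical helper name. It simply counts the MEM_EXPORT_WRITE lines that carry a BRSTCNT(...) annotation and those that do not.

```python
import re

# Matches an export line such as
#   "03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R5, ELEM_SIZE(3)   BRSTCNT(3)"
# The line format is taken from the listings in this thread (an assumption
# about what other compiler versions print).
EXPORT_RE = re.compile(r"\bMEM_EXPORT_WRITE(?:_IND)?:")
BURST_RE = re.compile(r"\bBRSTCNT\((\d+)\)")


def summarize_exports(isa_text):
    """Count burst and non-burst MEM_EXPORT_WRITE instructions in an ISA dump."""
    burst, single = 0, 0
    for line in isa_text.splitlines():
        if not EXPORT_RE.search(line):
            continue  # not a memory export instruction
        if BURST_RE.search(line):
            burst += 1
        else:
            single += 1
    return {"burst": burst, "single": single}


# ISA as reported for Catalyst 10.6 and earlier
old_isa = "03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R5, ELEM_SIZE(3)   BRSTCNT(3)"

# ISA as reported for more recent Catalyst versions
new_isa = """\
02 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R0, ELEM_SIZE(3)
03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R5.x], R1, ELEM_SIZE(3)
04 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R6.x], R2, ELEM_SIZE(3)
05 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R7.x], R3, ELEM_SIZE(3)
"""

print(summarize_exports(old_isa))
print(summarize_exports(new_isa))
```

      A result with only "single" exports for a kernel that stores consecutive elements is the symptom described here.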

       

      So, I would like to know:

       

      1.  Is this a bug, or is there a reason for this change?

       

      2.  What are the performance implications? 

       

      3.  Or do hardware improvements in the 6900s at least render bursting irrelevant?

       

      4.  Is burst reading now supported in hardware in the 6900s?

       

      5.  If this is a bug, will burst writing be supported by the compiler again shortly?  (So it can be used by the 6900s in particular.)

       

      6.  Is burst reading supported by the compiler now, or will it be shortly?

       

      Thanks for any advice,

      Steven.

       

        • burst writing no longer working?
          MicahVillmow
          The global buffer is not the most efficient method of writing to memory on 8XX and 9XX devices. The most efficient method is to use UAVs, which were introduced with 8XX GPUs; a single UAV was backported to 7XX devices. I've filed a regression with the compiler team, but can you see if using a UAV fixes your problem?
          • burst writing no longer working?
            MicahVillmow
            sgratton,
            The cause has been found and the fix will be in a future driver release (probably March or April).
              • burst writing no longer working?
                sgratton

                Dear Micah,

                 

                Thanks for taking a look at this.  I've begun to experiment with UAVs and will post questions in a new thread.  Playing with the SKA, I did notice that, as you implied, UAV code does still work on r7xx (or even on the 3870 if you use a pixel shader!), and so it is also affected by this issue: if you write, say, 8 consecutive float4s, you get two burst writes with the older Catalysts

                01 MEM_EXPORT_WRITE: DWORD_PTR[0], R8, ELEM_SIZE(3)   BRSTCNT(3)
                02 MEM_EXPORT_WRITE: DWORD_PTR[16], R0, ELEM_SIZE(3)   BRSTCNT(3)

                but 8 individual ones with the more recent ones:

                 

                01 MEM_EXPORT_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)
                02 MEM_EXPORT_WRITE: DWORD_PTR[4], R1, ELEM_SIZE(3)
                03 MEM_EXPORT_WRITE: DWORD_PTR[8], R2, ELEM_SIZE(3)
                04 MEM_EXPORT_WRITE: DWORD_PTR[12], R3, ELEM_SIZE(3)
                05 MEM_EXPORT_WRITE: DWORD_PTR[16], R4, ELEM_SIZE(3)
                06 MEM_EXPORT_WRITE: DWORD_PTR[20], R5, ELEM_SIZE(3)
                07 MEM_EXPORT_WRITE: DWORD_PTR[24], R6, ELEM_SIZE(3)
                08 MEM_EXPORT_WRITE: DWORD_PTR[28], R7, ELEM_SIZE(3)

                 

                So UAV code on r7xx should get better with this fix.
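
                The relationship between the two listings can be sketched in a few lines of Python. This is only an illustration of the grouping, not the compiler's actual algorithm. It assumes that ELEM_SIZE(3) means a 4-DWORD element, that BRSTCNT(n) denotes n + 1 back-to-back elements, and that a burst holds at most 4 elements; the last two are inferred from the listings above. group_into_bursts is a hypothetical helper name.

```python
def group_into_bursts(offsets, elem_dwords=4, max_burst=4):
    """Group DWORD offsets of individual exports into (start, brstcnt) pairs.

    Assumptions (inferred from the ISA listings, not from the compiler):
    BRSTCNT(n) covers n + 1 consecutive elements, and one burst holds at
    most max_burst elements of elem_dwords DWORDs each.
    """
    bursts = []
    start, count = None, 0
    for off in offsets:
        # An export extends the current burst only if it lands exactly
        # one element past the last one and the burst is not yet full.
        contiguous = start is not None and off == start + count * elem_dwords
        if contiguous and count < max_burst:
            count += 1
        else:
            if start is not None:
                bursts.append((start, count - 1))
            start, count = off, 1
    if start is not None:
        bursts.append((start, count - 1))
    return bursts


# The 8 individual float4 writes from the newer listing: DWORD_PTR[0] .. [28]
offsets = list(range(0, 32, 4))
for start, brstcnt in group_into_bursts(offsets):
    print(f"MEM_EXPORT_WRITE: DWORD_PTR[{start}], ELEM_SIZE(3)   BRSTCNT({brstcnt})")
```

                Under these assumptions, the eight writes collapse into two bursts starting at DWORD_PTR[0] and DWORD_PTR[16], each with BRSTCNT(3), which is the shape of the older Catalyst output.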

                 

                Best,

                Steven.

              • burst writing no longer working?
                MicahVillmow
                sgratton,
                A single UAV should work on the 3870, but it is not officially supported there, as it is mapped to the global buffer. However, you might run into issues because we don't test UAVs on that device.