
burst writing no longer working?

Discussion created by sgratton on Jan 10, 2011
Latest reply on Jan 17, 2011 by MicahVillmow
Seems to be a problem from Catalyst 10.7 onwards

 

Hi there,

 

After trying out AMD Stream some time ago, I thought I'd give it another go with the release of the 6900 cards. One obstacle to good memory performance with CAL back then was the absence of burst reading (see link here). Having bought a new card and installed the latest SDK (2.3) and drivers (10.12), I was surprised to see that not even burst writing seems to occur now, on both Linux and Vista (64-bit). If one runs the export_burst_perf sample and prints out the IL (export_burst_perf -p) and then the ISA (export_burst_perf -a), the IL appears to be written to give burst writes, but the ISA does not perform them. For example...

 

il_cs_2_0
dcl_cb cb0[1]
dcl_num_thread_per_group 64
itof r0.z, vaTid0.x
div r0.y, r0.z, cb0[0].x
mod r0.x, r0.z, cb0[0].x
flr r0, r0
mul r0.x, r0.x, cb0[0].z
dcl_resource_id(0)_type(2d,unnorm)_fmtx(unknown)_fmty(unknown)_fmtz(unknown)_fmtw(unknown)
imul r0.w, vaTid0.x, cb0[0].w
sample_resource(0)_sampler(0) r1, r0.xy
add r0.x, r0.x, r0.1
sample_resource(0)_sampler(0) r2, r0.xy
add r0.x, r0.x, r0.1
sample_resource(0)_sampler(0) r3, r0.xy
add r0.x, r0.x, r0.1
sample_resource(0)_sampler(0) r4, r0.xy
add r0.x, r0.x, r0.1
mov g[r0.w + 0], r1
mov g[r0.w + 1], r2
mov g[r0.w + 2], r3
mov g[r0.w + 3], r4
end

compiles to give

...

04 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R1.x], R0, ELEM_SIZE(3)  VPM
05 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R2.x], R5, ELEM_SIZE(3)  VPM
06 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R3.x], R6, ELEM_SIZE(3)  VPM
07 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R7, ELEM_SIZE(3)  VPM

...

 

Investigating further, I experimented with the SKA (1.7) on Vista, set to compile code for a 4870.

 

The above kernel gives

03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R5, ELEM_SIZE(3)   BRSTCNT(3)

 

when the Catalyst version in the options is set to 10.6 or earlier, but

 

02 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R0, ELEM_SIZE(3)
03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R5.x], R1, ELEM_SIZE(3)
04 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R6.x], R2, ELEM_SIZE(3)
05 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R7.x], R3, ELEM_SIZE(3)

 

for more recent Catalyst versions, including the latest one.
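
To make this comparison easy to repeat across driver versions, here is a minimal Python sketch that counts the export instructions in an ISA dump and reports any BRSTCNT annotations. It only does text matching on lines shaped like the ones quoted above; the function name summarize_exports is my own, and I am assuming that a BRSTCNT(n) suffix is how a burst write shows up in the dump, as in the 10.6 output.

```python
import re

# Text patterns taken from the ISA lines quoted in this post.
# Assumption: a burst write is marked by a trailing BRSTCNT(n).
EXPORT_RE = re.compile(r"MEM_EXPORT_WRITE(?:_IND)?:")
BURST_RE = re.compile(r"BRSTCNT\((\d+)\)")


def summarize_exports(isa_text):
    """Return (number of export instructions, list of BRSTCNT values found)."""
    exports = 0
    bursts = []
    for line in isa_text.splitlines():
        if EXPORT_RE.search(line):
            exports += 1
            match = BURST_RE.search(line)
            if match:
                bursts.append(int(match.group(1)))
    return exports, bursts


# ISA as produced with the Catalyst option set to 10.6 or earlier.
OLD_ISA = "03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R5, ELEM_SIZE(3)   BRSTCNT(3)"

# ISA as produced with more recent Catalyst versions.
NEW_ISA = """\
02 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R4.x], R0, ELEM_SIZE(3)
03 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R5.x], R1, ELEM_SIZE(3)
04 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R6.x], R2, ELEM_SIZE(3)
05 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R7.x], R3, ELEM_SIZE(3)
"""

print(summarize_exports(OLD_ISA))  # (1, [3])
print(summarize_exports(NEW_ISA))  # (4, [])
```

One export with a burst count versus four exports with none is the difference I am asking about; the script says nothing about the actual performance impact.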

 

So, I would like to know:

 

1.  Is this a bug, or is there a reason for this change?

 

2.  What are the performance implications? 

 

3.  Or do hardware improvements, at least on the 6900s, render bursting irrelevant?

 

4.  Is burst reading now supported in hardware in the 6900s?

 

5.  If this is a bug, will burst writing be supported by the compiler again shortly?  (So it can be used by the 6900s in particular.)

 

6.  Is burst reading supported by the compiler, or will it be shortly?

 

Thanks for any advice,

Steven.

 
