1 Reply Latest reply on Sep 16, 2008 3:25 PM by udeepta@amd

    peak texture sampling performance

    sgratton
      ...for the different data types


      Hi there,

      I'm trying to understand the performance of some code and I'm wondering if it's limited by the texture sampling rate. While I have found some "graphics" information on the web, that is normally expressed in terms of "numbers of n-bit pixels per second" or something similar, and I'm not quite sure how this converts to what is available in CAL/IL. So, on an HD3870, could someone tell me how many samples per second (or per cycle) I should expect (assuming everything is in L1 cache) using a sequence of instructions like

      sample_resource(0)_sampler(0) r0, r10.zy

      assuming

      dcl_input_position_interp(linear_noperspective) vWinCoord0.xy

      dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)

      (and r10 is related to vWinCoord0)?

      I.e., what is the peak bit rate for reading float4s in CAL/IL?
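
      To make the unit conversion concrete, here is a minimal sketch of how a samples-per-clock figure would translate into samples per second and bytes per second for each data type. The function name and the example numbers (4 samples per clock, 500 MHz) are purely illustrative placeholders, not HD3870 figures; the real rates are exactly what I'm asking about.

```python
# Sketch only: converts a texture sampling rate into bandwidth.
# peak_sample_bandwidth is a hypothetical helper, and the example
# inputs below are placeholders, NOT measured or documented HD3870 values.

# Bytes per texel for the formats in question (32-bit float components).
BYTES_PER_TEXEL = {"float": 4, "float2": 8, "float4": 16}


def peak_sample_bandwidth(samples_per_clock, clock_hz, bytes_per_sample):
    """Return (samples per second, bytes per second) for a sampling rate."""
    samples_per_sec = samples_per_clock * clock_hz
    return samples_per_sec, samples_per_sec * bytes_per_sample


if __name__ == "__main__":
    # Placeholder inputs: substitute the real per-clock rate and core clock.
    for fmt, size in BYTES_PER_TEXEL.items():
        samples, nbytes = peak_sample_bandwidth(4, 500_000_000, size)
        print(f"{fmt}: {samples:.3e} samples/s, {nbytes * 8:.3e} bits/s")
```

      If the per-clock rate differs between float, float2 and float4 reads, the first argument would change per format rather than staying fixed as it does in this sketch.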

      And does this change under certain circumstances? e.g. without the unnorm declaration, or with an _aoffimmi addition? How about if I read from a float or float2 instead? (And how about for one of the newer cards?)

      Then, should I gather sample_resource instructions in groups of 8 to maximize performance? And should I spread such groups throughout the ALU instructions or group them together (I seem to be finding the latter)? And if anyone could advise whether there is an optimal order of the _aoffimmi indices for reading in a small array (e.g. 8x1 or 4x4) of float4s, or whether it makes no difference, I'd be very grateful.

      Thanks a lot,
      Steven.