peak texture sampling performance

Discussion created by sgratton on Sep 4, 2008
Latest reply on Sep 16, 2008 by udeepta@amd
...for the different data types

Hi there,

I'm trying to understand the performance of some code, and I'm wondering if it's limited by the texture sampling rate. While I have found some "graphics" information on the web, it is normally expressed in terms of "numbers of n-bit pixels per second" or similar, and I'm not quite sure how this converts to what is available in CAL/IL. So, on an HD3870, could someone tell me how many samples per second (or per cycle) I should expect, assuming everything is in L1 cache, using a sequence of instructions like

sample_resource(0)_sampler(0) r0, r10.zy


dcl_input_position_interp(linear_noperspective) vWinCoord0.xy


(and r10 is related to vWinCoord0)

i.e. what is the peak bit rate for reading in float4s in CAL/IL?
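To make the units concrete, here is a rough sketch of the conversion I have in mind, from a "samples per clock" figure to bytes per second for each element format. The unit count and clock below are placeholder inputs of my own, not confirmed figures for the HD3870, and the one-sample-per-unit-per-clock default is an assumption (whether it holds for float4 is exactly what I'm asking).

```python
def peak_sample_rate(texture_units, clock_hz, samples_per_unit_per_clock=1):
    """Samples per second if every unit delivers its samples on every clock."""
    return texture_units * clock_hz * samples_per_unit_per_clock


def peak_bandwidth_bytes(sample_rate, components, bytes_per_component=4):
    """Bytes per second for a given sample rate and element format."""
    return sample_rate * components * bytes_per_component


# Placeholder inputs (assumptions), NOT confirmed figures for any card.
TEXTURE_UNITS = 16
CLOCK_HZ = 775_000_000

rate = peak_sample_rate(TEXTURE_UNITS, CLOCK_HZ)
for name, components in (("float", 1), ("float2", 2), ("float4", 4)):
    gb_per_s = peak_bandwidth_bytes(rate, components) / 1e9
    print(f"{name}: {rate / 1e9:.1f} Gsamples/s -> {gb_per_s:.1f} GB/s")
```

If wider formats take more than one clock per sample, samples_per_unit_per_clock would drop below 1 for them and the float4 line would shrink accordingly.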

And does this change under certain circumstances, e.g. without the unnorm declaration, or with an _aoffimmi addition? How about if I read from a float or float2 instead? (And how about for one of the newer cards?)

Then, should I gather sample_resource instructions in groups of 8 to maximize performance? And should I spread such groups throughout the ALU instructions or group them together (I seem to be finding the latter)? And if anyone could advise whether there is an optimal order of the _aoffimmi indices for reading in a small array (e.g. 8x1 or 4x4) of float4s, or whether it makes no difference, I'd be very grateful.

Thanks a lot,