neuralll

Where are the new 5870 (Evergreen) instructions documented?

Discussion created by neuralll on Dec 16, 2009
Latest reply on Dec 16, 2009 by empty_knapsack
CF_POPCNT gds_write sad4

Hi guys. I started out working with NVIDIA cards, but I bought an ATI HD 5970 after reading on Beyond3D that these cards sport interesting new instructions, POPCNT and SAD, and mainly because ATI has a 64 KB buffer for sharing data between all SIMDs, not just within a workgroup. It's called the Global Data Store (GDS), but now that I have bought the card I can't find anything about it in the docs or samples.

I plan to load a short string into the Global Data Store just once and then have all SIMD units compare against this string in parallel. And no, I don't care that the GDS doesn't have a working locking mechanism, since I will only read, and only once per kernel. The idea is to save a zillion loads of the same data from slow memory when all threads compare against the same strings (chosen at runtime), and also to prevent cache pollution. I know there is constant memory for this, but the strings I compare against are calculated by the kernel itself. Or is there a way to write to constant memory, so that with 10000 reads but 1 write, constant memory caching would still be more efficient than global memory?

I also do something like a histogram, and the popcnt instruction is very useful there for quickly spotting the difference between two histograms.
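To make the histogram use case concrete, here is a minimal scalar sketch in C of how a population count can spot differences between two histograms. It assumes (this is not stated in the post) that each histogram is reduced to a bitmask with one bit per occupied bin; the function names `popcount32` and `differing_bins` are hypothetical, and the code only illustrates the arithmetic, not how the Evergreen instruction is actually exposed.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable population count: number of set bits in a 32-bit word.
 * A hardware POPCNT would presumably replace this whole function. */
uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Assumption: each histogram is stored as an occupancy bitmask,
 * one bit per bin. XOR marks the bins that differ; popcount counts them. */
uint32_t differing_bins(const uint32_t *a, const uint32_t *b, size_t words)
{
    uint32_t total = 0;
    for (size_t i = 0; i < words; ++i)
        total += popcount32(a[i] ^ b[i]);
    return total;
}
```

With one XOR and one population count per 32 bins, the comparison cost is independent of how many bins actually differ, which is presumably why a native instruction is attractive here.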

The only place I found those instructions is as strings in the compiler, aticaldd.dll:

CF_POPCNT

gds_atomic_ordered_alloc
gds_ushort_read_ret
gds_short_read_ret
gds_ubyte_read_ret
gds_byte_read_ret
gds_readwrite_ret
gds_read2_ret
gds_read_rel_ret
gds_read_ret
gds_cmp_xchg_spf_ret
gds_cmp_xchg_ret
gds_xchg2_ret
gds_xchg_rel_ret
gds_xchg_ret
gds_mskor_ret
gds_xor_ret
gds_or_ret
gds_and_ret
gds_max_uint_ret
gds_min_uint_ret
gds_max_int_ret
gds_min_int_ret
gds_dec_ret
gds_inc_ret
gds_rsub_ret
gds_sub_ret
gds_add_ret
gds_short_write
gds_byte_write
gds_cmp_store_spf
gds_cmp_store
gds_write2
gds_write_rel
gds_write
gds_mskor
gds_xor
gds_or
gds_and
gds_max_uint
gds_min_uint
gds_max_int
gds_min_int
gds_dec
gds_inc
gds_rsub
gds_sub
gds_add

GDS OP is for R800 up only\n
gds_atomic_ordered_alloc

sadhi
sad4

sad_accum_prev_uint
sad_accum_uint
sad_accum_hi_uint

IL_OP_SAD
IL_OP_SAD_HI
IL_OP_SAD_4
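For readers unfamiliar with SAD (sum of absolute differences), the following C sketch shows what a four-byte variant would plausibly compute. The exact semantics of `sad4` and the `sad_accum_*` forms are not documented in this thread, so the packed-unsigned-byte layout and the accumulate behaviour below are assumptions, and `sad4_u8` / `sad4_accum_u8` are hypothetical names.

```c
#include <stdint.h>

/* Assumed semantics: treat a and b as four packed unsigned bytes and
 * return the sum of the absolute differences of corresponding bytes. */
uint32_t sad4_u8(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t x = (a >> (8 * i)) & 0xFFu;
        uint32_t y = (b >> (8 * i)) & 0xFFu;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}

/* Assumed accumulating form: add the SAD of a and b to a running total. */
uint32_t sad4_accum_u8(uint32_t a, uint32_t b, uint32_t acc)
{
    return acc + sad4_u8(a, b);
}
```

For example, `sad4_u8(0x01020304, 0x04030201)` evaluates to 8 under these assumptions, since the byte pairs differ by 3, 1, 1 and 3.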

They seem like ATI Close To Metal thingies. Unlike the SAD instructions, the GDS CTM thingies do not seem to be mapped to the IL language yet, so is the only way to use them now probably to program everything in CTM? I know that everything will be released in time, but in the meantime the NVIDIA guys are quite aggressively pushing their Tesla 1070 to our marketing.
