Hi guys. I started out working with NVIDIA cards, but I bought an ATI HD 5970 because I read on Beyond3D that it supports interesting new instructions (POPCNT, SAD) and, mainly, because ATI has a 64 KB buffer for sharing data between all SIMDs, not just within a workgroup. It's called the Global Data Store, but now that I've bought the card I can't find anything about it in the docs or samples.
I plan to load a short string into the Global Data Store just once and then have all SIMD units compare against that string in parallel. And no, I don't care that GDS has no working locking mechanism, since the kernels will only read it and it is loaded just once per kernel. The idea is to avoid a huge number of loads of the same data from slow memory when all threads compare against the same strings (chosen at runtime), and also to avoid cache pollution. I know constant memory is meant for this, but the strings I compare against are computed by the kernel itself. Or is there a way to write to constant memory, so that with 10000 reads but only 1 write, constant-memory caching would still be more efficient than global memory?
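A plain C sketch of the access pattern described above, run on the CPU rather than the GPU. The 64 KB array `gds` is a hypothetical stand-in for the Global Data Store, not the real GDS interface (which doesn't seem to be exposed yet), and `count_matches` is a hypothetical helper. It only counts the pattern bytes fetched from "slow" memory under the two schemes, to show where the savings would come from.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU stand-in for the 64 KB Global Data Store (GDS):
   one buffer that every "work-item" (here, every loop iteration) can read. */
#define GDS_BYTES 65536
static char gds[GDS_BYTES];

/* Counts candidates equal to `pattern` (all strings are `len` bytes).
   *pattern_bytes receives how many pattern bytes were fetched from
   "slow" (global) memory; candidate traffic is identical in both schemes
   and is not counted. Returns (size_t)-1 if the pattern does not fit. */
size_t count_matches(const char *pattern, size_t len,
                     const char *const *candidates, size_t n,
                     int use_shared, size_t *pattern_bytes)
{
    size_t matches = 0;
    *pattern_bytes = 0;

    if (use_shared) {
        if (len > GDS_BYTES)
            return (size_t)-1;
        /* One load of the pattern from slow memory into the shared store ... */
        memcpy(gds, pattern, len);
        *pattern_bytes = len;
        /* ... then every work-item compares against the shared copy. */
        for (size_t i = 0; i < n; i++)
            matches += memcmp(gds, candidates[i], len) == 0;
    } else {
        for (size_t i = 0; i < n; i++) {
            /* Naive scheme: each work-item re-fetches the pattern. */
            *pattern_bytes += len;
            matches += memcmp(pattern, candidates[i], len) == 0;
        }
    }
    return matches;
}
```

For example, with 10000 candidates and a 16-byte pattern, the naive scheme fetches 160000 pattern bytes while the shared scheme fetches 16.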
Then I do something like a histogram, and the POPCNT instruction is very useful there for quickly spotting the difference between two histograms.
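One way to read the histogram comparison, sketched in plain C: assuming each histogram's bin occupancy is packed into a 32-bit mask, POPCNT of the XOR of two masks counts the bins that are empty in one histogram but not the other, and a sum of absolute differences over the raw counts gives the L1 distance. The helper names and the 32-bin limit are hypothetical; `popcount32` is a portable fallback for what a hardware POPCNT does in one instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable population count (Kernighan's loop): clears the lowest set
   bit until none remain. A hardware POPCNT does this in one instruction. */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Packs the occupancy of up to 32 bins into a bitmask:
   bit i is set when bin i is non-empty. Extra bins are ignored. */
uint32_t occupancy_mask(const unsigned *hist, size_t bins)
{
    uint32_t m = 0;
    for (size_t i = 0; i < bins && i < 32; i++)
        if (hist[i] != 0)
            m |= (uint32_t)1 << i;
    return m;
}

/* Number of bins whose occupancy differs between the two histograms. */
unsigned occupancy_diff(uint32_t a, uint32_t b)
{
    return popcount32(a ^ b);
}

/* Sum of absolute differences (L1 distance) between two histograms;
   a hardware SAD instruction computes this over several packed values at once. */
unsigned long sad(const unsigned *a, const unsigned *b, size_t bins)
{
    unsigned long s = 0;
    for (size_t i = 0; i < bins; i++)
        s += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return s;
}
```

The mask comparison is cheap but coarse (it only sees empty vs non-empty bins); SAD is the finer measure when the actual counts matter.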
The only place I found those instructions is as strings in the compiler DLL, aticaldd.dll, for example:
"GDS OP is for R800 up only"
They look like ATI Close To Metal (CTM) features. Unlike the SAD instruction, the GDS CTM features don't seem to be mapped to the IL language yet, so is the only way to use them right now to program everything in CTM? I know everything will be released at some point, but in the meantime the NVIDIA guys are pushing their Tesla 1070 quite aggressively to our marketing people.