How well do algorithms that use look-up tables perform when you implement the table as an image for the parallel version? For instance, how well would the "Two Plus Two" Poker evaluator be expected to perform on a GPU if the 133MB look-up table were encoded as an image and read using samplers?
http://www.codingthewheel.com/archives/poker-hand-evaluator-roundup#2p2
Do image reads need to be coalesced, like buffer reads, to get maximal performance?
--Keith Brafford
See the section in the programming guide about memory tiling.
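In short: for maximum performance, image access should be 2D-coherent, meaning neighbouring work-items read neighbouring texels. The texture cache is small, so the reads have to be closely coherent to benefit from it.
For a random access pattern, such as look-ups into a large table, a simple array might be better, unless the 8-bit 'float' access is useful.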
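To make the coherence point concrete, here is a rough toy model in Python, not a measurement of any real GPU. It assumes the image is stored in 8x8 texel tiles and that the cache holds 16 tiles with LRU replacement; both numbers, the 4096-texel image width, and the function names are made up for illustration. It compares the cache hit rate of a 2D-coherent walk against random look-ups like those a table-driven evaluator would produce.

```python
import random

def tile_of(index, width, tile):
    """Map a linear table index to the (tile_x, tile_y) holding its texel."""
    x, y = index % width, index // width
    return (x // tile, y // tile)

def hit_rate(indices, width, tile=8, cache_tiles=16):
    """Fraction of reads served by a tiny LRU cache of tiles (toy model)."""
    cache = []  # least recently used first, most recently used last
    hits = 0
    for i in indices:
        t = tile_of(i, width, tile)
        if t in cache:
            hits += 1
            cache.remove(t)      # will be re-appended as most recent
        elif len(cache) == cache_tiles:
            cache.pop(0)         # evict the least recently used tile
        cache.append(t)
    return hits / len(indices)

WIDTH = 4096   # hypothetical image width in texels
HEIGHT = 4096  # hypothetical image height in texels

# 2D-coherent pattern: a 64x8 block of neighbouring texels.
coherent = [y * WIDTH + x for y in range(8) for x in range(64)]

# Random pattern: the same number of reads scattered over the whole image.
rng = random.Random(0)
scattered = [rng.randrange(WIDTH * HEIGHT) for _ in range(len(coherent))]

print("coherent hit rate:", hit_rate(coherent, WIDTH))
print("random hit rate:  ", hit_rate(scattered, WIDTH))
```

In this model the coherent walk misses only once per tile, while the scattered reads almost never find their tile already cached, so the image path adds little over a plain buffer for that access pattern.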