I have decided to use Images/Textures instead of global memory to hold my read-only, database-like tables (about 20 MB). This is a commercial application that has no control over which graphics card the customer chooses. I could do about 50 billion texel lookups in the course of 7600 (512 x 200) calls to one of my kernels, so I/O is very important to me.
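For concreteness, here is a minimal C sketch of how a flat float table could be packed four values per RGBA float texel and how many image rows roughly 20 MB would occupy. The helper names and the 4096-texel image width are assumptions for illustration only; the real limits come from querying `CL_DEVICE_IMAGE2D_MAX_WIDTH` and `CL_DEVICE_IMAGE2D_MAX_HEIGHT` on the customer's device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: location of one table value inside an image. */
typedef struct {
    size_t x;         /* texel column */
    size_t y;         /* texel row */
    unsigned channel; /* 0 = R, 1 = G, 2 = B, 3 = A */
} TexelRef;

/* Map a flat table index to a texel and channel, assuming four floats are
   packed per RGBA texel and texels fill the image row by row. */
static TexelRef table_index_to_texel(size_t index, size_t width)
{
    TexelRef ref;
    size_t texel = index / 4;

    assert(width > 0);
    ref.x = texel % width;
    ref.y = texel / width;
    ref.channel = (unsigned)(index % 4);
    return ref;
}

/* Number of image rows needed to hold `bytes` of float data, rounding up
   to whole texels and whole rows. */
static size_t rows_needed(size_t bytes, size_t width)
{
    const size_t bytes_per_texel = 4 * sizeof(float);
    size_t texels = (bytes + bytes_per_texel - 1) / bytes_per_texel;

    assert(width > 0);
    return (texels + width - 1) / width;
}
```

With an assumed width of 4096 texels, `rows_needed(20 * 1024 * 1024, 4096)` gives 320 rows on platforms where `float` is 4 bytes, so a table of this size should fit in a single 2D image on most devices that support images at all.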
With global memory, using the very wide memory bus effectively is unlikely to be achievable across GPU vendors, because of how specific you may need to be about data layout, kernel design, and work-group sizing. The penalties can be quite severe if things are not absolutely perfect.
Since samplers fetch 4 values at a time (RGBA), you are guaranteed to get at least 4x performance. This brings me to the bonus part, the texture cache. I have a couple of questions about AMD's texture caching for OpenCL. Feel free to just describe how it works for OpenGL and add disclaimers. I just want hints.
- What is the ballpark size of the texture cache for OpenStreams GPUs? 16 KB?
- When a cache miss is encountered at, let's say, (10, 0) in an RGBA float image of size (7000, 1), which addresses will be in the cache afterward?
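As a reference point for the second question, here is a minimal C sketch of the naive model: a plain linear, row-major layout with aligned cache lines. Both the layout and the line size are assumptions, not AMD figures. Real hardware may tile or swizzle texture memory, which is exactly what the question is trying to find out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: inclusive range of texel columns. */
typedef struct {
    size_t first_x;
    size_t last_x;
} TexelSpan;

/* Texels of a single-row RGBA float image that would share a cache line
   with texel x, ASSUMING a linear layout and aligned lines of line_bytes
   bytes. line_bytes is a guess supplied by the caller. */
static TexelSpan texels_in_same_line(size_t x, size_t width, size_t line_bytes)
{
    const size_t texel_bytes = 4 * sizeof(float); /* R, G, B, A */
    size_t per_line;
    TexelSpan span;

    assert(x < width);
    assert(line_bytes >= texel_bytes && line_bytes % texel_bytes == 0);

    per_line = line_bytes / texel_bytes;
    span.first_x = (x / per_line) * per_line;   /* round down to line start */
    span.last_x = span.first_x + per_line - 1;
    if (span.last_x >= width) {
        span.last_x = width - 1;                /* clip at the image edge */
    }
    return span;
}
```

Under this model, with a hypothetical 64-byte line and 4-byte floats, a miss at (10, 0) would bring in texels 8 through 11, and a 128-byte line would bring in texels 8 through 15. Any answer describing a tiled layout would replace this span calculation with a 2D neighborhood.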