In future GPUs (read: 5xxx series), the L1 texture cache is 8 KB per SIMD, whereas the L2 cache is 256 KB per memory controller (4 x 64-bit memory controllers).
See the presentation slide titled Stream Computing.
Thanks Ms. Headshots,
I saw that there is also an on-chip 64 KB Global Data Share. Future ATI GPUs should have little, if any, trouble with other vendors' methods for getting high bandwidth out of global memory, while at the same time not imposing any vendor-specific methods of their own.
I am still going to use my sham single-row (possibly wrapped) image technique, because it will be pretty easy to switch those images over to global memory some day. Designing an application around a specific vendor's OpenCL implementation is a liability. The other reason to use images for now is how OpenCL might be implemented on pre-existing hardware.
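For anyone wondering what "single row, possibly wrapped" means in practice, here is a rough sketch of the index arithmetic, assuming the data is really a 1D array that spills onto extra image rows only when it is longer than the maximum image width. The names texel_coord, rows_needed and linear_to_texel are made up for illustration, and the width is whatever the device reports (for example via CL_DEVICE_IMAGE2D_MAX_WIDTH), not a fixed value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a 2D texel coordinate inside the image. */
typedef struct {
    size_t x;
    size_t y;
} texel_coord;

/* Number of image rows needed to hold `count` elements when each row
   holds at most `width` texels (rounds up). */
static inline size_t rows_needed(size_t count, size_t width)
{
    assert(width > 0);
    return (count + width - 1) / width;
}

/* Map a linear, buffer-style index onto the wrapped image:
   the column is the remainder, the row is the quotient. */
static inline texel_coord linear_to_texel(size_t index, size_t width)
{
    texel_coord c;
    assert(width > 0);
    c.x = index % width;
    c.y = index / width;
    return c;
}
```

If the array fits in one row, y is always 0 and the image behaves like a plain 1D buffer. In a kernel the same two operations would produce the int2 coordinate passed to read_imagef; switching to global memory later should mostly mean dropping the mapping and indexing the pointer with the linear index directly.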