Can anyone comment on their experience using buffers and local memory as opposed to images?
It seems that the standard way of doing image convolution is to:
1) copy the image from system memory into a global buffer,
2) break up the global buffer into 2D blocks,
3) do a coalesced read of each block into a local buffer,
4) perform the calculations,
5) write the results back to a global buffer.
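
To make steps 2 to 5 concrete, here is a minimal CPU-side sketch in plain C. It is not an OpenCL kernel: the small `tile` array only stands in for local memory, and the names `TILE`, `RADIUS`, `clamp_index` and `convolve_tiled` are my own placeholders, as is the choice of a 3x3 filter with clamped edges. Note that each block has to be loaded together with a one-pixel border, which is where the boundary bookkeeping comes from.

```c
#include <assert.h>

#define TILE   4                    /* assumed block (work-group) edge length */
#define RADIUS 1                    /* 3x3 filter, so one border pixel per side */
#define HALO   (TILE + 2 * RADIUS)  /* block plus its border */

/* Manual boundary handling: clamp an index into [0, n-1]. */
static int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Apply a 3x3 filter (no kernel flip) to src (w x h, row-major), block by block. */
void convolve_tiled(const float *src, float *dst, int w, int h,
                    const float filter[9])
{
    float tile[HALO][HALO];         /* stands in for a __local buffer */

    assert(src != dst);             /* results go to a separate buffer */

    for (int by = 0; by < h; by += TILE) {          /* step 2: 2D blocks */
        for (int bx = 0; bx < w; bx += TILE) {
            /* Step 3: copy the block and its border into the tile. On a GPU,
               neighbouring work-items would read neighbouring addresses here,
               followed by a barrier before anyone uses the tile. */
            for (int ty = 0; ty < HALO; ++ty) {
                for (int tx = 0; tx < HALO; ++tx) {
                    int gx = clamp_index(bx + tx - RADIUS, w);
                    int gy = clamp_index(by + ty - RADIUS, h);
                    tile[ty][tx] = src[gy * w + gx];
                }
            }
            /* Steps 4 and 5: compute from the tile, write to the output. */
            for (int ty = 0; ty < TILE && by + ty < h; ++ty) {
                for (int tx = 0; tx < TILE && bx + tx < w; ++tx) {
                    float sum = 0.0f;
                    for (int ky = 0; ky < 3; ++ky)
                        for (int kx = 0; kx < 3; ++kx)
                            sum += filter[ky * 3 + kx] * tile[ty + ky][tx + kx];
                    dst[(by + ty) * w + (bx + tx)] = sum;
                }
            }
        }
    }
}
```

In a real kernel the two outer loops would become the work-group grid and the inner loops would become the work-items of one group.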
Care must also be taken to avoid bank conflicts in the local memory.
With OpenCL images, there is no need to use local memory: simply read from image A,
perform the calculations, then write to image B (a single image cannot be both read and written in the same kernel at the moment).
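
For comparison, the image-based version can be sketched the same way. Again this is plain C on the CPU rather than a real kernel, and `read_clamped` and `convolve_image_style` are hypothetical names of mine. `read_clamped` only imitates what I understand `read_imagef` to do with a sampler set to `CLK_ADDRESS_CLAMP_TO_EDGE`, so treat the mapping as an assumption rather than a description of any particular driver.

```c
#include <assert.h>

/* Imitates an image read with clamp-to-edge addressing: out-of-range
   coordinates return the nearest edge pixel. */
static float read_clamped(const float *img, int w, int h, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[y * w + x];
}

/* Apply a 3x3 filter (no kernel flip): read from image A, write to image B. */
void convolve_image_style(const float *image_a, float *image_b, int w, int h,
                          const float filter[9])
{
    assert(image_a != image_b);     /* the same image is never read and written */

    /* Each (x, y) iteration corresponds to one work-item in a real kernel. */
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx)
                    sum += filter[(ky + 1) * 3 + (kx + 1)]
                         * read_clamped(image_a, w, h, x + kx, y + ky);
            image_b[y * w + x] = sum;   /* stands in for write_imagef */
        }
    }
}
```

There is no tile, no border load and no barrier here. Whether that simplicity costs or gains speed depends on how well the hardware's texture cache serves the overlapping reads, which this sketch cannot show.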
On AMD hardware, which is faster? Certainly, images make things simpler: no need to worry
about boundaries or a local memory buffer. But am I deluding myself in thinking that images
are better than the global/local scheme for image convolutions?