Archives Discussions

boxerab · 08-02-2014

Can anyone comment on their experience using buffers/local memory as opposed to images?

It seems that the standard way of doing image convolution is to:

1) read from system memory into a global buffer

2) break up the global buffer into 2D blocks

3) do a coalesced read of each block into a local buffer

4) perform the calculations

5) write the results back to the global buffer.
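Steps 2 to 5 can be modelled on the CPU in plain C, assuming a single-channel float image, a 3x3 filter and clamp-to-edge borders (all hypothetical choices; step 1, the host-to-device copy, is not modelled). The `local` array stands in for `__local` memory; on a GPU each work-group would fill it cooperatively and hit a barrier before computing.

```c
#include <assert.h>

/* Hypothetical sizes, chosen only for illustration. */
#define W 16                  /* image width in pixels */
#define H 16                  /* image height in pixels */
#define TILE 8                /* tile edge per work-group; W and H must be multiples */
#define R 1                   /* filter radius, i.e. a 3x3 filter */
#define LTILE (TILE + 2 * R)  /* tile plus halo, as staged in local memory */

/* Clamp a coordinate into [0, n-1] (clamp-to-edge border handling). */
static int clampi(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

/* Steps 3-5 for one block: stage tile + halo locally, convolve, write back. */
static void convolve_tile(const float *in, float *out, const float *filt,
                          int tx, int ty)
{
    float local[LTILE][LTILE];  /* stands in for __local memory */

    /* Step 3: on a GPU, consecutive work-items would load consecutive
       addresses here so the global reads coalesce, then barrier(). */
    for (int ly = 0; ly < LTILE; ++ly)
        for (int lx = 0; lx < LTILE; ++lx) {
            int gx = clampi(tx * TILE + lx - R, W);
            int gy = clampi(ty * TILE + ly - R, H);
            local[ly][lx] = in[gy * W + gx];
        }

    /* Steps 4-5: each output pixel reads only from the staged tile. */
    for (int y = 0; y < TILE; ++y)
        for (int x = 0; x < TILE; ++x) {
            float acc = 0.0f;
            for (int fy = -R; fy <= R; ++fy)
                for (int fx = -R; fx <= R; ++fx)
                    acc += filt[(fy + R) * (2 * R + 1) + (fx + R)]
                         * local[y + R + fy][x + R + fx];
            out[(ty * TILE + y) * W + (tx * TILE + x)] = acc;
        }
}

/* Step 2: break the global buffer into TILE x TILE blocks, one per work-group. */
void convolve_tiled(const float *in, float *out, const float *filt)
{
    assert(W % TILE == 0 && H % TILE == 0);
    for (int ty = 0; ty < H / TILE; ++ty)
        for (int tx = 0; tx < W / TILE; ++tx)
            convolve_tile(in, out, filt, tx, ty);
}
```

The R-pixel halo is what makes this version fiddly: each tile loads (TILE + 2R)^2 values to produce TILE^2 outputs (100 loads for 64 outputs here), and the border logic has to be written by hand.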

With OpenCL images, there is no need to use local memory: simply read from image A, perform the calculations, then write to image B. (At the moment, image A cannot be both read and written in the same kernel.)
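For comparison, the image path can be modelled the same way. `read_clamped` is a hypothetical stand-in for `read_imagef` with integer coordinates and a `CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST` sampler; in this sketch the border handling lives in the sampler model rather than in the convolution loop, and there is no local staging or barrier.

```c
#include <assert.h>

/* Hypothetical read-only "image": row-major pixels plus dimensions. */
typedef struct { const float *px; int w, h; } image_ro;

/* Models a clamp-to-edge, nearest-filter sampler: out-of-range
   coordinates return the nearest edge pixel. */
static float read_clamped(image_ro img, int x, int y)
{
    x = x < 0 ? 0 : (x >= img.w ? img.w - 1 : x);
    y = y < 0 ? 0 : (y >= img.h ? img.h - 1 : y);
    return img.px[y * img.w + x];
}

/* One "work-item" per output pixel: read neighbours straight from
   image A, write the result to a separate destination B. */
void convolve_image(image_ro a, float *b, const float *filt, int r)
{
    assert(r >= 0);
    int k = 2 * r + 1;
    for (int y = 0; y < a.h; ++y)
        for (int x = 0; x < a.w; ++x) {
            float acc = 0.0f;
            for (int fy = -r; fy <= r; ++fy)
                for (int fx = -r; fx <= r; ++fx)
                    acc += filt[(fy + r) * k + (fx + r)]
                         * read_clamped(a, x + fx, y + fy);
            b[y * a.w + x] = acc;  /* B is never read in this pass */
        }
}
```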

With local memory, on the other hand, care must be taken to avoid bank conflicts.
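One way to see the bank-conflict problem is to count how many work-items land on the same bank when they walk down one column of a local tile. This sketch assumes 32 banks of 4-byte words (the arrangement commonly described for GCN's LDS; check the vendor documentation for your device) and ignores details such as same-address broadcast and how many lanes are serviced per cycle.

```c
#include <assert.h>

/* Assumption: 32 banks, each one 32-bit word wide. */
#define NUM_BANKS 32

/* Bank holding the 4-byte word at word index `word`. */
static int bank_of(int word) { return word % NUM_BANKS; }

/* Worst-case number of lanes hitting the same bank when lane i reads
   element (row i, column `col`) of a float tile with row pitch `pitch`.
   A result of 1 means the access is conflict-free. */
int column_conflict_degree(int pitch, int col, int lanes)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;
    assert(lanes > 0);
    for (int lane = 0; lane < lanes; ++lane) {
        int b = bank_of(lane * pitch + col);
        if (++hits[b] > worst)
            worst = hits[b];
    }
    return worst;
}
```

Under these assumptions a row pitch of 32 floats sends all 32 lanes to one bank, while padding the pitch to 33 spreads them across all 32 banks. In OpenCL C that padding would just be a declaration such as `__local float tile[32][33];`.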

On AMD hardware, which is faster? Certainly, images make things simpler: there is no need to worry about boundaries or a local memory buffer. But am I deluding myself in thinking that images are better than the global/local scheme for image convolution?

Thanks!

Aaron

dipak · 08-11-2014

Hi Aaron,

In terms of bandwidth, here is the relation (in an ideal scenario):

local memory < texture memory < global memory.

Efficient use of local memory can improve performance a lot. However, it is much harder to program, because filling the local memory and synchronizing access to it must be done by the programmer. The main challenges with local memory are the data access pattern, its limited size, and synchronization; if these are not handled well, they hurt performance. On the plus side, the same local memory can be both read and written.

Images, on the other hand, are easier to handle, and you need not worry about those points. Using images in read-only mode is fastest, because the reads go through the on-chip texture cache. But, as you rightly pointed out, the same image cannot be read and written in the same kernel. Another important disadvantage: because most OpenCL implementations store image objects internally in a non-linear layout, frequently copying or mapping images to/from linear memory, such as buffers or linear host memory, can degrade overall performance significantly.

So both approaches have their pros and cons, and the choice depends on the application and its implementation.
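To illustrate why copies between images and linear memory are not free, here is a sketch that uses Z-order (Morton) indexing as a stand-in for a non-linear image layout. This is an assumption for illustration only: the real internal formats are vendor-specific and not described in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2i. */
static uint32_t part1by1(uint32_t v)
{
    v &= 0x0000ffffu;
    v = (v | (v << 8)) & 0x00ff00ffu;
    v = (v | (v << 4)) & 0x0f0f0f0fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Morton index of pixel (x, y): x bits in even positions, y bits in odd. */
uint32_t morton2d(uint32_t x, uint32_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}

/* Model of an upload into the non-linear layout: every pixel is scattered
   to a new offset. n must be a power of two so indices stay below n*n. */
void linear_to_morton(const float *linear, float *tiled, uint32_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (uint32_t y = 0; y < n; ++y)
        for (uint32_t x = 0; x < n; ++x)
            tiled[morton2d(x, y)] = linear[y * n + x];
}
```

In this model a transfer is a full reorder pass rather than a straight memcpy, and reading the data back into a buffer would need the inverse permutation.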

You can check the following links to get a better idea:

http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-c...

ABM Musa

Regards,

dipak · 08-12-2014

In terms of bandwidth, here is the relation (in an ideal scenario):

local memory < texture memory < global memory.

Sorry, my mistake. The relation is:

local memory > texture memory > global memory

Regards,


Buffer vs. Image performance