I am using OpenCL to run basic image-analysis functions on a GPU.
As I am working with HD pictures (1920×1080) I have a lot of pixels to deal with, and I need wide vectorization. I read that image2d objects provide good performance when working with 2D images, which is my case, so I decided to use them instead of buffers. The problem is that the read_image functions inside a kernel never return a vector with more than 4 components, whereas the hardware should, in my opinion, be able to handle much more.
So, am I missing something? Is there a way to read more than 4 pixels at once, or should I fall back to buffers and use vloadn to read my pixels?
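For reference, the buffer-based alternative I have in mind looks roughly like this. This is a minimal sketch assuming an 8-bit single-channel image stored row-major in a plain buffer; the kernel and argument names are made up, and the inversion is just a placeholder operation:

```c
// Hypothetical kernel: each work item loads 16 pixels at once with vload16.
// Assumes width * height is a multiple of 16 and a single-channel uchar layout.
__kernel void process16(__global const uchar *src,
                        __global uchar *dst)
{
    size_t i = get_global_id(0);             // one work item per 16-pixel chunk
    uchar16 px = vload16(i, src);            // reads src[16*i .. 16*i + 15]
    uchar16 out = (uchar16)(255) - px;       // placeholder: invert the pixels
    vstore16(out, i, dst);
}
```

Launched with a global size of (1920 * 1080) / 16 work items, this reads 16 pixels per work item, which is the kind of wide access I was hoping to get from image2d.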
After running some basic execution-time measurements, it appears that image2d is faster on the GPU, but the opposite holds on the CPU. Since I need to perform a reduction that accounts for about 30% of the total time, it would be ideal if I could read an int16 vector from an image2d.
Image2D is designed for textures (and general 2D data), so it is probably the right choice in your case. But the vectors you load and store with read_image/write_image pertain to just one pixel (e.g. its RGBA components), so 4 components is the natural maximum. To exploit the parallelism of the GPU (i.e. the fact that there are many cores) you need to write a kernel where each work item acts on an individual pixel (or a small group of pixels). If you launch that kernel with e.g. 1920×1080 work items you will exploit the parallelism of the GPU.
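As a sketch of what that looks like, here is a hypothetical per-pixel kernel; it assumes the host enqueues it with a 2D global work size of 1920×1080 (one work item per pixel), and the kernel/argument names and the invert operation are illustrative only:

```c
// Hypothetical kernel: one work item per pixel of an image2d_t.
__kernel void invert(__read_only  image2d_t src,
                     __write_only image2d_t dst)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                          CLK_ADDRESS_CLAMP_TO_EDGE |
                          CLK_FILTER_NEAREST;

    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    float4 px = read_imagef(src, smp, pos);   // one pixel: up to 4 channels
    write_imagef(dst, pos, (float4)(1.0f) - px);
}
```

The parallelism comes from the work-item grid, not from the width of each load: the GPU runs thousands of these work items concurrently, so each one only needs to touch a single pixel.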