I am using OpenCL to run basic image-analysis functions on the GPU.
Since I am working with Full HD images (1920x1080), there are a lot of pixels to process, so I need wide vectorization. I read that image2d_t objects give good performance when working with 2D images, which is my case, so I decided to use them instead of buffers. The problem is that the read_image* functions (read_imagef, read_imagei, read_imageui) available in a kernel never return a vector wider than 4 components, whereas, in my opinion, the hardware should be able to handle much more.
So, am I missing something, and is there a way to read more than 4 pixels at once, or should I use buffers and vloadn to read my pixels?
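As a rough illustration of the buffer-based alternative, here is a host-side emulation in plain C of what vload16 does for a row of 8-bit grayscale pixels. It assumes one byte per pixel and a row width that is a multiple of 16. The helper names vload16_u8 and loads_per_row are hypothetical; inside an actual kernel the copy loop would be a single call such as vload16(gid, row) returning a uchar16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side emulation of OpenCL's vload16 on 8-bit pixels (hypothetical helper).
 * vload16(offset, p) reads the 16 elements starting at p + 16 * offset,
 * so work-item 'offset' gets pixels [16*offset, 16*offset + 15]. */
static void vload16_u8(size_t offset, const uint8_t *p, uint8_t out[16])
{
    for (int k = 0; k < 16; ++k)
        out[k] = p[offset * 16 + k];
}

/* Number of 16-pixel loads needed to cover one row of 'width' pixels.
 * Assumes width is a multiple of 16 (true for 1920). */
static size_t loads_per_row(size_t width)
{
    assert(width % 16 == 0);
    return width / 16;
}
```

For a 1920-pixel row this means 120 loads of 16 pixels each, compared with one texel (at most 4 channels) per read_imageui call.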
After some basic execution-time measurements, image2d appears to be faster on the GPU, but the opposite is true on the CPU. Since I also need to perform a reduction that accounts for about 30% of the total time, it would be ideal if I could read an int16 vector from an image2d.
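The reduction mentioned above can be sketched with 16-wide loads from a buffer. Below is a hedged host-side emulation in plain C, assuming 8-bit grayscale pixels and a pixel count that is a multiple of 16 (1920x1080 is). The lanes array stands in for a uint16 accumulator in a kernel, and the function name sum_pixels_16wide is hypothetical. A real OpenCL reduction would also need to combine partial sums across work-items (for example in local memory), which this sketch does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums 'n' 8-bit pixels using 16 independent accumulators, mimicking a
 * 16-wide vector accumulator: lane k adds pixels k, k+16, k+32, ...
 * Assumes n is a multiple of 16. */
static uint64_t sum_pixels_16wide(const uint8_t *px, size_t n)
{
    assert(n % 16 == 0);
    uint64_t lanes[16] = {0};              /* one partial sum per lane */
    for (size_t i = 0; i < n; i += 16)
        for (int k = 0; k < 16; ++k)
            lanes[k] += px[i + k];         /* kernel analogue: acc += convert_uint16(vload16(i / 16, px)); */
    uint64_t total = 0;                    /* final horizontal reduction over the 16 lanes */
    for (int k = 0; k < 16; ++k)
        total += lanes[k];
    return total;
}
```

Keeping 16 independent partial sums and combining them only once at the end is what lets the inner loop map onto vector instructions; the horizontal add happens once per work-item rather than once per load.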