When an image is passed from the host, the image format (cl_image_format) can be specified as CL_UNSIGNED_INT16, CL_HALF_FLOAT, etc. If it is then read into an int4 or float4, I suppose a 64-bit pixel is effectively being read and widened, since there seems to be no way to read into a short4 or half4. My question is: is this an efficient way of reading?
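For reference, this is roughly the read path I mean (the kernel and argument names are just placeholders, a minimal sketch of a single-channel 16-bit read):

```c
// Sketch: reading a CL_UNSIGNED_INT16 image. read_imageui always returns a
// uint4, so each 16-bit sample comes back widened to 32 bits per channel;
// core OpenCL C has no read variant that returns short4.
__kernel void read_u16(__read_only image2d_t src, __global uint *dst, int width)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                          CLK_ADDRESS_CLAMP_TO_EDGE |
                          CLK_FILTER_NEAREST;

    int x = get_global_id(0);
    int y = get_global_id(1);

    uint4 px = read_imageui(src, smp, (int2)(x, y)); // 16-bit data widened to uint4
    dst[y * width + x] = px.x;
}
```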
I cannot find any documentation on how images are cached; can you explain what z-order means here? Why do you think a normalized image format would provide better accuracy? The input values are already in range, so why normalize them?
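Just to make sure I understand the normalized option you mean, here is a host-side sketch of the two formats I am comparing (channel order CL_R is just an assumption for our single-channel data):

```c
#include <CL/cl.h>

/* Unnormalized: paired with read_imageui in the kernel, which returns the
 * raw integer sample (0..65535) widened into a uint4. */
cl_image_format fmt_unnorm = { CL_R, CL_UNSIGNED_INT16 };

/* Normalized: paired with read_imagef in the kernel, which returns the
 * sample divided by 65535.0f, i.e. a float4 in [0, 1]. */
cl_image_format fmt_norm   = { CL_R, CL_UNORM_INT16 };
```

My question is whether the conversion to [0, 1] in the normalized case actually buys any accuracy, given that our values already fit the 16-bit range.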
Regarding your last piece of advice, are you suggesting using buffers instead of images? Since we are accessing a full-HD image, 2D image access would be very convenient and, I assume, more cache efficient. Is 'CacheHit' a good performance counter for checking whether we are reading the image efficiently? Does that hit ratio include L2 cache hits as well? Can you explain 'MemUnitBusy' and 'MemUnitStalled'?
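If you do mean buffers, this is the kind of variant I would benchmark against the image path (again just a sketch with placeholder names; the manual row-major indexing replaces the sampler/z-order access):

```c
// Sketch: buffer-based access to the same full-HD frame (width e.g. 1920),
// addressed linearly instead of through the image/sampler hardware.
__kernel void read_buf(__global const ushort *src, __global uint *dst, int width)
{
    int x = get_global_id(0);
    int y = get_global_id(1);

    ushort v = src[y * width + x];   // plain row-major load, no image cache/tiling
    dst[y * width + x] = (uint)v;
}
```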