So I read section 4.5 of the APP programming guide and I am confused by the difference between options 3 and 5. Keep in mind that I am on Linux, so I have no zero-copy support at this time (SDK 2.4).
Option 3 suggests using two buffers, one created with CL_MEM_READ_WRITE and one with CL_MEM_ALLOC_HOST_PTR. The application interacts with the latter buffer, and then uses clEnqueueCopyBuffer to copy between the two buffers. The former buffer is the one passed to the kernel. The guide suggests that this results in the maximum transfer rates.
Option 5 suggests using only the CL_MEM_ALLOC_HOST_PTR buffer and passing it to the kernel directly. The guide suggests that this results in transfers across the PCIe bus during kernel execution.
So my question is... should I be using option 3 in my code right now (without any zero-copy support) instead of option 5? Most of my buffers are initialized at the start of my code and are never touched again.
CL_MEM_ALLOC_HOST_PTR puts the buffer into host memory, which is slow for the kernel to access. Since you don't need to update your buffers, just create them without CL_MEM_(ALLOC|USE)_HOST_PTR and fill them with clEnqueue(Map|Read|Write)Buffer.
Well, I just tested it and I found no speed difference between any of those options. I'll see if this changes when zero-copy support is added to Linux.