Welcome to this forum.
Actually, when the CL_MEM_ALLOC_HOST_PTR flag is used, the buffer is created in pinned host memory and is a zero-copy buffer. So it has the following properties compared to a normal device buffer (i.e. one created with the default flags, or with flags set to 0):
- Mapping it to the host with clEnqueueMapBuffer is much faster.
- It is directly accessible from the GPU device(s), but at a lower speed (for dGPUs, access is much slower and limited by the PCIe bus speed).
- The same memory location is used for each map, so the mapped pointer will be the same every time.
For more information, please refer to the section "OpenCL Memory Objects" in the AMD OpenCL Programming Optimization Guide.
I've attached a small program to demonstrate the above points. [Note: memory release and other unrelated details have been omitted.]
In the source file, please change the MEM_FLAG and NUM_ELEMENTS macros to see the effects. I hope it will be helpful to you.
HelloWorld.cpp.zip 1.2 KB
It helped a lot.