The OpenCL specification gives a choice of three methods for creating and filling a memory buffer. I wonder which of them is the fastest.
1. Call clCreateBuffer(), then clEnqueueWriteBuffer(), then run the kernel.
2. Call clCreateBuffer() with the CL_MEM_USE_HOST_PTR flag.
3. Call clCreateBuffer(), then map the buffer, write the data to it manually, then unmap it and run the kernel.
What alignment is required for the host memory pointer in methods 1 and 2? I suppose it significantly influences the data transfer speed.