I have an HD 7700 GPU on Windows 7.
Here is how I am currently managing my memory:
1) I allocate a host buffer using the new operator, aligned to the page size (4096 bytes).
2) I allocate a device buffer via clCreateBuffer, using the CL_MEM_READ_ONLY flag
3) I transfer from host to device using clEnqueueWriteBuffer(...), with blocking_write set to CL_FALSE.
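
To make step 1 concrete, here is a minimal sketch of a page-aligned host allocation. It assumes a C++17 compiler (aligned operator new), and the names kPageSize, allocPageAligned, freePageAligned and isPageAligned are made up for illustration only. The OpenCL calls from steps 2 and 3 are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Assumed page size, matching the 4096 bytes mentioned in step 1.
constexpr std::size_t kPageSize = 4096;

// Hypothetical helper: allocate `bytes` of host memory aligned to kPageSize,
// using the C++17 aligned form of operator new.
inline void* allocPageAligned(std::size_t bytes) {
    return ::operator new(bytes, std::align_val_t{kPageSize});
}

// Memory obtained from the aligned operator new must be released with the
// matching aligned operator delete.
inline void freePageAligned(void* p) {
    ::operator delete(p, std::align_val_t{kPageSize});
}

// Hypothetical check: true if the address is a multiple of kPageSize.
inline bool isPageAligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kPageSize == 0;
}
```

The pointer returned by allocPageAligned is what would be passed as the ptr argument of clEnqueueWriteBuffer in step 3. Since that write is non-blocking, the buffer has to stay alive until the transfer has completed.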
I have found this method to be pretty fast. Is there a faster way to transfer data from host to device?