cl_mem buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, &status);
No host pointer is passed.
Example of my buffer allocation:
if (err != CL_SUCCESS)
    fprintf(stderr, "Error: clCreateBuffer (gpu_fold_large_neg): %d\n", err);
It looks much the same, but it doesn't help.
On Linux this works for me: I allocate a buffer of 128 MB on the GPU, and the process
size on the host is only about 40 MB total. Are you on Linux or Windows?
I'm on Windows
And my app has two parts that use memory blocks of different sizes (both large), so I have to allocate/deallocate two sets of buffers in a loop.
I have not checked this on Linux, but on Windows OpenCL always allocates a temporary buffer. Even if you use CL_DEVICE_TYPE_CPU and specify CL_MEM_USE_HOST_PTR at buffer creation, it will still allocate a temp buffer on the host.
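For reference, a minimal sketch of the two creation paths being discussed. Note that per the OpenCL spec, CL_MEM_USE_HOST_PTR only *permits* the runtime to use the host allocation directly; a hidden temporary copy (as observed above on Windows) is still allowed. Names like host_data are placeholders, not from the original posts:

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    cl_int err;
    cl_platform_id platform;
    cl_device_id device;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1,
                       &device, NULL) != CL_SUCCESS)
        return 1;

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if (err != CL_SUCCESS)
        return 1;

    size_t size = 128 * 1024 * 1024;     /* 128 MB, as in the thread */
    void *host_data = malloc(size);

    /* Path 1: no host pointer; storage is managed by the runtime
     * (typically on the device). */
    cl_mem dev_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                    size, NULL, &err);
    if (err != CL_SUCCESS)
        fprintf(stderr, "clCreateBuffer (device) failed: %d\n", err);

    /* Path 2: wrap an existing host allocation. The flag is
     * CL_MEM_USE_HOST_PTR and it must be paired with a non-NULL
     * host pointer; the runtime may still keep a temporary copy. */
    cl_mem host_buf = clCreateBuffer(ctx,
                                     CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                     size, host_data, &err);
    if (err != CL_SUCCESS)
        fprintf(stderr, "clCreateBuffer (host ptr) failed: %d\n", err);

    clReleaseMemObject(host_buf);
    clReleaseMemObject(dev_buf);
    clReleaseContext(ctx);
    free(host_data);
    return 0;
}
```

Whether path 2 avoids the extra host-side copy is implementation-defined, which is exactly the complaint here.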
This creates problems when porting existing apps that require large buffers to OpenCL.
So there's no way around it?
IMO it's a big flaw in the implementation. There should be some option to prevent such duplication.
BTW, I'm now comparing app performance between an HD4870 and a GSO9600, and I'm very disappointed in the ATI GPU's performance. The app was written for ATI, with float4 memory accesses and operations used wherever possible, but the GSO9600 performs better (and sometimes much better) in most cases. For now the HD4870 looks better on only one kind of workload, where most of the time is spent on the CPU (so it may be down to CPU differences, not the GPU).