Archives Discussions

richeek_arya · ‎06-07-2011

Conceptual Question

Hi,

I have a question about copying data over to the GPU. This is how I am doing it:

It is a four-step process:

1. clCreateBuffer

2. Get a pointer using clEnqueueMapBuffer

3. Copy the data to this pointer using memcpy (the input data is supplied by the user)

4. Unmap the data

At the end, I free the memory using this:

status = clReleaseMemObject(s_re_d);

Is it correct that I do not need to free s_re_p, since I never allocated memory for it? Could someone please let me know whether all these steps follow OpenCL guidelines?

With regards,

Richeek

    cl_mem s_re_d;
    size = nT * nSym * sizeof(float);
    total_size += size;

    s_re_d = clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, &status);
    if(status != CL_SUCCESS)
    {
        mexPrintf("Error: Setting kernel argument. \n");
        return;
    }

    float *s_re_p;
    s_re_p = (float *)clEnqueueMapBuffer(commandQueue, s_re_d, CL_FALSE, CL_MAP_WRITE, 0, size, 0, NULL, NULL, &status);
    if(status != CL_SUCCESS)
    {
        mexPrintf("Error: clEnqueueMapBuffer \n");
        return;
    }

    memcpy(s_re_p, s_re, size);

    /* Load the data back on the GPU */
    status = clEnqueueUnmapMemObject(commandQueue, s_re_d, (void*)s_re_p, 0, NULL, &ev);
    if(status != CL_SUCCESS)
    {
        mexPrintf("clEnqueueUnmapMemObject() failed\n");
        return;
    }

    status = clWaitForEvents(1, &ev);
    if(status != CL_SUCCESS)
    {
        mexPrintf("clEnqueueUnmapMemObject() Release failed s_re_d\n");
        return;
    }

himanshu_gautam · ‎06-07-2011

AFAIK the method is correct for read_write buffers.

You don't need to call free(s_re_p). It will be released when unmap is called.

richeek_arya · ‎06-07-2011

Thanks Himanshu for checking it.

On the system with an NVIDIA GTX 260 and 64-bit Windows 7, I also had to do this:

s_re_p = NULL;

Otherwise, CPU physical memory usage shoots up as I run the program in a loop.

However, as I mentioned earlier, with an ATI Radeon 5450 and 64-bit Windows 7 I did not set s_re_p = NULL; when I return, and memory usage is still bounded.

I cannot explain this behavior, since host-side memory should be allocated by the operating system, which in both cases is the same Windows 7. Or am I wrong, and in an OpenCL program this is done by the SDK?

If this is SDK dependent, that may explain it, since NVIDIA and AMD have different SDKs.

Anyway, setting the pointer to NULL when you leave is good practice, I suppose, so I will stick to it.

Thanks,

Richeek
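A note on the s_re_p = NULL; line: in C, assigning NULL to a pointer protects against reusing a stale address, but it does not by itself return any memory. The sketch below illustrates this with plain malloc/free. The counting wrappers tracked_alloc and tracked_release are hypothetical names made up for this illustration and are not part of OpenCL or either vendor's SDK.

```c
#include <stdlib.h>

/* Hypothetical counting wrappers (illustration only):
 * they track how many allocations are still outstanding. */
static int live_allocations = 0;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_allocations++;
    return p;
}

static void tracked_release(void *p)
{
    if (p != NULL) {
        free(p);
        live_allocations--;
    }
}

/* Records the number of outstanding allocations after a pointer is set to
 * NULL, and again after the block it pointed to is actually released. */
static void null_vs_release(int *after_null, int *after_release)
{
    float *p = tracked_alloc(16 * sizeof *p);
    float *keep = p;           /* second handle, so the block can be released */

    p = NULL;                  /* only the local variable changes */
    *after_null = live_allocations;

    tracked_release(keep);     /* this is what returns the memory */
    *after_release = live_allocations;
}
```

For a mapped OpenCL buffer, the corresponding release calls are clEnqueueUnmapMemObject for the mapped pointer and clReleaseMemObject for the buffer, as discussed above.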

richeek_arya · ‎06-08-2011

Originally posted by: richeek.arya

On the system with an NVIDIA GTX 260 and 64-bit Windows 7, I also had to do this:

s_re_p = NULL;

Otherwise, CPU physical memory usage shoots up as I run the program in a loop.

Actually, I take the above statement back. Memory usage increased again after some iterations, and when I rebooted the system the program started failing at clCreateContext() with error code -6 (CL_OUT_OF_HOST_MEMORY). This happens only on NVIDIA hardware. I then ported my code to CUDA and everything started working perfectly.

himanshu_gautam · ‎06-08-2011

Hi richeek.arya,

This appears to be a bug in the NVIDIA SDK. It would be better to report it there.

Anyway, thanks for sharing your experience.

Copying Data over to GPU