ajaj14
Journeyman III

Fastest device to host transfer

My question is how to achieve the fastest device-to-host transfer speed. The short answer is pinned memory, but my problem is a bit more complex.

I have a block of device memory that I have to transfer to a host address that varies from call to call, so the host memory cannot be pre-pinned. I use this code:

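void clMemcpyDeviceToHost(void * dst, cl_mem src, int size)
{
    // Wrap the destination host memory in a temporary buffer object.
    cl_mem cl_output = clCreateBuffer(m_context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, size, dst, NULL);

    // Map the wrapper to obtain a host pointer (the call that appears to be slow).
    void* p_map_output = clEnqueueMapBuffer(m_commandQueue, cl_output, CL_TRUE, CL_MAP_WRITE_INVALIDATE_REGION, 0, size, 0, NULL, NULL, NULL);

    // Blocking read from the device buffer into the mapped pointer.
    clEnqueueReadBuffer(m_commandQueue, src, CL_TRUE, 0, size, p_map_output, 0, NULL, NULL);

    // Unmap and release the temporary wrapper.
    clEnqueueUnmapMemObject(m_commandQueue, cl_output, p_map_output, 0, NULL, NULL);
    clReleaseMemObject(cl_output);
}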

It seems that the slowest part is clEnqueueMapBuffer, so my guess is that it actually copies something, which I do not want. I tried setting the blocking flag to CL_FALSE and moving the first two calls ahead of a substantial amount of computation, so that the mapping could proceed while I do something else, but the call still blocks for a long time (twice as long as the copy that follows).

Am I doing it wrong? Is there a faster way?

Thanks

3 Replies
himanshu_gautam
Grandmaster

How do you find the slowest part? What size of memory are you reading? Are you running it in iterations?

The API does not seem to be using any cl_events. (EDITED)

Your code looks reasonable and should give good read performance.


I measured the time by placing clFinish() calls before and after the OpenCL calls.

The memory size I was testing is 32 MB.

Yes, I am running iterations, 1-2000.

I tried the blocking with events, but it did not help.

For me, clEnqueueMapBuffer seems to block for 5.5 ms even when it should not block. Is there any way to make it not block, or not copy? Do you perhaps have sample code where it does not block?
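
To put the reported numbers in perspective, the time spent in the map call can be converted into an effective data rate. The helper below is a small sketch (throughput_gb_per_s is a hypothetical name) and it assumes the 32 MB mentioned above means 32 MiB, i.e. 33,554,432 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Effective rate in GB/s (1 GB = 1e9 bytes) for moving `bytes` bytes
 * in `elapsed_ms` milliseconds. */
double throughput_gb_per_s(size_t bytes, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return ((double)bytes / 1e9) / (elapsed_ms / 1e3);
}

/* Example with the figures from this thread (32 MiB assumed):
 *   throughput_gb_per_s((size_t)32 * 1024 * 1024, 5.5)
 * gives roughly 6.1 GB/s. */
```

Under that assumption, 5.5 ms for the buffer works out to roughly 6.1 GB/s. Comparing this figure with the rate of the read that follows is one way to judge whether the map call is doing work proportional to the buffer size.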

Thanks



"Do you perhaps have sample code where it does not block?"



You should check AMD APP SDK Samples for that.

You can share your code here too (attach it as a zip file), so that other developers can point out bugs in it.

Also mention the details of your setup: CPU, GPU, driver, SDK, and OS.
