Fastest device to host transfer

Question asked by ajaj14 on Jul 18, 2013
Latest reply on Jul 23, 2013 by himanshu.gautam

My question is how to achieve the fastest device-to-host transfer speed. The short answer is pinned memory, but my problem is a bit more complex.


I have a piece of device memory that I have to transfer to a varying address in host memory, so the host memory cannot be pre-pinned. I use this code:



void clMemcpyDeviceToHost(void * dst, cl_mem src, int size)
{
    // Wrap the destination host memory in a temporary buffer object.
    cl_mem cl_output = clCreateBuffer(m_context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, size, dst, NULL);

    // Map the buffer (blocking) to get a host pointer to read into.
    void* p_map_output = clEnqueueMapBuffer(m_commandQueue, cl_output, CL_TRUE, CL_MAP_WRITE_INVALIDATE_REGION, 0, size, 0, NULL, NULL, NULL);

    // Blocking read from the device buffer into the mapped pointer.
    clEnqueueReadBuffer(m_commandQueue, src, CL_TRUE, 0, size, p_map_output, 0, NULL, NULL);

    clEnqueueUnmapMemObject(m_commandQueue, cl_output, p_map_output, 0, NULL, NULL);
}





It seems that the slowest part is clEnqueueMapBuffer, so my guess is that it actually copies something, which I do not want it to do. I tried setting the blocking flag to CL_FALSE and moving the first two calls (clCreateBuffer and clEnqueueMapBuffer) ahead of a good amount of computation code, so that the mapping could happen while I do something else, but the call still blocks for a good amount of time (twice as long as the copy afterwards).


Am I doing it wrong? Is there a faster way?