Biaowang
Adept II

Does zero copy still need a memory transfer on an APU?

Hi, Guys:

I have an A6-4455M APU and am trying to use "zero copy" to optimize memory transfers. However, after reading the programming guide and the code of the BufferBandwidth sample, I found that you still need a copy to transfer data from the CPU to the GPU, using the following sequence:

  1. hostptr = clEnqueueMapBuffer(cl_mem_obj)
  2. memcpy(hostptr, ptr_with_initialized_data_on_host, size_of_data)
  3. clEnqueueUnmapMemObject(cl_mem_obj, hostptr)

If my understanding of zero copy above is correct, the question is: why is it called "zero copy" if it needs a copy anyway from CPU to GPU or vice versa? Zero copy, IMHO, should involve no copy at all, since on an APU the CPU and GPU are on the same die with no PCIe connection between them.

Any explanation is appreciated!

7 Replies
gbilotta
Adept III

What you are doing is not zero copy. You should create the buffer with the CL_MEM_USE_HOST_PTR flag and pass it ptr_with_initialized_data_on_host at buffer creation time. For this to work, the host data must also have been allocated with the correct alignment (see the SDK manual for details).
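The alignment requirement can be sketched in plain C. This is only a minimal illustration: the 4096-byte value (HOST_ALIGNMENT) and the helper names round_up and alloc_host_buffer are assumptions made for the example, not figures from the SDK manual, so substitute whatever the manual specifies for your runtime.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value for illustration only; check the SDK manual. */
#define HOST_ALIGNMENT ((size_t)4096)

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: allocate zeroed host memory whose address and size
 * are both multiples of HOST_ALIGNMENT. Returns NULL on failure or when
 * bytes is 0. The padded size is stored in *padded_bytes if it is non-NULL.
 * Release the memory with free(). */
static void *alloc_host_buffer(size_t bytes, size_t *padded_bytes) {
    if (bytes == 0) {
        return NULL;
    }
    size_t padded = round_up(bytes, HOST_ALIGNMENT);
    void *p = aligned_alloc(HOST_ALIGNMENT, padded); /* C11 */
    if (p != NULL) {
        memset(p, 0, padded);
        if (padded_bytes != NULL) {
            *padded_bytes = padded;
        }
    }
    return p;
}
```

The pointer returned here is what would be passed as host_ptr to clCreateBuffer together with CL_MEM_USE_HOST_PTR. That call is left out so the sketch depends only on the C standard library.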

Hi, gbilotta:

Thank you for the reply. However, this is not correct according to Table 4.2 in http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf

With USE_HOST_PTR, zero copy is supported only when the device is the CPU. I assume the flag should be ALLOC_HOST_PTR.

Yes, you should use CL_MEM_USE_PERSISTENT_MEM_AMD for read-only buffers and CL_MEM_ALLOC_HOST_PTR for write-only buffers.

nou
Exemplar

clEnqueueMapBuffer() is needed for the cases where zero copy is not possible. When the buffer can be mapped without a copy, the call does not perform any transfer. The memcpy is only there to demonstrate that you can access the memory.

himanshu_gautam
Grandmaster

Check out section 4.5 of the OpenCL Programming Guide to properly understand zero copy.

Hey, himanshu:

I have already read section 4.5 and am still confused, which is why I posted this thread here.

gbilotta gave me a hint on how to do zero copy properly: use USE_HOST_PTR when creating the memory buffer. However, according to Table 4.2 in http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf

zero copy with the USE_HOST_PTR flag is supported only when the device is the CPU, while my device is the GPU part of the APU.

I even believe gbilotta's answer is correct and the table is wrong.


Can you prove the table is wrong? Please share some code for that.

Also, beware that zero copy may not always be the answer to your problem; in some cases it can reduce performance.

"If my understanding of zero copy above is correct, the question is: why is it called "zero copy" if it needs a copy anyway from CPU to GPU or vice versa? Zero copy, IMHO, should involve no copy at all, since on an APU the CPU and GPU are on the same die with no PCIe connection between them.

Any explanation is appreciated!"

Check http://amddevcentral.com/afds/assets/presentations/1004_final.pdf

for a better understanding of zero copy in the context of APUs.
