Hi, Guys:
I have an A6-4455M APU and am trying to use "zero copy" to optimize memory transfers. However, after reading the programming guide and the code of the BufferBandwidth sample, I found that you actually need a copy anyway to transfer data from the CPU to the GPU, using the following sequence:
If my understanding of zero copy above is correct, the question is: why is it called "zero copy" if it needs a copy anyway from CPU to GPU or vice versa? Zero copy, IMHO, should involve no copy at all, since on an APU the CPU and GPU are on the same die with no PCIe connection between them.
Any explanation is appreciated!
What you are doing is not zero-copy. You should create the buffer with the USE_HOST_PTR flag and pass it ptr_with_initialized_data at buffer creation time. For this to work, the host data must also have been allocated with the correct alignment (see the SDK manual for details).
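For the alignment part, a minimal sketch of an aligned host allocation might look like the code below. It only covers the host side. The 4096-byte alignment and the helper names alloc_aligned_host and is_aligned are assumptions made for illustration; the value the runtime actually requires is the one given in the SDK manual. aligned_alloc is the C11 standard function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: one 4 KiB page. Replace with the alignment the SDK manual
 * specifies for your runtime and device. */
#define HOST_ALIGNMENT ((size_t)4096)

/* Hypothetical helper: returns zero-initialized memory aligned to
 * HOST_ALIGNMENT. The size is rounded up to a multiple of the alignment,
 * as C11 aligned_alloc requires. The rounded size is stored in
 * *padded_bytes when that pointer is not NULL.
 * Returns NULL on failure; release the memory with free(). */
static void *alloc_aligned_host(size_t bytes, size_t *padded_bytes)
{
    size_t padded;
    void *ptr;

    if (bytes == 0 || bytes > SIZE_MAX - (HOST_ALIGNMENT - 1))
        return NULL;

    padded = (bytes + HOST_ALIGNMENT - 1) / HOST_ALIGNMENT * HOST_ALIGNMENT;
    ptr = aligned_alloc(HOST_ALIGNMENT, padded);
    if (ptr == NULL)
        return NULL;

    memset(ptr, 0, padded);
    if (padded_bytes != NULL)
        *padded_bytes = padded;
    return ptr;
}

/* Returns 1 if ptr is a multiple of alignment, 0 otherwise. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* The pointer would then be handed to the runtime at buffer creation,
 * for example (ctx and err come from the usual OpenCL setup, not shown):
 *
 *   cl_mem buf = clCreateBuffer(ctx,
 *                               CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
 *                               padded, host_ptr, &err);
 */
```

Whether the runtime then treats the buffer as zero copy still depends on the device and driver, so the allocation above is a precondition rather than a guarantee.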
hi, gbilotta:
Thank you for the reply. However, this is not correct according to Table 4.2 in http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf
With USE_HOST_PTR, zero copy is only supported when the device is the CPU. I assume the flag should be ALLOC_HOST_PTR instead.
Yes, you should use CL_MEM_USE_PERSISTENT_MEM_AMD for read-only buffers and ALLOC_HOST_PTR for write-only buffers.
clEnqueueMapBuffer() is needed for the cases where zero copy is not possible. When the buffer can be mapped without a copy, it doesn't do anything. As for the memcpy, it is only there to demonstrate that you can access the memory.
Check out section 4.5 of the OpenCL Programming Guide to properly understand zero copy.
hey, himanshu:
I have already read section 4.5 and am still confused, which is why I posted this thread.
gbilotta gave me a hint on how to do zero copy properly: use USE_HOST_PTR when creating the memory buffer. However, according to Table 4.2 in http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf
zero copy with the USE_HOST_PTR flag is only supported when the device is the CPU, while my device is the GPU part of the APU.
I even believe gbilotta's answer is correct, while the table is wrong.
Can you prove the table is wrong? Please share some code that shows it.
Also, beware that zero copy may not always be the answer to your problem. In some cases it can even reduce your performance.
"If my understanding of zero copy above is correct, the question is: why is it called "zero copy" if it needs a copy anyway from CPU to GPU or vice versa? Zero copy, IMHO, should involve no copy at all, since on an APU the CPU and GPU are on the same die with no PCIe connection between them.
Any explanation is appreciated!"
Check http://amddevcentral.com/afds/assets/presentations/1004_final.pdf
for a better understanding of zero copy in the context of APUs.