I am porting part of my application (an optimized HEVC decoder) from the CPU to the GPU and found that memory copies are a bottleneck on every GPU, whether from AMD, NVIDIA, or Intel.
So I would like to apply the zero-copy optimization on Kabini.
My first question: have the buses (FCL, or "Onion", and the Radeon memory bus, or "Garlic") in the Unified North Bridge (UNB) changed from Llano to Kabini? I know that Kaveri now supports OpenCL 2.0, but currently I only have a Kabini on hand.
My second question: is it possible to achieve zero copy using the CL_MEM_USE_HOST_PTR flag? My application has legacy memory buffers allocated with malloc(), so I want to use this flag to create a GPU buffer object on top of them. However, according to the programming guide (rev 2.7, November 2013, 126.96.36.199 Pre-pinned Buffers), this is the fast path only "As long as they (buffer of type CL_MEM_USE_HOST_PTR) are used only for data transfer, but not as kernel arguments. If the buffer is used in a kernel, the runtime creates a cached copy on the device, and subsequent copies are not on the fast path". So there will still be a copy.
In my opinion this makes the OpenCL implementation pointless: if the GPU never touches the buffer, why create it for GPU kernel execution in the first place?
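For context on the second question, here is a minimal sketch of a host-side allocation that would at least be a candidate for zero copy with CL_MEM_USE_HOST_PTR. The 4096-byte base alignment and the 64-byte size multiple are assumptions borrowed from Intel's zero-copy guidance and are not confirmed for Kabini. The helper names zc_round_up, zc_alloc and zc_is_eligible are hypothetical. The OpenCL call itself is only indicated in a comment so that the block builds without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed requirements (from Intel's zero-copy guidance, NOT confirmed for
   AMD/Kabini): base address aligned to 4096 bytes, size a multiple of 64. */
#define ZC_ALIGN         ((size_t)4096)
#define ZC_SIZE_MULTIPLE ((size_t)64)

/* Round n up to the next multiple of m; m must be a power of two. */
static size_t zc_round_up(size_t n, size_t m)
{
    assert(m != 0 && (m & (m - 1)) == 0);
    return (n + m - 1) & ~(m - 1);
}

/* Hypothetical helper: allocate a page-aligned buffer whose size is padded
   to a whole number of pages (which is also a multiple of 64 bytes).
   The padded size is returned through *padded_bytes. Release with free(). */
static void *zc_alloc(size_t bytes, size_t *padded_bytes)
{
    size_t padded = zc_round_up(bytes ? bytes : 1, ZC_ALIGN);
    if (padded < bytes) {
        return NULL; /* size overflowed while rounding up */
    }
    /* C11 aligned_alloc: the size must be a multiple of the alignment. */
    void *p = aligned_alloc(ZC_ALIGN, padded);
    if (p != NULL && padded_bytes != NULL) {
        *padded_bytes = padded;
    }
    return p;
}

/* Check whether an existing (for example legacy malloc()) buffer meets the
   assumed alignment and size rules. Returns 1 if it does, 0 otherwise. */
static int zc_is_eligible(const void *ptr, size_t bytes)
{
    return ptr != NULL
        && ((uintptr_t)ptr % ZC_ALIGN) == 0
        && bytes != 0
        && (bytes % ZC_SIZE_MULTIPLE) == 0;
}

/* Next step, not compiled here: pass the pointer and padded size to
   clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR, padded, ptr, &err). */
```

A legacy buffer can be tested with zc_is_eligible first. Plain malloc() gives no page-alignment guarantee, so a buffer that fails the check would need its allocation site changed to something like zc_alloc. Whether the runtime then really avoids the copy when the buffer is used as a kernel argument is exactly what the question above is asking.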