I'm wondering what happens on AMD Fusion APUs (Llano in particular) when an OpenCL context contains both the GPU and the CPU as devices and a buffer is created in that context. Specifically, consider the following scenario:
- Kernel A executes on the CPU and writes to the buffer
- Kernel B executes on the GPU and reads from the buffer
- Kernel A is a dependency of Kernel B
With a discrete card, the CPU and GPU would each hold the buffer in their own memory, and a copy would take place from the CPU's memory to the GPU's memory after Kernel A completes and before Kernel B begins. Is this also true on a Fusion APU if the buffer is created with the CL_MEM_ALLOC_HOST_PTR flag on Windows 7, or will the buffer be shared with zero copy in this case?
The AMD APP OpenCL Programming Guide describes zero copy in terms of mapping the buffer to the host and unmapping it again. Do I therefore need to use the CPU device's command queue to map the buffer to the host after Kernel A completes, and then use the GPU device's command queue to unmap it before launching Kernel B? Or will zero copy (and the appropriate flushing of the buffer from the CPU's cache) take place without this step?