Originally posted by: Raistmer
USWC - Host memory from the Uncached Speculative Write Combine heap can be accessed by the GPU without causing CPU cache coherency traffic. Due to the uncached WC access path, CPU streamed writes are fast, while CPU reads are very slow. On Fusion devices, this memory provides the fastest possible route for CPU writes followed by GPU reads.
And what will be fastest for a discrete GPU, provided the CPU can perform streamed writes to the data buffer?
Best paths for discrete GPUs:
1. Read-only input buffers of the kernel should be created with CL_MEM_USE_PERSISTENT_MEM_AMD.
2. Write-only output buffers of the kernel should be created with CL_MEM_ALLOC_HOST_PTR.
Note that kernel execution time is increased if the output buffer is CL_MEM_ALLOC_HOST_PTR.
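The two recommendations above can be sketched in C as a small flag-selection helper. This is only an illustration: the names `buffer_role` and `discrete_gpu_flags` are made up for this post, and the fallback definitions are an assumption added so the snippet compiles on a machine without the OpenCL SDK (they are meant to mirror the values in the Khronos/AMD headers). CL_MEM_USE_PERSISTENT_MEM_AMD is an AMD-specific extension flag from cl_ext.h, so other vendors' runtimes may reject it.

```c
#include <stdint.h>

/* Use the real OpenCL headers when they are installed. */
#if defined(__has_include)
#  if __has_include(<CL/cl_ext.h>)
#    include <CL/cl.h>
#    include <CL/cl_ext.h>
#    define HAVE_OPENCL_HEADERS 1
#  endif
#endif

/* Assumption: fallback definitions for building without the SDK. */
#ifndef HAVE_OPENCL_HEADERS
typedef uint64_t cl_mem_flags;
#define CL_MEM_WRITE_ONLY     (1 << 1)
#define CL_MEM_READ_ONLY      (1 << 2)
#define CL_MEM_ALLOC_HOST_PTR (1 << 4)
#endif

/* AMD extension flag; older or non-AMD cl_ext.h may not define it. */
#ifndef CL_MEM_USE_PERSISTENT_MEM_AMD
#define CL_MEM_USE_PERSISTENT_MEM_AMD (1 << 6)
#endif

/* Hypothetical helper type: how the kernel uses the buffer. */
typedef enum { KERNEL_INPUT, KERNEL_OUTPUT } buffer_role;

/* Pick creation flags following the discrete-GPU advice above. */
cl_mem_flags discrete_gpu_flags(buffer_role role)
{
    if (role == KERNEL_INPUT) {
        /* Read-only kernel input: CPU streams writes, GPU reads. */
        return CL_MEM_READ_ONLY | CL_MEM_USE_PERSISTENT_MEM_AMD;
    }
    /* Write-only kernel output: allocated in host memory. */
    return CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR;
}
```

The result would then be passed as the flags argument of clCreateBuffer, for example `clCreateBuffer(context, discrete_gpu_flags(KERNEL_INPUT), size, NULL, &err)`, where `context`, `size` and `err` come from your own setup code. Given the note about slower kernel execution with a CL_MEM_ALLOC_HOST_PTR output, it is worth timing both variants on your own hardware.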