Raistmer
Adept II

USWC and discrete GPU

Why are only APUs mentioned?

USWC - Host memory from the Uncached Speculative Write Combine heap can be accessed by the GPU without causing CPU cache coherency traffic. Due to the uncached WC access path, CPU streamed writes are fast, while CPU reads are very slow. On Fusion devices, this memory provides the fastest possible route for CPU writes followed by GPU reads.


And what would be the fastest path for a discrete GPU, provided the CPU can perform streamed writes to the data buffer?

1 Reply
genaganna
Journeyman III

Originally posted by: Raistmer
USWC - Host memory from the Uncached Speculative Write Combine heap can be accessed by the GPU without causing CPU cache coherency traffic. Due to the uncached WC access path, CPU streamed writes are fast, while CPU reads are very slow. On Fusion devices, this memory provides the fastest possible route for CPU writes followed by GPU reads.
And what would be the fastest path for a discrete GPU, provided the CPU can perform streamed writes to the data buffer?


Best paths for discrete GPUs

1. Read-only input buffers of the kernel should be created with CL_MEM_USE_PERSISTENT_MEM_AMD.

2. Write-only output buffers of the kernel should be created with CL_MEM_ALLOC_HOST_PTR.

Note that kernel execution time increases if the output buffer is created with CL_MEM_ALLOC_HOST_PTR.
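
To make the two recommendations concrete, here is a minimal sketch of how the flags might be chosen per buffer. The `buffer_role` enum and the `discrete_gpu_buffer_flags` helper are hypothetical names made up for illustration. The fallback `#define`s are an assumption that mirrors the values in `CL/cl.h` and AMD's `CL/cl_ext.h`; they are there only so the snippet compiles without an OpenCL SDK, and the real headers take precedence if included first.

```c
/* Fallback definitions (assumed to match CL/cl.h and AMD's CL/cl_ext.h).
   If those headers are included before this point, their values are used. */
#ifndef CL_MEM_WRITE_ONLY
#define CL_MEM_WRITE_ONLY (1 << 1)
#endif
#ifndef CL_MEM_READ_ONLY
#define CL_MEM_READ_ONLY (1 << 2)
#endif
#ifndef CL_MEM_ALLOC_HOST_PTR
#define CL_MEM_ALLOC_HOST_PTR (1 << 4)
#endif
#ifndef CL_MEM_USE_PERSISTENT_MEM_AMD
#define CL_MEM_USE_PERSISTENT_MEM_AMD (1 << 6)
#endif

/* Hypothetical enum: how the kernel uses the buffer. */
typedef enum { KERNEL_INPUT, KERNEL_OUTPUT } buffer_role;

/* Hypothetical helper: returns the buffer creation flags suggested in this
   thread for a discrete GPU. The result is meant to be cast to cl_mem_flags. */
static unsigned long long discrete_gpu_buffer_flags(buffer_role role)
{
    if (role == KERNEL_INPUT) {
        /* Kernel only reads: device-resident, host-visible memory that the
           CPU fills with streamed writes through a mapped pointer. */
        return CL_MEM_READ_ONLY | CL_MEM_USE_PERSISTENT_MEM_AMD;
    }
    /* Kernel only writes: host memory allocated by the runtime. The kernel
       itself may run slower with this choice, as noted above. */
    return CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR;
}

/* Usage with a real context (ctx, nbytes and err are placeholders):
 *
 *   cl_mem in = clCreateBuffer(ctx,
 *       (cl_mem_flags)discrete_gpu_buffer_flags(KERNEL_INPUT),
 *       nbytes, NULL, &err);
 *
 * The host would then map `in` with clEnqueueMapBuffer(..., CL_MAP_WRITE, ...),
 * write the data sequentially, and unmap before launching the kernel.
 */
```

Whether this combination is actually the fastest depends on the driver and hardware, so it is worth timing both the transfer and the kernel before settling on it.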
