I'm currently experimenting with some ideas on an APU (Carrizo) machine, using OpenCL 2.0 and a fine-grained SVM buffer.
Regarding synchronization for fine-grained SVM, I read that values are synchronized at synchronization points such as clFinish() and clWaitForEvents(), or through atomic operations. However, since the CPU and GPU in an APU share the same memory space (zero-copy), I wanted to see whether an update to an SVM variable on the CPU side would be visible on the GPU side without any synchronization point, and vice versa. So I allocated a variable as fine-grained SVM (without the atomics flag) and passed it as a parameter to a GPU kernel. The kernel runs persistently inside a while loop and checks whether the value changes (0 -> 1). Once it sees the change, the kernel tries to change the value again (1 -> 2). However, in my test the change was never visible. It was only seen if I used an atomic operation on an atomic variable, or if the kernel finished instead of running persistently.
This result is a little hard for me to understand, because the CPU and GPU are supposed to share memory in an APU. Can anyone explain why a synchronization point is needed for SVM on an APU machine? If the GPU sees a different value for the same variable that the CPU updated, where is the GPU getting that value from? Also, is there a way to make a value update visible without either letting the GPU kernel complete (I want the kernel to keep running within a while loop) or using atomic variables?
Sorry for this delayed reply.
Please refer to Section 5.6.1 of the OCL 2.0 language specification (SVM sharing granularity):
Fine-grained sharing: Shared virtual memory where memory consistency is maintained at a granularity smaller than a buffer. How fine-grained SVM is used depends on whether the device supports SVM atomic operations.
o If SVM atomic operations are supported, they provide memory consistency for loads and stores by the host and kernels executing on devices supporting SVM. This means that the host and devices can concurrently read and update the same memory. The consistency provided by SVM atomics is in addition to the consistency provided at synchronization points. There is no need for explicit calls to clEnqueueSVMMap and clEnqueueSVMUnmap or clEnqueueMapBuffer and clEnqueueUnmapMemObject on a cl_mem buffer object created using the SVM pointer.
And Section 6.13.11 of the OpenCL C language specification (Atomic Functions):
In particular, when a host thread needs fine control over the consistency of memory that is shared with one or more OpenCL devices, it must use atomic and fence operations that are compatible with the C11 atomic operations.
In summary, it is equivalent to the C11 model, so you have to use acquire/release operations to make memory operations visible. On Carrizo, SVM atomics are supported, so the spec requires you to use them for fine-grained consistency. The behavior you reported is therefore exactly what is expected.
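To illustrate the acquire/release pattern, here is a minimal sketch of the same 0 -> 1 -> 2 handshake written with plain C11 atomics and two CPU threads. It is only an analogy, not OpenCL host or kernel code: the names flag, device_side and run_handshake are made up for this example, and the second thread merely stands in for the persistent kernel. In an actual OpenCL 2.0 program the flag would presumably live in a buffer allocated with clSVMAlloc using CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, and the kernel would poll it with atomic_load_explicit using memory_scope_all_svm_devices.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a flag placed in fine-grained SVM with atomics. */
static atomic_int flag;

/* Stand-in for the persistent kernel: poll until 0 -> 1 is observed,
 * then publish 1 -> 2 with release semantics. */
static void *device_side(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_acquire) != 1) {
        /* keep polling; the acquire load is what makes the update visible */
    }
    atomic_store_explicit(&flag, 2, memory_order_release);
    return NULL;
}

/* Stand-in for the host: start the "kernel", publish 1, wait for 2.
 * Returns the final value of the flag, or -1 if the thread could not start. */
int run_handshake(void)
{
    pthread_t worker;

    atomic_init(&flag, 0);
    if (pthread_create(&worker, NULL, device_side, NULL) != 0)
        return -1;

    atomic_store_explicit(&flag, 1, memory_order_release);
    while (atomic_load_explicit(&flag, memory_order_acquire) != 2) {
        /* wait for the other side to respond */
    }

    pthread_join(worker, NULL);
    return atomic_load_explicit(&flag, memory_order_relaxed);
}
```

Calling run_handshake() from main (built with pthread support) completes the handshake. If flag were a plain int read in the loop, the compiler and hardware would be free to keep using a stale value, which matches the behavior seen with the non-atomic SVM variable.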