Hi!
I'm trying to implement a multi-GPU app.
I need to share data between two GPUs.
Because the buffers are accessed randomly, I can't just split the calculation in half.
I would like to ask what the best way would be to synchronize modified data after kernel execution. I can't just copy the buffers together, because there are random modifications on both GPUs (but they never write to the same area).
First I thought that SVM was the answer, but it seems that it is only shared between one device and the host at a time. If I create an SVM buffer and modify it on the kernel side, the changes won't be combined on the host side. There is no option to just map the buffer; I can only map it through queue1 or queue2, so I map either GPU1's SVM buffer or GPU2's. I have to map the buffer to queue1 (GPU1) and queue2 (GPU2) every time I want to send data to the GPUs. Am I doing something wrong, or is it not actually as shared as I thought?
Thanks!
I think a fine-grained SVM buffer (with atomic support) can be used for this purpose. For more details about fine-grained SVM, please refer to the OpenCL spec.
FYI:
http://developer.amd.com/community/blog/2015/01/15/opencl-2-0-fine-grain-shared-virtual-memory/
http://developer.amd.com/community/blog/2015/09/08/fine-grain-svm-with-examples/
A fine-grained buffer without atomics seems to do the job. I created a fine-grained SVM buffer and can compute randomly in it across multiple dGPUs.
Okay. Actually, the atomic functions are useful if synchronization is needed among multiple accesses to the same data from different threads.
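For readers whose devices do not support fine-grained SVM, the "both GPUs modify random, non-overlapping locations" case from the question can also be handled on the host. This is not something proposed in the thread, just a minimal sketch of one possible fallback: keep a snapshot of the buffer from before the kernels ran, read each GPU's copy back (for example with clEnqueueReadBuffer), and take every element from whichever copy differs from the snapshot. The function name merge_disjoint_writes and the uint32_t element type are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: three-way merge of two read-back device buffers.
 *   base  - buffer contents before the kernels ran (host snapshot)
 *   gpu0  - host copy of the buffer as modified by the first GPU
 *   gpu1  - host copy of the buffer as modified by the second GPU
 *   out   - receives the merged result (n elements)
 * Returns the number of elements that both GPUs changed to different
 * values. With truly disjoint writes this is 0; on a conflict the
 * value from gpu0 is kept. */
size_t merge_disjoint_writes(const uint32_t *base,
                             const uint32_t *gpu0,
                             const uint32_t *gpu1,
                             uint32_t *out,
                             size_t n)
{
    assert(base && gpu0 && gpu1 && out);

    size_t conflicts = 0;
    for (size_t i = 0; i < n; ++i) {
        int changed0 = (gpu0[i] != base[i]);
        int changed1 = (gpu1[i] != base[i]);

        if (changed0 && changed1 && gpu0[i] != gpu1[i])
            ++conflicts;

        /* If gpu0 did not touch the element, gpu1's value is either
         * its modification or still equal to the snapshot. */
        out[i] = changed0 ? gpu0[i] : gpu1[i];
    }
    return conflicts;
}
```

A write that happens to store the same value as the snapshot looks like "no change" to this merge, which is harmless because the merged value is identical either way. The cost is one extra host copy of the buffer plus a linear pass, so the fine-grained SVM buffer described above is likely the simpler choice where it is available.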