I can only answer the second part.
In OpenCL 1.2 there are no shared common buffers. You need to explicitly call clEnqueueCopyBuffer to move data between these 2 devices. The transfer goes across the PCI bus, but not through the host. Note that clEnqueueCopyBuffer only works with devices in the same context.
Thank you for the answer.
Unfortunately, that means it cannot be done in OpenCL 1.2, because it's not possible to share random data between devices quickly.
AMD's SVM article says the following:
"OpenCL 2.0 removes this limitation: the host and OpenCL devices can share the same virtual address range, so you no longer need to copy buffers between devices. In other words, no keeping track of buffers and explicitly copying them across devices! Just use shared pointers."
But I fear they only mean one CPU and one GPU, in which case it's a little misleading. Unfortunately, I can't test it myself.
Actually, you can get most of your answers from the OpenCL spec alone. Each vendor may implement it differently, but all of them follow the spec, though there may be some limitations at certain points.
As per clSVMAlloc :
"Allocates a shared virtual memory (SVM) buffer that can be shared by the host and all devices in an OpenCL context that support shared virtual memory. "
It fails to allocate memory when the size parameter is 0 or greater than the CL_DEVICE_MAX_MEM_ALLOC_SIZE value for any device in the context.
Before using SVM to share data among multiple devices, please read section "5.6.1 SVM sharing granularity: coarse- and fine- grained sharing" of the OpenCL 2.0 spec. It is very important to understand the granularity (the level of memory consistency) and the synchronization points (when updates become visible) of SVM objects. Hopefully that section will give you a clear idea.
If you have any further doubts or questions, please share them with us. We'll try our best to answer them.
Note: You may also refer to the SVM-related FAQs here: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_APP_SDK_FAQ2.pdf
> Unfortunately, that means it cannot be done in OpenCL 1.2, because it's not possible to share random data between devices quickly.
That's not necessarily true. clEnqueueCopyBuffer is fast, and the same transfer over PCI happens in both cases. So in the SVM case you get slower writes to the shared memory, and you also have to worry about synchronization. Overall time should be roughly the same.
However, if you want a common buffer, SVM is the way to go.