
zoli0726
Journeyman III

SVM between multiple GPUs

Hello!

  We are considering buying more AMD Radeon cards, and I have a question about shared memory. I couldn't find any article about combining OpenCL 2.0 shared virtual memory (SVM) with multiple GPUs.

  What happens if I create an SVM buffer and want to use it across multiple GPUs?

  Will this buffer be accessible from multiple GPUs simultaneously? What determines the maximum size of the buffer: host memory, the memory of one of the devices, or their combined memory?

  I have read some articles saying that in OpenCL 1.2, if I use a context containing multiple GPUs, they can access each other's data in kernels, so if the kernel running on the first device modifies the buffer, the second device sees the modification instantly. What matters to us is the following: we have a common buffer, the first device writes data to random locations in it, and the second GPU does the same with the same kernel. After both are done, but before the next kernel call, both devices should see the same up-to-date data in the buffer. It doesn't matter whether the data really resides in each GPU's memory or whether they access each other's memory over PCI. What determines the maximum global memory size in this case? Will it work on Radeon cards, or is it a FirePro-only feature?

Thanks!


Hi

Actually, you can get most of your answers from the OpenCL spec alone. Each vendor may implement it differently, but all of them follow the spec, though there may be some limitations at certain points.

As per the clSVMAlloc documentation:

"Allocates a shared virtual memory (SVM) buffer that can be shared by the host and all devices in an OpenCL context that support shared virtual memory."

It fails to allocate memory when the size parameter is 0 or greater than the CL_DEVICE_MAX_MEM_ALLOC_SIZE value for any device in the context.
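To make the size rule above concrete, here is a minimal sketch in plain C. It assumes you have already read each device's CL_DEVICE_MAX_MEM_ALLOC_SIZE with clGetDeviceInfo and stored the values in an array; the helper names svm_alloc_limit and svm_size_is_valid are hypothetical and not part of OpenCL. It only models the size check quoted from the spec, so an allocation that passes it can still fail for other reasons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenCL function): given the
 * CL_DEVICE_MAX_MEM_ALLOC_SIZE value of every device in the context,
 * return the largest size that exceeds none of them, i.e. the minimum. */
uint64_t svm_alloc_limit(const uint64_t *max_alloc_sizes, size_t device_count)
{
    assert(max_alloc_sizes != NULL && device_count > 0);

    uint64_t limit = max_alloc_sizes[0];
    for (size_t i = 1; i < device_count; ++i) {
        if (max_alloc_sizes[i] < limit)
            limit = max_alloc_sizes[i];
    }
    return limit;
}

/* Returns 1 if `size` passes the size rule (non-zero and not larger than
 * the limit of any device in the context), 0 otherwise. */
int svm_size_is_valid(uint64_t size,
                      const uint64_t *max_alloc_sizes,
                      size_t device_count)
{
    return size != 0 && size <= svm_alloc_limit(max_alloc_sizes, device_count);
}
```

Under this rule, the upper bound for a single SVM buffer is set by the device with the smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE in the context, not by the combined memory of all devices.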

Before using SVM to share data among multiple devices, please read section "5.6.1 SVM sharing granularity: coarse- and fine- grained sharing" of the OpenCL 2.0 spec. It is very important to understand the granularity (the level of memory consistency) and the synchronization points (when updates become visible) of SVM objects. I hope that section gives you a clear idea.

If you have any further doubts or questions, please share them with us. We'll try our best to answer them.
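As a rough illustration of why the granularity matters with several GPUs, the sketch below works out which SVM sharing level every device in a context has in common. It assumes you have queried CL_DEVICE_SVM_CAPABILITIES for each device with clGetDeviceInfo and collected the bitfields in an array. The SKETCH_* constants are local copies of the corresponding CL_DEVICE_SVM_* bit values from CL/cl.h, and the function and enum names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the cl_device_svm_capabilities bits from CL/cl.h,
 * renamed so this sketch compiles without the OpenCL headers. */
#define SKETCH_SVM_COARSE_GRAIN_BUFFER (1u << 0)
#define SKETCH_SVM_FINE_GRAIN_BUFFER   (1u << 1)
#define SKETCH_SVM_FINE_GRAIN_SYSTEM   (1u << 2)
#define SKETCH_SVM_ATOMICS             (1u << 3)

/* Hypothetical classification of buffer sharing levels, weakest first. */
enum svm_level {
    SVM_LEVEL_NONE,
    SVM_LEVEL_COARSE,
    SVM_LEVEL_FINE,
    SVM_LEVEL_FINE_ATOMICS
};

/* Bitwise AND of all capability bitfields: the bits that every device
 * in the array reports. Returns 0 for an empty array. */
uint64_t common_svm_caps(const uint64_t *caps, size_t device_count)
{
    if (device_count == 0)
        return 0;

    uint64_t common = caps[0];
    for (size_t i = 1; i < device_count; ++i)
        common &= caps[i];
    return common;
}

/* Map the common capability bits to the strongest buffer sharing level
 * that all devices support. */
enum svm_level common_svm_level(const uint64_t *caps, size_t device_count)
{
    uint64_t common = common_svm_caps(caps, device_count);

    if ((common & SKETCH_SVM_FINE_GRAIN_BUFFER) && (common & SKETCH_SVM_ATOMICS))
        return SVM_LEVEL_FINE_ATOMICS;
    if (common & SKETCH_SVM_FINE_GRAIN_BUFFER)
        return SVM_LEVEL_FINE;
    if (common & SKETCH_SVM_COARSE_GRAIN_BUFFER)
        return SVM_LEVEL_COARSE;
    return SVM_LEVEL_NONE;
}
```

If the common level turns out to be coarse-grained only, updates made by one device are guaranteed to be visible to the others only at synchronization points, so the "instantly visible" behaviour asked about should not be assumed.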

[Edited]

Note: You may also refer to the SVM-related FAQs here: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_APP_SDK_FAQ2.pdf

Regards,

nibal
Challenger

I can answer only the second part.

In OpenCL 1.2, you don't have shared common buffers. You need to explicitly call clEnqueueCopyBuffer to copy data between the two devices. The transfer goes across the PCI bus, but not through the host. clEnqueueCopyBuffer works only with devices in the same context.

HTH


Thank you for the answer.

Unfortunately, that means it cannot be done with OpenCL 1.2, because it's not possible to share random data between devices quickly.

The AMD SVM article says the following:

"OpenCL 2.0 removes this limitation: the host and OpenCL devices can share the same virtual address range, so you no longer need to copy buffers between devices. In other words, no keeping track of buffers and explicitly copying them across devices! Just use shared pointers."

But I fear they only mean one CPU and one GPU, in which case it's a little misleading. Unfortunately, I can't test it myself.


> Unfortunately, that means it cannot be done with OpenCL 1.2, because it's not possible to share random data between devices quickly.

That's not necessarily true. clEnqueueCopyBuffer is fast, and the same transfer over PCI occurs for both devices. In the SVM case, you get slower writes to the shared memory and you have to worry about synchronization, so the overall time should be roughly the same.

However, if you want a common buffer, SVM is the way to go.
