Journeyman III

DirectGMA to copy buffers from one GPU to another

I already use DirectGMA to upload from a frame grabber (Matrox) to a W8100 GPU.

It is done by:

1. clCreateBuffer with CL_MEM_BUS_ADDRESSABLE_AMD, which returns the cl_mem buffer

2. clEnqueueMakeBuffersResidentAMD on that buffer, which outputs its bus addresses:

    { cl_ulong surface_bus_address;  cl_ulong marker_bus_address;  } cl_bus_address_amd;

Then the frame grabber is given the surface_bus_address and writes its output there directly (somehow...).

Now I want to use the same technique to copy from one W8100 GPU to another. I considered the following:

Option 1:

* Create cl buffer and make it resident (1 and 2 above) on BOTH source and target

* clEnqueueCopyBuffer between source and target.

(Would that copy use a command queue created on the target device or on the source device?)

Option 2:

* Create cl buffer and make it resident (1 and 2 above) ONLY ON THE TARGET

* clEnqueueCopyBuffer from a non-resident source cl buffer to the TARGET.

Option 3:

* Same as option 1 but use memcpy(surface_bus_address_TARGET, surface_bus_address_SRC)

(instead of clEnqueueCopyBuffer)


The copy runs synchronously on a dedicated thread, so I do not need a "marker" or any other synchronization mechanism.

Your help would be appreciated.


1 Reply
Journeyman III

UPDATE (answering my own question):

I managed to implement it in the following way:

1. On a "Remote" GPU, create an "Addressable" buffer:

    a.  clCreateBuffer with CL_MEM_BUS_ADDRESSABLE_AMD

    b. clEnqueueMakeBuffersResidentAMD which outputs a "Physical address" (cl_bus_address_amd)

2. On a "Local" GPU, create an "External" buffer:

    a.  clCreateBuffer with CL_MEM_EXTERNAL_PHYSICAL_AMD and the "Physical address"

    b. clEnqueueMigrateMemObjects on that buffer

Then I use clEnqueueCopyBuffer on the Local GPU, from the external buffer to any local buffer, and it copies the content of the "Addressable" buffer, which actually resides on the other ("Remote") GPU.
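To make the data flow concrete, here is a toy host-only model in plain C. It is not OpenCL code and needs no SDK: the `bus` array stands in for the address space both GPUs can reach, and every helper name (`make_addressable`, `make_external`, `remote_fill`, `copy_from_external`, `demo_roundtrip`) is made up for illustration. The real OpenCL calls are only named in comments. The sketch models nothing beyond the aliasing I observed, and says nothing about how the hardware actually performs the accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUS_SIZE 4096u

/* Toy stand-in for the bus address space both GPUs can reach. */
static uint8_t bus[BUS_SIZE];

/* Same two fields as cl_bus_address_amd quoted earlier in the thread. */
typedef struct {
    uint64_t surface_bus_address;
    uint64_t marker_bus_address;   /* unused in this sketch */
} bus_address;

/* "Addressable" buffer on the Remote GPU: owns a range of the toy bus. */
typedef struct { bus_address addr; size_t size; } addressable_buf;

/* "External" buffer on the Local GPU: no storage, only the address. */
typedef struct { uint64_t surface; size_t size; } external_buf;

/* True if [off, off + n) lies inside the toy bus. */
static int in_bus(uint64_t off, size_t n)
{
    return off <= BUS_SIZE && n <= BUS_SIZE - off;
}

/* Steps 1a + 1b: clCreateBuffer(CL_MEM_BUS_ADDRESSABLE_AMD), then
   clEnqueueMakeBuffersResidentAMD reports where the buffer sits.
   Here the caller simply picks an offset into the toy bus. */
addressable_buf make_addressable(uint64_t offset, size_t size)
{
    addressable_buf b = {
        .addr = { .surface_bus_address = offset, .marker_bus_address = 0 },
        .size = size
    };
    return b;
}

/* Steps 2a + 2b: clCreateBuffer(CL_MEM_EXTERNAL_PHYSICAL_AMD) with the
   reported address, then clEnqueueMigrateMemObjects. */
external_buf make_external(bus_address addr, size_t size)
{
    external_buf e = { .surface = addr.surface_bus_address, .size = size };
    return e;
}

/* Stand-in for the Remote GPU writing into its own buffer. */
void remote_fill(const addressable_buf *b, uint8_t value)
{
    assert(in_bus(b->addr.surface_bus_address, b->size));
    memset(bus + b->addr.surface_bus_address, value, b->size);
}

/* Stand-in for clEnqueueCopyBuffer on the Local GPU: external -> local.
   Returns 0 on success, -1 if the request does not fit. */
int copy_from_external(const external_buf *src, uint8_t *dst, size_t n)
{
    if (n > src->size || !in_bus(src->surface, n))
        return -1;
    memcpy(dst, bus + src->surface, n);
    return 0;
}

/* Remote writes `value` after the alias exists; returns the last byte
   the Local side then reads through the alias, or -1 on error. */
int demo_roundtrip(uint8_t value)
{
    addressable_buf remote = make_addressable(256, 64);
    external_buf ext = make_external(remote.addr, remote.size);
    uint8_t local[64] = {0};

    remote_fill(&remote, value);
    if (copy_from_external(&ext, local, sizeof local) != 0)
        return -1;
    return local[63];
}
```

In this model the external buffer never holds data of its own, so a copy issued after the Remote side writes always sees the new bytes. That matches what I see with clEnqueueCopyBuffer.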

I guess I can even run a kernel on the "Local" GPU with the "External" buffer as a parameter, and that would implicitly read/write in the "Remote" GPU's "Addressable" buffer. Right?

In that case, does every memory access literally go out to the other GPU?

Or is there a DMA block transfer when needed?

Am I missing something?