Writing to a remote device using DirectGMA / CL_MEM_EXTERNAL_PHYSICAL_AMD

Discussion created by elad on Nov 13, 2018
Latest reply on Nov 18, 2018 by elad

Hi everybody, I have the following situation:


I have an AMD Radeon Pro WX 7100 running on Windows 10.

I can successfully use DirectGMA by allocating a buffer on the GPU, making the buffer resident with clEnqueueMakeBuffersResidentAMD, handing the bus_address to a 3rd-party capture device, and having that device DMA directly into GPU memory.

Next, I try to make the GPU write directly to an FPGA. The FPGA maps a memory region to a PCIe BAR, and I can obtain the physical address backing that BAR from the FPGA driver.



// allocation stage
cl_bus_address_amd addr;
addr.surface_bus_address = remote_bus_address;
addr.marker_bus_address  = remote_bus_address;

cl_int create_buff_err = CL_SUCCESS;
cl_mem remote_buffer = clCreateBuffer(context, CL_MEM_EXTERNAL_PHYSICAL_AMD | CL_MEM_WRITE_ONLY, byteSize, &addr, &create_buff_err);
assert(create_buff_err == CL_SUCCESS);


What I see next puzzles me: clCreateBuffer always succeeds. In fact, it succeeds as long as 'remote_bus_address' is aligned to the page size (it can even be a random number), which is expected because no actual allocation takes place. Yet when I try to copy content to the returned cl_mem, I always get CL_MEM_OBJECT_ALLOCATION_FAILURE.
I would expect the OpenCL driver to copy the data "no questions asked" (possibly causing a blue screen along the way), yet I get an allocation failure.
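
For reference, the alignment condition I am describing can be sketched as below. The helper name is_page_aligned and the 4096-byte default page size are my own assumptions for illustration; I have not confirmed which page size or which check the driver actually applies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of OpenCL or any AMD extension).
// Returns true when addr is a multiple of page_size.
// page_size must be a nonzero power of two; 4096 is an assumed default.
inline bool is_page_aligned(std::uint64_t addr, std::uint64_t page_size = 4096) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    // For a power-of-two page size, the low bits must all be zero.
    return (addr & (page_size - 1)) == 0;
}
```

Any value that passes this kind of check seems to be accepted at buffer-creation time, whether or not it refers to a real BAR.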


Can anyone explain this? How can I tell why these functions failed?



// blocking write from host memory
cl_int write_err = clEnqueueWriteBuffer(queue.get(), remote_buffer, CL_TRUE, 0, vec.size() * sizeof(uint32_t), vec.data(), 0, nullptr, nullptr); // returns CL_MEM_OBJECT_ALLOCATION_FAILURE

// device-to-device copy from an existing GPU buffer
cl_int copy_err = clEnqueueCopyBuffer(queue.get(), deviceVec.get_buffer().get(), remote_buffer, 0, 0, vec.size() * sizeof(uint32_t), 0, nullptr, nullptr); // returns CL_MEM_OBJECT_ALLOCATION_FAILURE
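
To make the returned codes easier to read in logs, a small lookup like the following can be used. This is only a sketch: cl_error_name is a hypothetical helper of my own, it covers just a handful of codes, and the numeric values are the ones defined in the standard CL/cl.h header. It names the error but says nothing about why the driver raised it.

```cpp
#include <string>

// Hypothetical helper: maps a few OpenCL status codes to their names.
// Values match the definitions in the Khronos CL/cl.h header.
inline std::string cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -34: return "CL_INVALID_CONTEXT";
        case -36: return "CL_INVALID_COMMAND_QUEUE";
        case -38: return "CL_INVALID_MEM_OBJECT";
        case -59: return "CL_INVALID_OPERATION";
        default:  return "unrecognized OpenCL status code";
    }
}
```

Both calls above give -4 with this lookup, which is the allocation failure I am asking about.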