6 Replies Latest reply on Nov 18, 2018 8:57 AM by elad

    Writing to a remote device using DirectGMA / CL_MEM_EXTERNAL_PHYSICAL_AMD


      Hi everybody, I have the following situation:


      I have an AMD Radeon Pro WX 7100 running on a Windows 10 OS.

      I can successfully utilize DirectGMA technology by allocating a buffer on the GPU, making the buffer resident using clEnqueueMakeBuffersResidentAMD, handing the bus_address to a 3rd party capture device, and have that device DMA directly to GPU memory.

      Next, I try to make the GPU write directly to an FPGA. The FPGA maps a memory region to a PCIe BAR, and I can obtain the physical address backing that BAR from the FPGA driver.



      // allocation stage
      cl_bus_address_amd addr;
      addr.surface_bus_address = remote_bus_address;
      addr.marker_bus_address  = remote_bus_address;

      cl_int create_buff_err = CL_SUCCESS;
      cl_mem remote_buffer = clCreateBuffer(context, CL_MEM_EXTERNAL_PHYSICAL_AMD | CL_MEM_WRITE_ONLY, byteSize, &addr, &create_buff_err);
      assert(create_buff_err == CL_SUCCESS);


      What I see next puzzles me: clCreateBuffer always succeeds. In fact, it succeeds as long as 'remote_bus_address' is aligned to the page size (it can even be a random number), which is expected because no actual allocation is done at that point. Yet when I try to copy content into the returned cl_mem, I always get CL_MEM_OBJECT_ALLOCATION_FAILURE.
      I would expect the OpenCL driver to copy the data "no questions asked" (and maybe cause a blue screen on the way), yet I get an allocation failure.


      Can anyone explain this? How can I tell why these functions failed?



      // blocking write from host memory; returns CL_MEM_OBJECT_ALLOCATION_FAILURE
      cl_int write_err = clEnqueueWriteBuffer(queue.get(), remote_buffer, CL_TRUE, 0, vec.size() * sizeof(uint32_t), vec.data(), 0, nullptr, nullptr);

      // device-to-device copy; also returns CL_MEM_OBJECT_ALLOCATION_FAILURE
      cl_int copy_err = clEnqueueCopyBuffer(queue.get(), deviceVec.get_buffer().get(), remote_buffer, 0, 0, vec.size() * sizeof(uint32_t), 0, nullptr, nullptr);
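On the narrower question of how to see which error came back, a small helper that turns the returned cl_int into its symbolic name can make logs easier to read. The following is only a minimal sketch: the function name cl_error_name is hypothetical, and the numeric values are copied from the Khronos CL/cl.h header so that the snippet compiles without the OpenCL SDK. It decodes the code and does not explain why the driver refuses the transfer.

```cpp
#include <string>

// Hypothetical helper: maps a handful of OpenCL status codes to their names.
// The numeric values mirror the definitions in the Khronos CL/cl.h header;
// in real code, include that header and switch on the CL_* macros instead.
inline const char* cl_error_name(int err) {
    switch (err) {
        case 0:   return "CL_SUCCESS";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -34: return "CL_INVALID_CONTEXT";
        case -36: return "CL_INVALID_COMMAND_QUEUE";
        case -38: return "CL_INVALID_MEM_OBJECT";
        case -61: return "CL_INVALID_BUFFER_SIZE";
        default:  return "unrecognized OpenCL error code";
    }
}

// Convenience wrapper that builds a log line such as
// "clEnqueueCopyBuffer -> CL_MEM_OBJECT_ALLOCATION_FAILURE (-4)".
inline std::string describe_cl_call(const std::string& call, int err) {
    return call + " -> " + cl_error_name(err) + " (" + std::to_string(err) + ")";
}
```

Logging describe_cl_call("clEnqueueCopyBuffer", err) after each call keeps the raw number next to its name. The pfn_notify callback that can be passed to clCreateContext is another standard place where a driver may report additional detail, although whether it says anything useful for this case is not something the thread establishes.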

        • Re: Writing to a remote device using DirectGMA / CL_MEM_EXTERNAL_PHYSICAL_AMD

          Just wanted to share a couple of suggestions, in case they help.

          • Put a clEnqueueMigrateMemObjects call after creating the remote buffer and before using that buffer. For example:

          cl_mem remote_buffer = clCreateBuffer(context, CL_MEM_EXTERNAL_PHYSICAL_AMD, ..);
          clEnqueueMigrateMemObjects(queue, 1, &remote_buffer, 0, 0, NULL, NULL);
          clEnqueueCopyBuffer(queue, local_buffer, remote_buffer, ..);


          • Make sure surface_bus_address and marker_bus_address are set correctly
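One way to sanity-check the two addresses on the host before handing them to clCreateBuffer is sketched below. It is only an illustration under stated assumptions: a 4 KiB page size, a hypothetical BarRange struct filled in from whatever the FPGA driver reports for the BAR, and hypothetical helper names. Passing these checks does not guarantee that the GPU driver will accept the addresses.

```cpp
#include <cstdint>

// Assumption: 4 KiB pages. Adjust if the platform uses a different page size.
constexpr std::uint64_t kPageSize = 4096;

// Hypothetical description of the FPGA BAR as reported by its driver.
struct BarRange {
    std::uint64_t base;  // physical start address of the BAR
    std::uint64_t size;  // BAR length in bytes
};

// True if addr is non-zero and sits on a page boundary.
constexpr bool is_page_aligned(std::uint64_t addr) {
    return addr != 0 && (addr % kPageSize) == 0;
}

// True if [addr, addr + len) lies entirely inside the BAR.
// Written to avoid overflow in addr + len.
constexpr bool fits_in_bar(std::uint64_t addr, std::uint64_t len, BarRange bar) {
    return addr >= bar.base && len <= bar.size && (addr - bar.base) <= bar.size - len;
}

// Combined check for one address/length pair.
constexpr bool bus_address_looks_sane(std::uint64_t addr, std::uint64_t len, BarRange bar) {
    return is_page_aligned(addr) && fits_in_bar(addr, len, bar);
}
```

For example, bus_address_looks_sane(addr.surface_bus_address, byteSize, bar) could be asserted before buffer creation, with a similar call for marker_bus_address using whatever length the marker needs.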
            • Re: Writing to a remote device using DirectGMA / CL_MEM_EXTERNAL_PHYSICAL_AMD

              Thanks for the reply,


              I tried your first suggestion, and it did not work. clEnqueueMigrateMemObjects also returned CL_MEM_OBJECT_ALLOCATION_FAILURE.

              As for your second suggestion, could you elaborate on how exactly I can make sure those addresses are set correctly?
              My understanding is that these addresses are what I can see in the 'Device Manager' resources property. They are also the 'physical addresses' returned from my driver. When my driver loads, it maps these addresses into my virtual address space, and I can memcpy / clEnqueueReadBuffer to that address.




              Besides that, are there any additional steps my driver should perform in order to support this?