Hi everybody, I have the following situation:
I have an AMD Radeon Pro WX 7100 running on Windows 10.
I can successfully utilize DirectGMA technology by allocating a buffer on the GPU, making the buffer resident using clEnqueueMakeBuffersResidentAMD, handing the bus_address to a 3rd party capture device, and have that device DMA directly to GPU memory.
Next, I am trying to make the GPU write directly to an FPGA. The FPGA maps a memory region to a PCIe BAR, and I can obtain the physical address backing that BAR from the FPGA driver. I pass that address to clCreateBuffer as follows:
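// allocation stage
cl_bus_address_amd addr = {}; // struct from CL/cl_ext.h (cl_amd_bus_addressable_memory)
addr.surface_bus_address = remote_bus_address; // physical address backing the FPGA BAR
addr.marker_bus_address = remote_bus_address;  // same address reused for the marker
cl_int create_buff_err = CL_SUCCESS;
// No GPU memory is allocated here; the buffer only wraps the external address.
cl_mem remote_buffer = clCreateBuffer(context, CL_MEM_EXTERNAL_PHYSICAL_AMD | CL_MEM_WRITE_ONLY, byteSize, &addr, &create_buff_err);
assert(create_buff_err == CL_SUCCESS);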
What I see next puzzles me. clCreateBuffer always succeeds. In fact, it succeeds as long as 'remote_bus_address' is aligned to the page size (it can even be a random number), which is expected because no actual allocation takes place. Yet when I try to copy data into the returned cl_mem, I always get CL_MEM_OBJECT_ALLOCATION_FAILURE.
I would expect the OpenCL driver to copy the data "no questions asked" (and maybe cause a blue screen along the way), yet I get an allocation failure.
Can anyone explain this? How can I tell why the following calls fail?
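To make the alignment condition concrete, here is a minimal sketch of the check that could be run on 'remote_bus_address' before calling clCreateBuffer. The helper name is_page_aligned and the 4 KiB default page size are assumptions made for illustration; they are not part of the OpenCL or DirectGMA API, and the real page size should be queried from the OS.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4 KiB pages, the usual size on x86-64 Windows. Query the real
// value from the OS (e.g. GetSystemInfo) rather than relying on this constant.
constexpr std::uint64_t kAssumedPageSize = 4096;

// Hypothetical helper: returns true when bus_address is a multiple of page_size.
// page_size must be a non-zero power of two.
inline bool is_page_aligned(std::uint64_t bus_address,
                            std::uint64_t page_size = kAssumedPageSize) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    // For a power-of-two page size, the low bits must all be zero.
    return (bus_address & (page_size - 1)) == 0;
}
```

Passing this check only explains why clCreateBuffer succeeds; it says nothing about whether the address is actually reachable by the GPU.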
// Attempt 1: blocking write from host memory
cl_int write_err = clEnqueueWriteBuffer(queue.get(), remote_buffer, CL_TRUE, 0, vec.size() * sizeof(uint32_t), vec.data(), 0, nullptr, nullptr); // returns CL_MEM_OBJECT_ALLOCATION_FAILURE
// Attempt 2: device-to-device copy from an ordinary GPU buffer
cl_int copy_err = clEnqueueCopyBuffer(queue.get(), deviceVec.get_buffer().get(), remote_buffer, 0, 0, vec.size() * sizeof(uint32_t), 0, nullptr, nullptr); // returns CL_MEM_OBJECT_ALLOCATION_FAILURE
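For logging the return codes of the calls above, a small sketch that turns a numeric OpenCL status into its symbolic name. The helper names cl_error_name and describe_cl_call are hypothetical. The numeric values are the ones defined in the Khronos CL/cl.h header and are repeated as literals only so the sketch compiles without the OpenCL SDK; only a handful of codes are covered.

```cpp
#include <string>

// Hypothetical helper: maps a few OpenCL status codes to their names.
// Values match CL/cl.h; codes not listed here fall through to the default.
inline const char* cl_error_name(int err) {
    switch (err) {
        case 0:   return "CL_SUCCESS";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -34: return "CL_INVALID_CONTEXT";
        case -36: return "CL_INVALID_COMMAND_QUEUE";
        case -38: return "CL_INVALID_MEM_OBJECT";
        case -59: return "CL_INVALID_OPERATION";
        default:  return "unlisted OpenCL error";
    }
}

// Hypothetical helper: builds a one-line log message for a failed call.
inline std::string describe_cl_call(const std::string& call, int err) {
    return call + " returned " + cl_error_name(err) +
           " (" + std::to_string(err) + ")";
}
```

This only makes the logs readable. It does not reveal why the driver reports an allocation failure for a buffer that wraps an external address.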