
matthiasv
Journeyman III

AMD's bus-addressable memory extension for OpenCL

Hi, I've got questions related to AMD's bus-addressable memory extension for OpenCL. We can do fast bi-directional data transfers with our custom FPGA board, but the signalling is not yet working as it is supposed to.

Our scenario is as follows: the FPGA writes into GPU memory using the surface address that we got from making a GPU buffer resident. As far as I understood, we can also pass the marker address to the FPGA, let it update the value, and the OpenCL run-time should be able to detect that writing has finished. What we found is that clEnqueueWaitSignalAMD does not wait at all. No matter which value we wait for, the associated event finishes after a few hundred microseconds.

Second, do the marker values have to start at 0? The spec only says that they have to increase monotonically.


Hi and welcome! You have been white-listed and this thread moved into the OpenCL forum.

dipak
Big Boss

Hi,

Could you please be more explicit about the problem?

Regards,


The problem is simply that it is not working, though that may just be a misunderstanding on our side.

Our scenario is as follows: FPGA writes into GPU memory by passing the surface address that we got from making a GPU buffer resident. As far as I understood, we can also pass the marker address to the FPGA, let it update the value and the OpenCL run-time should be able to detect that writing finished. Is this assumption correct? Second, do the marker values have to start at 0? The spec only says that they have to increase monotonically.


matthiasv wrote:

The problem is simply that it is not working, though that may just be a misunderstanding on our side.

Our scenario is as follows: FPGA writes into GPU memory by passing the surface address that we got from making a GPU buffer resident. As far as I understood, we can also pass the marker address to the FPGA, let it update the value and the OpenCL run-time should be able to detect that writing finished. Is this assumption correct? Second, do the marker values have to start at 0? The spec only says that they have to increase monotonically.

Hmmm. I am assuming that this is communication between 2 ocl devices. Otherwise you should probably ask FPGA support. There are 2 possible ways to communicate between 2 devices with ocl:

1) ocl 1.x: Create a context with both devices in it. Use clEnqueueCopyBuffer between those devices.

2) ocl 2.0: Use clSVMAlloc and clEnqueueSVMMemcpy to transfer data between those 2 devices.

Most if not all ocl commands run either as blocking I/O or as event-driven asynchronous I/O, so there is no way to miss completion from your code.

You don't use explicit raw addressing anywhere...

HTH

Nikos


nibal wrote:

Hmmm. I am assuming that this is communication between 2 ocl devices. Otherwise you should probably ask FPGA support.

We communicate using (as stated in the topic) bus-addressable memory, following this specification.

nibal wrote:

There are 2 possible ways to communicate between 2 devices with ocl:

1) ocl 1.x: Create a context with both devices in it. Use clEnqueueCopyBuffer between those devices.

2) ocl 2.0: Use clSVMAlloc and clEnqueueSVMMemcpy to transfer data between those 2 devices.

3) create a buffer with the CL_MEM_BUS_ADDRESSABLE_AMD flag and retrieve the physical bus addresses.

nibal wrote:

Most if not all ocl commands run either as blocking I/O or as event-driven asynchronous I/O, so there is no way to miss completion from your code.

Yes, it is possible, because the FPGA initiates the data transfer via DMA. Hence all the talk about marker synchronization.


let it update the value and the OpenCL run-time should be able to detect that writing finished. Is this assumption correct?

Yes, you are right. For example (taken from http://toronto.siggraph.org/wp-content/uploads/2015/05/uoft-ocl.pdf ),

Let,

Bus addressable memory (receiving GPU) - busAddressableBuff_

External physical memory (sending GPU) - extPhysicalBuff_

busAddressableBuff_ = clCreateBuffer(contexts_[0], CL_MEM_BUS_ADDRESSABLE_AMD, bufSize_, 0, &error_);

// Get physical address

error_ |= clEnqueueMakeBuffersResidentAMD(cmd_queues_[0], 1, &busAddressableBuff_, true, &busAddr_, 0, 0, 0);

// Allocate external physical memory (simple mapping of physical memory from GPU0)

extPhysicalBuff_ = clCreateBuffer(contexts_[1], CL_MEM_EXTERNAL_PHYSICAL_AMD, bufSize_, &busAddr_, &error_);

error_ |= clEnqueueCopyBuffer(cmd_queues_[1], srcBuff_, extPhysicalBuff_, 0, 0, bufSize_, 0, NULL, NULL);

error_ |= clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, markerValue_, 0, 0, 0, 0);

error_ |= clFlush(cmd_queues_[1]);

// Wait for copy operation on the first GPU

error_ |= clEnqueueWaitSignalAMD(cmd_queues_[0], busAddressableBuff_, markerValue_, 0, 0, 0);
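To make the intended handshake concrete, here is a minimal host-only sketch in plain C of the "write the data first, then the marker" ordering that the example above relies on. It does not call the AMD extension at all: fake_bus_buffer, producer_write and consumer_wait are hypothetical names invented for this illustration, the buffer lives in ordinary host memory, and the equality comparison on the marker is an assumption (the extension specification is the authority on what the run-time actually compares).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SURFACE_BYTES 64

/* Hypothetical host-memory stand-in for a bus-addressable buffer:
 * a payload area ("surface") plus a marker word. */
typedef struct {
    uint8_t          surface[SURFACE_BYTES];
    _Atomic uint32_t marker;
} fake_bus_buffer;

/* Producer side (the role the FPGA or the sending GPU plays):
 * write the payload first, then publish the marker value. */
static void producer_write(fake_bus_buffer *buf, const void *src, size_t len,
                           uint32_t marker_value)
{
    assert(len <= SURFACE_BYTES);
    memcpy(buf->surface, src, len);
    /* Release ordering: the payload becomes visible before the marker. */
    atomic_store_explicit(&buf->marker, marker_value, memory_order_release);
}

/* Consumer side: poll the marker at most max_polls times.
 * Returns 1 once the marker equals marker_value (assumed semantics),
 * 0 if the value was never observed. */
static int consumer_wait(fake_bus_buffer *buf, uint32_t marker_value,
                         unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; ++i) {
        if (atomic_load_explicit(&buf->marker, memory_order_acquire)
                == marker_value)
            return 1;
    }
    return 0;
}
```

In this model a wait on a value that nobody has written yet never succeeds, which is the behaviour the original question expected from the run-time. If the real wait completes regardless of the value, it is worth checking that the marker address handed to the external device is the one returned alongside the surface address when the buffer was made resident.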

Second, do the marker values have to start at 0?

I haven't used it myself, so I'm not sure either. However, I don't think so. I'll check with the relevant folks.

Regards,
