6 Replies Latest reply on Nov 4, 2015 3:10 PM by dipak

    AMD's bus-addressable memory extension for OpenCL

    matthiasv

      Hi, I've got questions about AMD's bus-addressable memory extension for OpenCL. We can do fast bi-directional data transfers with our custom FPGA board, but the signalling does not yet work as it should.

       

      Our scenario is as follows: we pass the FPGA the surface address we got from making a GPU buffer resident, and the FPGA writes into GPU memory at that address. As far as I understand, we can also pass the marker address to the FPGA, let it update the value, and the OpenCL runtime should then be able to detect that the write has finished. What we found is that clEnqueueWaitSignalAMD does not wait at all: no matter which value we wait for, the associated event completes after a few hundred microseconds.

       

      Second, do the marker values have to start at 0? The spec only says that they have to increase monotonically.

        • Re: Whitelist request
          deedeeyelverton

          Hi and welcome! You have been whitelisted and this thread has been moved to the OpenCL forum.

          • Re: AMD's bus-addressable memory extension for OpenCL; was - Whitelist request
            dipak

            Hi,

            Could you please be more explicit about the problem?

             

            Regards,

              • Re: AMD's bus-addressable memory extension for OpenCL; was - Whitelist request
                matthiasv

                The problem is simply that it is not working, but that might just be due to a misunderstanding on our side.

                 

                Our scenario is as follows: we pass the FPGA the surface address we got from making a GPU buffer resident, and the FPGA writes into GPU memory at that address. As far as I understand, we can also pass the marker address to the FPGA, let it update the value, and the OpenCL runtime should then be able to detect that the write has finished. Is this assumption correct? Second, do the marker values have to start at 0? The spec only says that they have to increase monotonically.

                  • Re: AMD's bus-addressable memory extension for OpenCL; was - Whitelist request
                    nibal

                    matthiasv wrote:

                     

                    The problem is simply that it is not working, but that might just be due to a misunderstanding on our side.

                     

                    Our scenario is as follows: we pass the FPGA the surface address we got from making a GPU buffer resident, and the FPGA writes into GPU memory at that address. As far as I understand, we can also pass the marker address to the FPGA, let it update the value, and the OpenCL runtime should then be able to detect that the write has finished. Is this assumption correct? Second, do the marker values have to start at 0? The spec only says that they have to increase monotonically.

                    Hmmm. I am assuming that this is communication between 2 ocl devices. Otherwise, you should probably ask FPGA support. There are 2 possible ways to communicate between 2 devices with ocl:

                     

                    1) ocl 1.x: Create a context with both devices in it. Use clEnqueueCopyBuffer between those devices.

                    2) ocl 2.0: Use clSVMAlloc and clEnqueueSVMMemcpy to transfer data between those 2 devices.

                     

                    Most, if not all, ocl commands run either as blocking I/O or as event-driven asynchronous I/O, so there is no way to miss completion from your code.

                    You don't use explicit raw addressing anywhere...

                     

                    HTH

                    Nikos

                      • Re: AMD's bus-addressable memory extension for OpenCL; was - Whitelist request
                        matthiasv

                        nibal wrote:

                         

                        Hmmm. I am assuming that this is communication between 2 ocl devices. Otherwise, you should probably ask FPGA support.

                         

                        As stated in the topic, we communicate using bus-addressable memory, following this specification.

                         

                        nibal wrote:

                         

                        There are 2 possible ways to communicate between 2 devices with ocl:

                         

                        1) ocl 1.x: Create a context with both devices in it. Use clEnqueueCopyBuffer between those devices.

                        2) ocl 2.0: Use clSVMAlloc and clEnqueueSVMMemcpy to transfer data between those 2 devices.

                         

                        3) Create a buffer with the CL_MEM_BUS_ADDRESSABLE_AMD flag and retrieve its physical bus addresses.
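Option 3 means two addresses have to reach the FPGA: clEnqueueMakeBuffersResidentAMD fills in a cl_bus_address_amd whose surface_bus_address is where the payload goes and whose marker_bus_address is where the marker value gets written. Below is a minimal sketch of handing both to the board, assuming a hypothetical FPGA register block. bus_addr_t is a local stand-in for cl_bus_address_amd so the example builds without cl_ext.h, and the register layout and fpga_program_dma() are invented for illustration; a real board defines its own interface.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for cl_bus_address_amd (same two fields). */
typedef struct {
    uint64_t surface_bus_address; /* where the FPGA DMAs the payload  */
    uint64_t marker_bus_address;  /* where the FPGA writes the marker */
} bus_addr_t;

/* Hypothetical FPGA register map; a real board defines its own. */
enum {
    REG_DST_LO, REG_DST_HI,       /* payload destination (surface)      */
    REG_MARKER_LO, REG_MARKER_HI, /* marker destination                 */
    REG_LENGTH,                   /* payload size in bytes              */
    REG_MARKER_VALUE,             /* value to write once the DMA is done */
    REG_COUNT
};

/* Hypothetical helper: split the 64-bit bus addresses into 32-bit register
 * writes. On real hardware regs would be a mapped (volatile) BAR region. */
static void fpga_program_dma(uint32_t regs[REG_COUNT], bus_addr_t addr,
                             uint32_t length, uint32_t marker_value)
{
    /* Sanity check: both addresses should have been filled in by the resident call. */
    assert(addr.surface_bus_address != 0 && addr.marker_bus_address != 0);

    regs[REG_DST_LO]       = (uint32_t)(addr.surface_bus_address & 0xFFFFFFFFu);
    regs[REG_DST_HI]       = (uint32_t)(addr.surface_bus_address >> 32);
    regs[REG_MARKER_LO]    = (uint32_t)(addr.marker_bus_address & 0xFFFFFFFFu);
    regs[REG_MARKER_HI]    = (uint32_t)(addr.marker_bus_address >> 32);
    regs[REG_LENGTH]       = length;
    regs[REG_MARKER_VALUE] = marker_value;
}
```

The marker value is kept at 32 bits because the value parameter of clEnqueueWriteSignalAMD/clEnqueueWaitSignalAMD is a cl_uint.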

                        nibal wrote:

                         

                        Most, if not all, ocl commands run either as blocking I/O or as event-driven asynchronous I/O, so there is no way to miss completion from your code.

                        Yes, it is possible: the FPGA initiates the data transfer via DMA, so no OpenCL command is associated with it and the runtime cannot know when it has finished. Hence all the talk about marker synchronization.
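The marker scheme relies on the classic "payload first, flag last" ordering: the marker may only change after every payload write is visible, otherwise the waiting side is released too early. Below is a host-side analogy using C11 atomics, not the FPGA/GPU mechanism itself; shared_buf_t, producer_write() and consumer_ready() are hypothetical names, and the ">= target" test assumes the waiter is released once the marker reaches the requested value. On the real system the FPGA's DMA engine has to provide the equivalent guarantee for its payload and marker writes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_WORDS 16

/* Hypothetical stand-in for the bus-addressable buffer: payload plus marker. */
typedef struct {
    uint32_t payload[PAYLOAD_WORDS];
    _Atomic uint32_t marker;
} shared_buf_t;

/* Producer (plays the FPGA's role): copy the payload, then publish the marker.
 * A reader that observes the new marker with an acquire load is guaranteed
 * to also see the payload written before this release store. */
static void producer_write(shared_buf_t *buf, const uint32_t *src, uint32_t marker_value)
{
    memcpy(buf->payload, src, sizeof buf->payload);
    atomic_store_explicit(&buf->marker, marker_value, memory_order_release);
}

/* Consumer (plays the waiting queue's role): the payload may be read only
 * once the marker has reached the target value (assumed ">=" rule). */
static bool consumer_ready(shared_buf_t *buf, uint32_t target)
{
    return atomic_load_explicit(&buf->marker, memory_order_acquire) >= target;
}
```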

                      • Re: AMD's bus-addressable memory extension for OpenCL; was - Whitelist request
                        dipak

                         

                        let it update the value, and the OpenCL runtime should then be able to detect that the write has finished. Is this assumption correct?

                         

                        Yes, you are right. For example (taken from http://toronto.siggraph.org/wp-content/uploads/2015/05/uoft-ocl.pdf):

                         

                        Let:

                        Bus-addressable memory (receiving GPU): busAddressableBuff_

                        External physical memory (sending GPU): extPhysicalBuff_

                         

                        // clEnqueueMakeBuffersResidentAMD, clEnqueueWriteSignalAMD and clEnqueueWaitSignalAMD are
                        // extension functions; obtain them with clGetExtensionFunctionAddressForPlatform() first.
                        busAddressableBuff_ = clCreateBuffer(contexts_[0], CL_MEM_BUS_ADDRESSABLE_AMD, bufSize_, NULL, &error_);
                        // Make the buffer resident and get its physical bus addresses (surface and marker)
                        error_ |= clEnqueueMakeBuffersResidentAMD(cmd_queues_[0], 1, &busAddressableBuff_, CL_TRUE, &busAddr_, 0, NULL, NULL);

                        // Allocate external physical memory (simple mapping of physical memory from GPU0)
                        extPhysicalBuff_ = clCreateBuffer(contexts_[1], CL_MEM_EXTERNAL_PHYSICAL_AMD, bufSize_, &busAddr_, &error_);
                        // Copy the data into GPU0's memory, then write the marker value behind it
                        error_ |= clEnqueueCopyBuffer(cmd_queues_[1], srcBuff_, extPhysicalBuff_, 0, 0, bufSize_, 0, NULL, NULL);
                        error_ |= clEnqueueWriteSignalAMD(cmd_queues_[1], extPhysicalBuff_, markerValue_, 0, 0, NULL, NULL);
                        error_ |= clFlush(cmd_queues_[1]);

                        // Wait for the copy operation on the first GPU
                        error_ |= clEnqueueWaitSignalAMD(cmd_queues_[0], busAddressableBuff_, markerValue_, 0, NULL, NULL);

                         

                        Second, do the marker values have to start at 0?

                        I haven't used it myself, so I'm not sure either, but I don't think so. I'll check with the relevant people.
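If a wait completes once the marker is at least the requested value (an assumption; as noted in the question, the spec only asks for monotonically increasing values), the start value should not matter, as long as each transfer waits for a value larger than anything already stored at the marker location. The same assumption implies that a wait whose target is not above the marker's current contents has nothing to wait for. Below is a minimal sketch of per-buffer bookkeeping along those lines; marker_seq_t and its functions are hypothetical, and the 32-bit width follows the cl_uint value parameter of clEnqueueWriteSignalAMD/clEnqueueWaitSignalAMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-buffer bookkeeping for marker values. */
typedef struct {
    uint32_t last; /* last value handed out, or the marker's initial contents */
} marker_seq_t;

/* Start from whatever the marker location currently holds, not necessarily 0. */
static void marker_seq_init(marker_seq_t *seq, uint32_t current_marker)
{
    seq->last = current_marker;
}

/* Next value for a write/wait pair: strictly larger than every earlier one.
 * Returns false instead of wrapping past UINT32_MAX, which would break monotonicity. */
static bool marker_seq_next(marker_seq_t *seq, uint32_t *out)
{
    assert(out != NULL);
    if (seq->last == UINT32_MAX)
        return false;
    *out = ++seq->last;
    return true;
}

/* Assumed completion rule: the wait is satisfied once marker >= target. */
static bool wait_satisfied(uint32_t marker, uint32_t target)
{
    return marker >= target;
}
```

With this rule, a stale marker that already exceeds the target (for example `wait_satisfied(41, 40)`) completes immediately, which is why each new wait uses a fresh, larger value.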

                         

                         

                        Regards,