6 Replies Latest reply on Jun 18, 2015 10:57 AM by evoliptic

    newbie question on DirectGMA without high level languages and with memory map

    evoliptic

      hello everybody,

       

      I was asked to look into the following: we currently have software running on an FPGA board (Xilinx Virtex) that pre-processes data on the FPGA and then sends it to a graphics card for further processing. Using DirectGMA here could give us a big performance gain. However, from what I've read and from the DirectGMA SDK you provide, the way to use DirectGMA (through OpenCL) is to allocate special buffers that are then accessed directly through special functions. With the Xilinx board, though, we drive the DMA engines directly through memory mapping, with no high-level language like OpenCL at all...

       

      So I would like to know whether it is possible to somehow initialize the graphics card, then get the buffer addresses and map them, so that the DMA engines of the Xilinx board can use them directly. Is it also possible to initialize the graphics card without using a language like OpenCL?

       

      thanks in advance for any responses,

      nicolas

        • Re: newbie question on DirectGMA without high level languages and with memory map
          evoliptic

          hello guys,

           

          Regarding this issue, I was wondering: what address would clEnqueueMapBuffer return for a buffer allocated with CL_MEM_BUS_ADDRESSABLE_AMD (the real address, or an address we cannot interpret because the AMD driver adds a layer of abstraction)? And what about a call to clSVMAlloc?

           

          thanks in advance!

          • Re: newbie question on DirectGMA without high level languages and with memory map
            chm

            Hi,

            you should not have any problems writing with your DMA into the GPU memory. The OpenCL interface is used on the GPU to create buffers that can be accessed by other devices on the bus but those devices do not need to support OpenCL.

             

            You can create a buffer in the PCIE aperture that is visible to your FPGA by calling:

            clCreateBuffer(m_clCtx, CL_MEM_BUS_ADDRESSABLE_AMD, ...

             

            Now you can retrieve the physical address of this buffer by calling:

            clEnqueueMakeBuffersResidentAMD(...., m_pBusAddresses, 0, 0, 0);

             

            m_pBusAddresses will contain the physical addresses of your buffers. Those addresses can be used by the driver of your FPGA to write into GPU memory.

             

            Chris
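As a rough illustration of how an address from m_pBusAddresses might be handed to an FPGA DMA engine, here is a minimal sketch in C. It assumes a hypothetical descriptor that takes the 64-bit target address as two 32-bit words; the struct `dma_target` and the function `dma_target_from_bus_address` are made up for this example and do not come from the AMD SDK or from any Xilinx IP core. The real register layout depends on your FPGA design.

```c
#include <stdint.h>

/* Hypothetical descriptor layout: the 64-bit target address is split
 * into two 32-bit words. The real register map of an FPGA design will
 * most likely differ. */
struct dma_target {
    uint32_t addr_lo;  /* bits 31..0 of the bus address  */
    uint32_t addr_hi;  /* bits 63..32 of the bus address */
};

/* Split a 64-bit bus address (for example the surface bus address
 * returned for a bus-addressable buffer) into the two halves expected
 * by the hypothetical descriptor above. */
static struct dma_target dma_target_from_bus_address(uint64_t bus_address)
{
    struct dma_target t;
    t.addr_lo = (uint32_t)(bus_address & 0xFFFFFFFFu);
    t.addr_hi = (uint32_t)(bus_address >> 32);
    return t;
}
```

The FPGA driver would then write the two words into whatever registers its DMA engine uses for the destination address.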

              • Re: newbie question on DirectGMA without high level languages and with memory map
                evoliptic

                hello chris,

                 

                thanks for your answer. That is in fact what I was thinking of, without being sure about it.

                 

                I ended up writing this piece of code, which might interest people who want to write in C rather than C++:

                 

                #include <string.h>
                #include <stdio.h>
                #include "CL/cl.h"
                #include "CL/cl_ext.h"

                #define DATA_SIZE 10

                /* Print the OpenCL status code and leave main() on failure. */
                #define CL_CHECK_STATUS(status) \
                    do { \
                        if ((status) != CL_SUCCESS) { \
                            printf("Status error %d\n", (int)(status)); \
                            return 1; \
                        } \
                    } while (0)

                int main(void)
                {
                    cl_context context;
                    cl_context_properties properties[3];
                    cl_command_queue command_queue;
                    cl_int err;
                    cl_uint num_of_platforms = 0;
                    cl_platform_id platform_id;
                    cl_device_id device_id;
                    cl_uint num_of_devices = 0;
                    cl_mem input;
                    cl_bus_address_amd busadress;
                    clEnqueueMakeBuffersResidentAMD_fn clEnqueueMakeBuffersResidentAMD;

                    if (clGetPlatformIDs(1, &platform_id, &num_of_platforms) != CL_SUCCESS) {
                        printf("Unable to get platform_id\n");
                        return 1;
                    }
                    /* platform and device IDs are opaque handles, so print them as pointers */
                    printf("platform id : %p\n", (void *)platform_id);

                    // try to get a supported GPU device
                    err = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &num_of_devices);
                    CL_CHECK_STATUS(err);
                    printf("device id : %p\n", (void *)device_id);

                    properties[0] = CL_CONTEXT_PLATFORM;
                    properties[1] = (cl_context_properties) platform_id;
                    properties[2] = 0;

                    context = clCreateContext(properties, 1, &device_id, NULL, NULL, &err);
                    CL_CHECK_STATUS(err);
                    command_queue = clCreateCommandQueue(context, device_id, 0, &err);
                    CL_CHECK_STATUS(err);

                    /* buffer that other devices on the bus are allowed to write to */
                    input = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_BUS_ADDRESSABLE_AMD, sizeof(float) * DATA_SIZE, NULL, &err);
                    CL_CHECK_STATUS(err);

                    /* fill with a non-zero pattern to see whether the driver writes to it */
                    memset(&busadress, 1, sizeof(cl_bus_address_amd));

                    /* the extension entry point has to be looked up at run time */
                    clEnqueueMakeBuffersResidentAMD = (clEnqueueMakeBuffersResidentAMD_fn)
                        clGetExtensionFunctionAddressForPlatform(platform_id, "clEnqueueMakeBuffersResidentAMD");
                    if (clEnqueueMakeBuffersResidentAMD == NULL) {
                        printf("clEnqueueMakeBuffersResidentAMD not found\n");
                        return 1;
                    }
                    printf("ok\n");

                    /* blocking call (CL_TRUE), so busadress is filled in before it is printed */
                    err = clEnqueueMakeBuffersResidentAMD(command_queue, 1, &input, CL_TRUE, &busadress, 0, 0, 0);
                    CL_CHECK_STATUS(err);

                    /* the addresses are 64-bit unsigned values, so print them as such */
                    printf("bus adress : surface : 0x%llx, marker : 0x%llx\n",
                           (unsigned long long)busadress.surface_bus_address,
                           (unsigned long long)busadress.marker_bus_address);

                    clReleaseMemObject(input);
                    clReleaseCommandQueue(command_queue);
                    clReleaseContext(context);
                    return 0;
                }
                

                 

                The problem I have with this code is that it only returns 0 in the addresses. I'm not sure about the function pointer type "clEnqueueMakeBuffersResidentAMD_fn" and the way I use it here. Could you tell me whether this piece of code works on your side?

                 

                However, the issue could be a bad installation of the AMD drivers: the "fglrxinfo" command returns "unable to open display(null)", and glxinfo returns the same, so I cannot tell whether the cl_amd_bus_addressable_memory extension was correctly enabled using

                 

                aticonfig --set-pcs-val=MCIL,DMAOGLExtensionApertureMB,96
                aticonfig --set-pcs-u32=KERNEL,InitialPhysicalUswcUsageSize,96
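One way to check for the extension without a working display is to look at the extension string that OpenCL itself reports (clGetDeviceInfo with CL_DEVICE_EXTENSIONS returns a space-separated list). The helper below is a minimal sketch of the string check only; `has_extension` is a name made up for this example, and the extension name "cl_amd_bus_addressable_memory" is assumed to be the one the driver advertises.

```c
#include <string.h>

/* Return 1 if `name` appears as a whole token in the space-separated
 * extension list `ext_list`, 0 otherwise. A plain strstr() is not
 * enough, because one extension name can be a prefix of another. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = ext_list;

    if (name_len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[name_len] == '\0') || (p[name_len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p++;  /* keep searching after this partial match */
    }
    return 0;
}
```

In a real program the list would come from clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, ...), and the result of has_extension(list, "cl_amd_bus_addressable_memory") would tell whether it is worth calling clEnqueueMakeBuffersResidentAMD at all.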
                

                 

                I had to switch to more urgent work, but I will keep you updated.

                 

                thanks again for your really clear answer chris.

                 

                best regards,

                • Re: newbie question on DirectGMA without high level languages and with memory map
                  evoliptic

                  Hello,

                   

                  I have managed to get a filled cl_bus_address_amd with values that seem correct (both values are within the address ranges that I can observe in /sys/bus/pci/devices/*/resource).

                   

                  So I now have a marker address and a surface address, but how should I translate between this structure and my memory to get the pages corresponding to this buffer?

                   

                   

                  thanks in advance,

                  nicolas
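The sanity check mentioned in this last post (comparing the returned addresses against the ranges listed in the PCI resource files) can be sketched in C as follows. It assumes the usual sysfs layout where each line of a `resource` file holds three hexadecimal numbers, "start end flags"; the function name `addr_in_resource_line` is made up for this example.

```c
#include <stdio.h>
#include <stdint.h>

/* Parse one line of a sysfs PCI `resource` file ("start end flags",
 * all hexadecimal) and report whether `addr` lies inside [start, end].
 * Returns 1 if inside, 0 if outside or if the slot is unused,
 * -1 if the line cannot be parsed. */
static int addr_in_resource_line(const char *line, uint64_t addr)
{
    unsigned long long start, end, flags;

    if (sscanf(line, "%llx %llx %llx", &start, &end, &flags) != 3)
        return -1;
    if (start == 0 && end == 0)  /* unused BAR slot */
        return 0;
    return (addr >= start && addr <= end) ? 1 : 0;
}
```

Reading the resource file of the GPU line by line with fgets() and passing each line together with surface_bus_address or marker_bus_address to this function shows which BAR, if any, contains the address.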