4 Replies Latest reply on Jun 21, 2012 9:10 AM by pesh

    Read/Write OpenCL memory buffers on multiple GPU in a single context

    chardson

      Assume a system with two distinct GPUs, but from the same vendor so they can be accessed from a single OpenCL Platform. Given the following simplified OpenCL code:

       

      float* someRawData; 
      
      cl_device_id gpu1 = clGetDeviceIDs(0,...);
      cl_device_id gpu2 = clGetDeviceIDs(1,...);
      cl_context ctx = clCreateContext(gpu1,gpu2,...); 
      
      cl_command_queue queue1 = clCreateCommandQueue(ctx,gpu1,...);
      cl_command_queue queue2 = clCreateCommandQueue(ctx,gpu2,...);
      
      cl_mem gpuMem = clCreateBuffer(ctx, CL_MEM_READ_WRITE, ...);
      clEnqueueWriteBuffer(queue1,gpuMem,...,someRawData,...);
      clFlush(queue1);
      

       

      At the end of the execution, will someRawData be on both GPU memory or will it be only on gpu1 memory?
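For reference, a fuller version of the simplified snippet above, with real API signatures, might look like this (error checking mostly omitted; assumes the first platform exposes at least two GPU devices):

```c
#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    float someRawData[1024] = {0};

    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    /* Get two GPU devices from the same platform so they can share a context. */
    cl_device_id gpus[2];
    cl_uint numGpus = 0;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, gpus, &numGpus);
    if (numGpus < 2) { printf("need two GPUs\n"); return 1; }

    cl_int err;
    cl_context ctx = clCreateContext(NULL, 2, gpus, NULL, NULL, &err);

    cl_command_queue queue1 = clCreateCommandQueue(ctx, gpus[0], 0, &err);
    cl_command_queue queue2 = clCreateCommandQueue(ctx, gpus[1], 0, &err);

    cl_mem gpuMem = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                   sizeof(someRawData), NULL, &err);

    /* The write is enqueued on queue1, so the data initially lands on gpus[0]. */
    clEnqueueWriteBuffer(queue1, gpuMem, CL_TRUE, 0, sizeof(someRawData),
                         someRawData, 0, NULL, NULL);
    clFlush(queue1);

    clReleaseMemObject(gpuMem);
    clReleaseCommandQueue(queue2);
    clReleaseCommandQueue(queue1);
    clReleaseContext(ctx);
    return 0;
}
```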

        • Read/Write OpenCL memory buffers on multiple GPU in a single context
          nou

OpenCL will copy buffer data between GPUs if you make sure that kernel calls are properly synchronized with events. That means you pass an event from queue A to the kernel invocation on queue B.
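A minimal sketch of that event handoff, continuing from a setup like the one in the question (`kernel` is an illustrative, already-built kernel that takes the buffer as argument 0):

```c
/* Write the buffer on queue1 and capture the completion event. */
cl_event writeDone;
clEnqueueWriteBuffer(queue1, gpuMem, CL_FALSE, 0, size, someRawData,
                     0, NULL, &writeDone);

/* Make the kernel on queue2 wait on that event; the runtime then
   knows it must make the buffer contents visible to gpu2 first. */
clSetKernelArg(kernel, 0, sizeof(cl_mem), &gpuMem);
size_t global = 1024;
cl_event kernelDone;
clEnqueueNDRangeKernel(queue2, kernel, 1, NULL, &global, NULL,
                       1, &writeDone, &kernelDone);

clWaitForEvents(1, &kernelDone);
clReleaseEvent(writeDone);
clReleaseEvent(kernelDone);
```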

          • Re: Read/Write OpenCL memory buffers on multiple GPU in a single context
            nathan1986

As far as I know, the memory resides in only one GPU's memory; it is implicitly migrated to another device when that device needs it (of course, this costs some time). You can read about the clEnqueueMigrateMemObjects API in the OpenCL 1.2 specification, or learn its usage from the device-fission sample in SDK 2.7. Usually we can do this migration explicitly using the migration API (for OpenCL 1.1, you can use the AMD extension).
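With OpenCL 1.2, the explicit migration mentioned above might look like this (a sketch, reusing the queues and buffer from the question; `kernel` and `global` are illustrative):

```c
/* Explicitly move gpuMem to the device behind queue2 ahead of time,
   so a later kernel on queue2 does not pay the transfer cost then. */
cl_event migrated;
clEnqueueMigrateMemObjects(queue2,       /* destination queue/device  */
                           1, &gpuMem,   /* one buffer to migrate     */
                           0,            /* default: migrate contents */
                           0, NULL, &migrated);

/* Later commands on queue2 can depend on the migration event. */
clEnqueueNDRangeKernel(queue2, kernel, 1, NULL, &global, NULL,
                       1, &migrated, NULL);
clReleaseEvent(migrated);
```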

            • Re: Read/Write OpenCL memory buffers on multiple GPU in a single context
              pesh

nathan1986 is right.

Technically, OpenCL is designed so that you don't need to worry about a memory object's actual location. In practice, a memory object is implicitly transferred to the device whose command queue starts executing a command that needs that object as input, so memory objects are automatically synchronized between the devices of a CL context by the runtime. You can also explicitly transfer a memory object to a particular device (command queue) using clEnqueueMigrateMemObjects, for example to improve performance by preloading the object onto the device that will execute a command using it later.

In your example the memory object will reside on the gpu1 device associated with queue1 (because you execute the write command on queue1). But if you then execute a command that uses this buffer on queue2, the buffer will be implicitly transferred to gpu2 (associated with queue2) before the command executes.