4 Replies Latest reply on Jun 21, 2012 9:10 AM by pesh

    Read/Write OpenCL memory buffers on multiple GPU in a single context

    chardson

      Assume a system with two distinct GPUs, but from the same vendor so they can be accessed from a single OpenCL Platform. Given the following simplified OpenCL code:

       

      float* someRawData; 
      
      cl_device_id gpu1 = clGetDeviceIDs(0,...);
      cl_device_id gpu2 = clGetDeviceIDs(1,...);
      cl_context ctx = clCreateContext(gpu1,gpu2,...); 
      
      cl_command_queue queue1 = clCreateCommandQueue(ctx,gpu1,...);
      cl_command_queue queue2 = clCreateCommandQueue(ctx,gpu2,...);
      
      cl_mem gpuMem = clCreateBuffer(ctx, CL_MEM_READ_WRITE, ...);
      clEnqueueWriteBuffer(queue1,gpuMem,...,someRawData,...);
      clFlush(queue1);
      

       

      At the end of the execution, will someRawData be on both GPU memory or will it be only on gpu1 memory?
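For reference, a fuller version of the simplified snippet above, with real API signatures, might look like this (error checking mostly omitted; assumes the first platform exposes at least two GPU devices):

```c
#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    float someRawData[1024] = {0};

    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    /* Get two GPU devices from the same platform so they can share a context. */
    cl_device_id gpus[2];
    cl_uint numGpus = 0;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, gpus, &numGpus);
    if (numGpus < 2) { printf("need two GPUs\n"); return 1; }

    cl_int err;
    cl_context ctx = clCreateContext(NULL, 2, gpus, NULL, NULL, &err);

    cl_command_queue queue1 = clCreateCommandQueue(ctx, gpus[0], 0, &err);
    cl_command_queue queue2 = clCreateCommandQueue(ctx, gpus[1], 0, &err);

    cl_mem gpuMem = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                   sizeof(someRawData), NULL, &err);

    /* The write is enqueued on queue1, so the data initially lands on gpus[0]. */
    clEnqueueWriteBuffer(queue1, gpuMem, CL_TRUE, 0, sizeof(someRawData),
                         someRawData, 0, NULL, NULL);
    clFlush(queue1);

    clReleaseMemObject(gpuMem);
    clReleaseCommandQueue(queue2);
    clReleaseCommandQueue(queue1);
    clReleaseContext(ctx);
    return 0;
}
```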

        • Read/Write OpenCL memory buffers on multiple GPU in a single context
          nou

OpenCL will copy buffer data between GPUs if you make sure that kernel calls are properly synchronized with events. That means you pass an event from queue A to the kernel invocation on queue B.
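A minimal sketch of that event handoff, continuing from a setup like the one in the question (`kernel` is an illustrative, already-built kernel that takes the buffer as argument 0):

```c
/* Write the buffer on queue1 and capture the completion event. */
cl_event writeDone;
clEnqueueWriteBuffer(queue1, gpuMem, CL_FALSE, 0, size, someRawData,
                     0, NULL, &writeDone);

/* Make the kernel on queue2 wait on that event; the runtime then
   knows it must make the buffer contents visible to gpu2 first. */
clSetKernelArg(kernel, 0, sizeof(cl_mem), &gpuMem);
size_t global = 1024;
cl_event kernelDone;
clEnqueueNDRangeKernel(queue2, kernel, 1, NULL, &global, NULL,
                       1, &writeDone, &kernelDone);

clWaitForEvents(1, &kernelDone);
clReleaseEvent(writeDone);
clReleaseEvent(kernelDone);
```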

          • Re: Read/Write OpenCL memory buffers on multiple GPU in a single context
            nathan1986

As far as I know, the memory resides in only one GPU's memory; it is implicitly migrated to another device when that device needs it (of course, this costs some time). You can read about the clEnqueueMigrateMemObjects API in the OpenCL 1.2 specification, or learn its usage from the device-fission sample in SDK 2.7. Usually we can do this migration explicitly using the migration API (for OpenCL 1.1, you can use the AMD extension).
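With OpenCL 1.2, the explicit migration mentioned above might look like this (a sketch, reusing the queues and buffer from the question; `kernel` and `global` are illustrative):

```c
/* Explicitly move gpuMem to the device behind queue2 ahead of time,
   so a later kernel on queue2 does not pay the transfer cost then. */
cl_event migrated;
clEnqueueMigrateMemObjects(queue2,       /* destination queue/device  */
                           1, &gpuMem,   /* one buffer to migrate     */
                           0,            /* default: migrate contents */
                           0, NULL, &migrated);

/* Later commands on queue2 can depend on the migration event. */
clEnqueueNDRangeKernel(queue2, kernel, 1, NULL, &global, NULL,
                       1, &migrated, NULL);
clReleaseEvent(migrated);
```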

            • Re: Read/Write OpenCL memory buffers on multiple GPU in a single context
              pesh

nathan1986 is right.

Technically, OpenCL is designed so that you don't need to worry about a memory object's actual location. In practice, a memory object is implicitly transferred to the device whose command queue starts executing a command that needs that object as input, so memory objects are automatically synchronized between the devices of a CL context by the runtime. You can also explicitly transfer a memory object to a particular device (command queue) using clEnqueueMigrateMemObjects, for example to improve performance by preloading the object onto the device that will execute a command using it later.

In your example the memory object will reside on the gpu1 device associated with queue1 (because you execute the write command on queue1). But if you then execute a command that uses this buffer on queue2, the buffer will be implicitly transferred to gpu2 (associated with queue2) before the command executes.