My understanding is that buffers will be allocated on all devices connected to the same context. The reason for this is likely to simplify runtime management; otherwise, any function involving a buffer could fail with an out-of-memory error during execution (say, if you suddenly used the buffer on GPU1 in your case).
So although I haven't tried it myself, I'd guess creating separate contexts for the two GPUs is the way to go for you.
I recommend reading this thread. As I understand it, the RAM on the GPU should work just like a big cache.
http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2706
But the current implementation creates each buffer on all devices in the context.
Originally posted by: Raistmer Thanks, interesting reading (I only got through the first page, though). And I still don't understand why such a design was used: to create a buffer on all devices but use it only through a specific queue bound to a particular device. If a buffer is needed on many devices, create it on many queues and use it on many queues; what could be easier?...
Just to clear up a small issue here... buffers are not "created on queues", they are created in a context: cl_mem clCreateBuffer(cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret)
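To illustrate: a minimal sketch of creating a buffer against a context (not a queue). The helper name is mine, error handling is trimmed, and building it requires an OpenCL SDK.

```c
#include <CL/cl.h>
#include <stddef.h>

/* A buffer belongs to the whole context; any device in that context
 * can later be handed this cl_mem via clSetKernelArg. No queue is
 * involved at allocation time. */
cl_mem make_buffer(cl_context ctx, size_t bytes)
{
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
    return (err == CL_SUCCESS) ? buf : NULL;
}
```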
A context is a grouping construct that allows you to share data/code/workloads between devices in an easy and/or automatic manner. If I have a context containing a CPU and a GPU device, I can do a single clBuild to compile code for all devices. Buffers allocated in this context will also be trivially passable to functions on both devices, because the buffer is represented on all devices (not necessarily as a complete copy taking up RAM; it could just be mapped). Filling the buffer with data from the host for multiple devices is also automatic.
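The "single clBuild for all devices" point looks like this in practice: passing 0/NULL as the device list asks the runtime to build for every device in the context. A sketch (error handling abbreviated, needs an OpenCL SDK):

```c
#include <CL/cl.h>

/* Build one program for every device attached to the context. */
cl_program build_for_all(cl_context ctx, const char *src)
{
    cl_int err;
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    if (err != CL_SUCCESS)
        return NULL;
    /* num_devices = 0 and device_list = NULL means: build for all
     * devices in the program's context. */
    err = clBuildProgram(prog, 0, NULL, "", NULL, NULL);
    return (err == CL_SUCCESS) ? prog : NULL;
}
```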
If you don't want cooperation between devices, then you don't want a context containing more than a single device. In effect, creating a context with several devices amounts to informing the OpenCL runtime that you plan to use the buffers allocated in the context, and the code compiled into it, on several devices.
Command queues, on the other hand, provide you with just that: a mechanism to issue commands to a device. You can even have several of them for the same device to aid thread synchronization, and they have nothing to do with memory management.
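A sketch of that last point: several queues can feed the same device, and neither queue plays any role in where memory is allocated. (Uses the OpenCL 1.x clCreateCommandQueue entry point, which matches the era of this thread; needs an OpenCL SDK to build.)

```c
#include <CL/cl.h>

/* Two independent queues to one device, e.g. one per host thread.
 * Both target the same device; memory allocation is unaffected. */
void two_queues(cl_context ctx, cl_device_id dev)
{
    cl_int err;
    cl_command_queue q0 = clCreateCommandQueue(ctx, dev, 0, &err);
    cl_command_queue q1 = clCreateCommandQueue(ctx, dev, 0, &err);

    /* ... enqueue work on q0 and q1 independently ... */

    clReleaseCommandQueue(q1);
    clReleaseCommandQueue(q0);
}
```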
Originally posted by: Raistmer But how do I limit a context to a single GPU then? I use only one device when creating the command queue, and all buffer allocations go through that same command queue, so the runtime should know that only one particular device is used...
You can query the individual device IDs in your app and call clCreateContext on each device to create separate contexts for your devices.
A command queue is specific to a single device, so you can perform independent operations on each command queue.
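The suggested one-context-per-GPU setup can be sketched as follows (function name and the two-GPU assumption are mine; error handling is omitted and an OpenCL SDK is required):

```c
#include <CL/cl.h>

/* Give each GPU its own context so buffers are only allocated on the
 * device that actually uses them. */
void per_device_contexts(cl_platform_id platform)
{
    cl_int err;
    cl_device_id gpus[2];
    cl_uint n = 0;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, gpus, &n);

    for (cl_uint i = 0; i < n; ++i) {
        /* One context and one queue per GPU: nothing is shared. */
        cl_context ctx = clCreateContext(NULL, 1, &gpus[i],
                                         NULL, NULL, &err);
        cl_command_queue q = clCreateCommandQueue(ctx, gpus[i], 0, &err);

        /* ... create buffers in ctx, enqueue work on q ... */

        clReleaseCommandQueue(q);
        clReleaseContext(ctx);
    }
}
```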
1. Try passing 0 as num_devices.
2. Yes, you can use the same devices as in the creation of the context, but no others.