Could somebody explain one thing that I don't understand?
Let's suppose I made a single OpenCL context for several GPUs.
1) I still have to create several queues in it, at least one per device, since each queue is bound to a single device (so I still need to deal with individual devices anyway).
2) To create buffers/images I provide only the context. How will the OpenCL runtime know where (on which GPU) to store them (if it's not host memory)?
What is the benefit of having a common context at all?
The benefit of a common context is that you can transfer data between all of its devices. Otherwise you would need expensive copies through host memory.
As for (2), my guess is that buffer creation is lazy, deferred until first access. So the buffer gets created on the device of whatever queue you first use to transfer it.
So it seems the clEnqueueCopy... functions are the only advantage here, right? In any case, it's hardly possible for a kernel running on one GPU to seamlessly access data residing on another one, am I right? Or is it possible? Your guess is interesting, by the way, but what should the runtime do in the case of clCreate...(..., CL_MEM_COPY_HOST_PTR, ...)?
Well, that's the fastest way to transfer data between devices, and they have to be in the same context. And yes, in that case a kernel on one device could access data from another device. Not seamlessly: the data would still have to go over the PCI bus, but it would avoid the host.
The exact same mechanism could be used with CL_MEM_COPY_HOST_PTR: creation and initialization are deferred until first access. In this case the first access should be slower than in all other cases.
Since you already have the setup ready, why don't you try it?
The OpenCL Optimization Guide fully supports your statements. CL_MEM_COPY_HOST_PTR even forces the runtime to allocate some temporary storage for the data. That was a surprise: until today, all the clCreate... calls seemed to me to be immediate commands. Thank you for the clarification.
Thanks for the update.
I remembered having read it somewhere, I just couldn't remember where... :)