sandyandr
Adept I

Image/buffer location (on which GPU?)

Could somebody explain one thing that I don't understand?

Let's suppose I made a single OpenCL context for several GPUs.

1) I have to create several queues in it, at least one per device, because every queue is bound to a specific device (so I still have to deal with individual devices anyway).

2) To create buffers/images I provide only the context. How does the OpenCL runtime know where (on which GPU) to store them, assuming they are not kept in host memory?

What is the benefit of having a common context at all?

0 Likes
1 Solution
nibal
Challenger

The benefit of a common context is that data can be transferred between all of the devices declared in it. Otherwise you would need expensive copies through host memory.

As for (2), my guess is that buffer allocation is lazy: it is deferred until first access. So whichever queue you first use to transfer the buffer determines the device on which it is actually created.
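
To make the guess above concrete, here is a minimal toy model in plain C. It is not the OpenCL runtime and not any vendor's actual implementation; `ToyBuffer`, `ToyQueue`, and the `toy_*` functions are hypothetical names invented for this sketch. It only illustrates the idea that a buffer created against a context has no device storage until a queue, which is bound to one device, first touches it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_DEVICES 2
#define NO_DEVICE   (-1)

/* Toy stand-in for a cl_mem: created against a context, not a device. */
typedef struct {
    size_t size;
    int resident_on;                        /* device index, or NO_DEVICE */
    unsigned char *device_mem[NUM_DEVICES]; /* simulated per-device storage */
} ToyBuffer;

/* Toy stand-in for a command queue: bound to exactly one device. */
typedef struct {
    int device;
} ToyQueue;

/* Models buffer creation: records the size but allocates nothing yet. */
ToyBuffer toy_create_buffer(size_t size) {
    ToyBuffer b;
    b.size = size;
    b.resident_on = NO_DEVICE;
    for (int i = 0; i < NUM_DEVICES; ++i) b.device_mem[i] = NULL;
    return b;
}

/* Gives the buffer storage on the queue's device, moving the contents
   over if it currently lives on another device. Returns 0 on success. */
int toy_make_resident(ToyBuffer *b, const ToyQueue *q) {
    assert(q->device >= 0 && q->device < NUM_DEVICES);
    if (b->resident_on == q->device) return 0;
    unsigned char *dst = calloc(b->size, 1);
    if (dst == NULL) return -1;
    if (b->resident_on != NO_DEVICE) {
        memcpy(dst, b->device_mem[b->resident_on], b->size);
        free(b->device_mem[b->resident_on]);
        b->device_mem[b->resident_on] = NULL;
    }
    b->device_mem[q->device] = dst;
    b->resident_on = q->device;
    return 0;
}

/* Models a blocking write through queue q: the first use decides placement. */
int toy_write(ToyBuffer *b, const ToyQueue *q, const void *src, size_t n) {
    if (n > b->size || toy_make_resident(b, q) != 0) return -1;
    memcpy(b->device_mem[q->device], src, n);
    return 0;
}

/* Frees all simulated device storage. */
void toy_release(ToyBuffer *b) {
    for (int i = 0; i < NUM_DEVICES; ++i) {
        free(b->device_mem[i]);
        b->device_mem[i] = NULL;
    }
    b->resident_on = NO_DEVICE;
}
```

In this model, writing through a queue bound to device 1 places the buffer on device 1, and a later use through a queue bound to device 0 moves it there. In real OpenCL 1.2 and later, `clEnqueueMigrateMemObjects` lets the application request placement on a queue's device explicitly instead of relying on first use.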

0 Likes
5 Replies

So it seems the clEnqueueCopy... functions are the only advantage here, right? In any case, it is hardly possible for a kernel running on one GPU to seamlessly access data residing on another one, am I right? Or is it possible? Your guess is interesting, BTW, but what should the runtime do in the case of clCreate...(..., CL_MEM_COPY_HOST_PTR, ...)?

0 Likes

Well, that's the fastest way to transfer data between devices, and they have to be in the same context. And yes, in that case a kernel on one device could access data residing on another device. Not seamlessly: the data would still have to go over the PCIe bus, but it would avoid the host.

Exactly the same mechanism could be used with CL_MEM_COPY_HOST_PTR: creation and initialization are deferred until first access. In this case the first access should be slower than in all other cases, because it also pays for the initial copy.

Since you already have the setup ready, why don't you try it?

0 Likes

The OpenCL Optimization Guide fully supports your statements. CL_MEM_COPY_HOST_PTR even forces the runtime to allocate temporary storage for the data. That was a surprise: until today I had assumed that all clCreate... calls took effect immediately. Thank you for the clarification.
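
Why temporary storage is needed can be sketched with another small toy model in plain C. With CL_MEM_COPY_HOST_PTR the application is free to reuse or free its host pointer as soon as the create call returns, so a runtime that defers device allocation has to snapshot the data somewhere first. The names below (`ToyCopyBuffer`, `toy_create_copy_host_ptr`, `toy_first_use`, `toy_copy_release`) are hypothetical and only model that behaviour; this is not how any particular runtime is implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a buffer created with CL_MEM_COPY_HOST_PTR. */
typedef struct {
    size_t size;
    unsigned char *staging; /* host-side snapshot taken at creation */
    unsigned char *device;  /* simulated device memory, NULL until first use */
} ToyCopyBuffer;

/* Models creation: snapshot the host data now, allocate no device memory. */
int toy_create_copy_host_ptr(ToyCopyBuffer *b, const void *host_ptr, size_t size) {
    b->size = size;
    b->device = NULL;
    b->staging = malloc(size);
    if (b->staging == NULL) return -1;
    memcpy(b->staging, host_ptr, size);
    return 0;
}

/* Models the first access on a device: allocate device storage, upload
   the snapshot, then drop the temporary copy. Later calls are cheap. */
const unsigned char *toy_first_use(ToyCopyBuffer *b) {
    if (b->device == NULL) {
        assert(b->staging != NULL);
        b->device = malloc(b->size);
        if (b->device == NULL) return NULL;
        memcpy(b->device, b->staging, b->size);
        free(b->staging);
        b->staging = NULL;
    }
    return b->device;
}

/* Frees whatever storage the toy buffer still owns. */
void toy_copy_release(ToyCopyBuffer *b) {
    free(b->staging);
    free(b->device);
    b->staging = NULL;
    b->device = NULL;
}
```

In this model the host array can be overwritten right after creation without affecting the buffer contents, and the extra allocation plus copy on first use is the reason the first access is the slow one.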

0 Likes

Thanks for the update.

I remembered having read it somewhere, just couldn't remember where...:)

0 Likes