probing
Journeyman III

How can I make memory transfer and GPU computing concurrent?

My current implementation creates two command queues for a single GPU device.

One queue is used only for memory transfer calls such as

clEnqueueReadBuffer, clEnqueueWriteBuffer or clEnqueueCopyBuffer.

The other queue is used only for compute calls such as clEnqueueNDRangeKernel.

I synchronize the two queues with shared cl_event objects where necessary.

In my tests, however, this does not make transfer and compute run concurrently. Does that mean clEnqueueCopyBuffer and clEnqueueNDRangeKernel execute serially even when they are on different queues?

0 Likes
11 Replies
omkaranathan
Adept I

Concurrent memory transfer and kernel execution can also happen with a single queue. For this to happen, DMA must be enabled, which is not the case in the current implementation.

0 Likes

Thanks. Is there any plan to enable DMA in a future version?

Originally posted by: omkaranathan Concurrent memory transfer and kernel execution can also happen with a single queue. For this to happen, DMA must be enabled, which is not the case in the current implementation.

0 Likes

Yes, we are working on it.

0 Likes

Originally posted by: omkaranathan Concurrent memory transfer and kernel execution can also happen with a single queue. For this to happen, DMA must be enabled, which is not the case in the current implementation.

So AMD's OpenCL compiler doesn't allow for async transfer? Odd.

0 Likes

ryta,
I don't think he is referring to the async_copy functions, but to EnqueueRead/WriteBuffer.
0 Likes

Micah,

From the little OpenCL I've done, I thought these routines would be async unless you waited on them. I assume this is not correct. Thanks, that's good to know if I want to try to pipeline my data transfer and execution.

0 Likes

Originally posted by: MicahVillmow ryta, I don't think he is referring to the async_copy functions, but to EnqueueRead/WriteBuffer.


Micah, do you mean that a clEnqueueCopyBuffer command can run concurrently with a clEnqueueNDRangeKernel command, while clEnqueueReadBuffer cannot?

Thanks

0 Likes

On architectures that support async data copies, they will be asynchronous; otherwise they will not be.
7XX and Evergreen hardware do not support async kernel copies.
Micah
0 Likes

Originally posted by: MicahVillmow On architectures that support async data copies, they will be asynchronous; otherwise they will not be. 7XX and Evergreen hardware do not support async kernel copies. Micah


Interesting. Does this mean that on essentially all of your "current" hardware there is no way to do an async data transfer, so no "pipelining" can occur?

I thought this was possible in CAL through DMA (though I've never tried it)?

I'm confused, sorry, lol.

0 Likes

ryta,
I'm talking about the kernel functions async_copy_*, not the OpenCL API calls for Enqueue commands. Running those asynchronously is possible.
0 Likes

Ok, thought so, thanks.

 

0 Likes
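
As a rough illustration of what the "pipelining" discussed in this thread would buy on hardware that can overlap copies with kernels, here is a minimal sketch in plain C. It is a timing model only, not OpenCL code: `serial_time`, `overlapped_time`, and the per-chunk durations `t_copy` and `t_kernel` are hypothetical names introduced here, and the model assumes enough buffers that a copy never has to wait for a kernel to release one.

```c
/* Total time for n chunks when nothing overlaps: every copy and every
   kernel runs one after another (the behaviour reported in this thread). */
static double serial_time(int n, double t_copy, double t_kernel)
{
    return n * (t_copy + t_kernel);
}

/* Total time for n chunks when copies and kernels can overlap.
   Assumptions (hypothetical model): copies run back to back on their own
   engine, and kernel i starts once copy i has finished and kernel i-1
   is done. */
static double overlapped_time(int n, double t_copy, double t_kernel)
{
    double copy_done = 0.0;   /* time at which the latest copy finished   */
    double kernel_done = 0.0; /* time at which the latest kernel finished */

    for (int i = 0; i < n; ++i) {
        copy_done += t_copy;
        /* The kernel waits for both its input data and the previous kernel. */
        double start = copy_done > kernel_done ? copy_done : kernel_done;
        kernel_done = start + t_kernel;
    }
    return kernel_done;
}
```

For example, with 4 chunks, a 2 ms copy and a 3 ms kernel per chunk, `serial_time(4, 2.0, 3.0)` gives 20 ms while `overlapped_time(4, 2.0, 3.0)` gives 14 ms. On hardware without concurrent copy and compute, as described for 7XX and Evergreen above, the serial figure is the one to expect regardless of how many queues are used.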