
Overlapping computation and memory transfers with OpenACC and the CAPS compiler

Question asked by monoton on Jan 31, 2014
Latest reply on Feb 18, 2014 by prao



I am currently working on overlapping my memory transfers (each transfer about 1 GB in size) with computation.

However, even when I use async memory transfers in OpenACC, the profiler shows that the OpenCL command queue runs in-order, blocking all other commands until the transfer is done. So I cannot run the computation concurrently (on another set of data previously brought into memory).

Is there a way to switch the queue to out-of-order execution? And would that resolve the issue? If not, how can I resolve it? I cannot fetch the command queue, as that is an OpenACC 2.0 feature that is not yet implemented in the latest compiler. But even if I could, I am not sure whether out-of-order execution is supported.


Is there a way to set the default to out-of-order (preferably via an environment variable or something similar)? Is it supported by the GPU/runtime/SDK?


Is there another way to overlap the compute and memory transfers, if the above is not possible?
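
For context, here is a minimal sketch of the kind of pattern in question: the data is split into chunks, and each chunk's upload, kernel and download are issued on one of two alternating async queue ids. It uses only OpenACC 1.0 directives (`update ... async`, `parallel loop ... async`, `wait`). The function name `scale_chunked`, the chunk count and the trivial scaling kernel are hypothetical placeholders, and it is an assumption (not something confirmed here) that the CAPS runtime maps different async ids to separate OpenCL command queues that can actually run concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiplies every element of `data` by `factor`,
   one chunk at a time. Chunks alternate between two async queue ids so
   that the transfer of one chunk MAY overlap with the kernel of another,
   provided the runtime maps the ids to independent command queues
   (assumption). Without an OpenACC compiler the pragmas are ignored and
   the code simply runs serially on the host. */
void scale_chunked(float *data, size_t n, size_t num_chunks, float factor)
{
    assert(num_chunks > 0 && n % num_chunks == 0);
    long chunk = (long)(n / num_chunks);

    /* Allocate device memory once; the copies are done per chunk below. */
    #pragma acc data create(data[0:n])
    {
        for (long c = 0; c < (long)num_chunks; ++c) {
            long off = c * chunk;
            int q = (int)(c % 2) + 1;   /* alternate between queues 1 and 2 */

            /* Upload this chunk on queue q. */
            #pragma acc update device(data[off:chunk]) async(q)

            /* Same queue id, so the kernel is ordered after the upload. */
            #pragma acc parallel loop present(data[0:n]) async(q)
            for (long i = off; i < off + chunk; ++i)
                data[i] *= factor;

            /* Download the result, again ordered within queue q. */
            #pragma acc update host(data[off:chunk]) async(q)
        }

        /* Block until all outstanding async work has finished. */
        #pragma acc wait
    }
}
```

Operations that share an async id stay ordered relative to each other, so any overlap has to come from the two ids being scheduled independently, which is exactly the part I cannot confirm for the in-order OpenCL queue seen in the profiler.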




Best regards,