2 Replies Latest reply on Feb 18, 2014 4:38 AM by prao

    Overlapping computation and memory transfers with OpenACC and the CAPS compiler




      I am currently working on overlapping my memory transfers (each transfer about 1 GB in size) with computation.

      However, even when I use async memory transfers in OpenACC, the profiler shows that the OpenCL command queue runs in-order, blocking all other commands until the transfer is done. As a result, I cannot run the computation concurrently (on another set of data previously brought into memory).
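
      For illustration, the intended pattern looks roughly like the sketch below. It is not the actual application code: the names `process_chunks`, `NCHUNKS`, and `CHUNK`, the chunk sizes, and the doubling kernel are hypothetical placeholders, and the sketch assumes the whole array fits in device memory. Each chunk is uploaded, processed, and downloaded on one of two alternating async queues, so that in principle the transfer of one chunk could overlap with the computation on the previous one. Whether that overlap actually happens depends on how the runtime maps the async queues onto OpenCL command queues.

```c
#include <assert.h>
#include <stdlib.h>

#define NCHUNKS 4           /* hypothetical number of chunks */
#define CHUNK   (1 << 16)   /* hypothetical chunk length in floats (real chunks are ~1 GB) */

/* Doubles every element of data, one chunk at a time.
 * Chunk c uses async queue c % 2, so consecutive chunks sit in different
 * queues and their transfers and kernels are allowed to overlap. */
static void process_chunks(float *data)
{
    assert(data != NULL);

    /* Allocate device storage once; assumes the full array fits on the device. */
    #pragma acc data create(data[0:NCHUNKS*CHUNK])
    {
        for (int c = 0; c < NCHUNKS; ++c) {
            int off = c * CHUNK;   /* start of this chunk */
            int q   = c % 2;       /* alternate between two async queues */

            /* Upload chunk c on queue q without blocking the host. */
            #pragma acc update device(data[off:CHUNK]) async(q)

            /* Kernel on the same queue: ordered after this chunk's upload only. */
            #pragma acc parallel loop present(data[0:NCHUNKS*CHUNK]) async(q)
            for (int i = 0; i < CHUNK; ++i)
                data[off + i] *= 2.0f;

            /* Download the result, again on queue q. */
            #pragma acc update host(data[off:CHUNK]) async(q)
        }

        /* Block until all outstanding work on both queues has finished. */
        #pragma acc wait
    }
}
```

      Operations within one queue stay ordered (upload, kernel, download), while the two queues are independent of each other. A compiler without OpenACC support ignores the pragmas and runs the loop serially on the host, which gives the same result.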

      Is there a way to switch the command queue to out-of-order execution? And would that resolve the issue? If not, how can I resolve it? I cannot fetch the command queue myself, as that is an OpenACC 2.0 feature that is not yet implemented in the latest compiler. Even if I could, I am not sure whether out-of-order execution is supported.


      Is there a way to set the default to out-of-order (preferably via an environment variable or something similar)? Is it supported by the GPU/runtime/SDK?


      Is there another way to overlap the compute and memory transfers, if the above is not possible?




      Best regards,