Is GPU_ASYNC_MEM_COPY=2 still available in SDK 2.7?

I enqueue a kernel and a buffer read at the same time on different queues, but they are executed one after another. I expected them to run in parallel.


The code looks like this:

// queues: Q0, Q1
// kernels: vnd, cnd

for (;;) {
    clEnqueueNDRangeKernel(Q0, vnd, ... , &event0); // vnd produces event0
    clEnqueueNDRangeKernel(Q0, cnd, ...); // cnd runs after vnd in the same queue
    clEnqueueReadBuffer(Q1, ..., 1, &event0, NULL); // the buffer read waits for vnd (event0) to complete

    // wait for the buffer read to complete
    // ...
}



I expected cnd and the buffer read to execute concurrently on the GPU, giving this execution sequence:

... -> vnd -> (cnd/buffer-reading) -> vnd -> ...

But in fact they run serially.
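
For anyone who wants to check the same thing: one way to tell whether cnd and the buffer read overlap is to compare their profiling timestamps. Below is a minimal sketch of that comparison only. It assumes the start/end times of each command have already been read with clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) on queues created with CL_QUEUE_PROFILING_ENABLE. The names cmd_span, overlap_ns and ran_concurrently are hypothetical helpers made up for this illustration, not part of OpenCL or the AMD SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: device timestamps of one command, in nanoseconds.
 * Assumption: in a real program these are filled from
 * clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, ...) and
 * clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END, ...). */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} cmd_span;

/* Length in ns of the window during which both commands were executing.
 * Returns 0 when they ran strictly one after another. */
static uint64_t overlap_ns(cmd_span a, cmd_span b)
{
    assert(a.end_ns >= a.start_ns && b.end_ns >= b.start_ns);
    uint64_t lo = a.start_ns > b.start_ns ? a.start_ns : b.start_ns; /* later start */
    uint64_t hi = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;         /* earlier end */
    return hi > lo ? hi - lo : 0;
}

/* True if the two commands shared any execution time. */
static bool ran_concurrently(cmd_span a, cmd_span b)
{
    return overlap_ns(a, b) > 0;
}
```

Feeding it the spans of cnd and of the read (both commands would need an event of their own for this) gives 0 in the serial case and a positive number if the copy really ran alongside the kernel. This relies on both queues belonging to the same device, so that the timestamps share one time base.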



My environment:

AMD APP SDK 2.7 + Catalyst 12.6 Beta (8.98-120522a-139735E-ATI)

HD 7970