Is GPU_ASYNC_MEM_COPY=2 still available in APP SDK 2.7?
I enqueue a kernel and a buffer read at the same time on different queues, but they are executed one after the other. I expected them to run in parallel.
The code looks like this:
// queues: Q0, Q1
// kernels: vnd, cnd
clEnqueueNDRangeKernel(Q0, vnd, ..., &event0); // vnd produces event0
clEnqueueNDRangeKernel(Q0, cnd, ...); // cnd runs after vnd in the same queue
clEnqueueReadBuffer(Q1, ..., 1, &event0, NULL); // the buffer read waits for vnd (event0) to complete
// wait for the buffer read to complete
I expected cnd and the buffer read to execute concurrently on the GPU, giving this execution sequence:
... -> vnd -> (cnd/buffer-reading) -> vnd -> ...
In practice, however, they run serially.
AMD APP SDK 2.7 + Catalyst 12.6 Beta (8.98-120522a-139735E-ATI)
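For reference, here is a minimal sketch of how the "run serially" observation could be checked from profiling data. It assumes both queues were created with CL_QUEUE_PROFILING_ENABLE and that the start and end timestamps of cnd and of the buffer read were fetched with clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END). The names exec_window and overlap_ns are hypothetical helpers made up for this illustration, not part of OpenCL or the APP SDK.

```c
#include <stdint.h>

/* Device-side execution window of one command, in nanoseconds.
 * Assumption: the two fields are filled from clGetEventProfilingInfo
 * using CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} exec_window;

/* Returns how many nanoseconds the two windows overlap.
 * A result of 0 means the two commands did not execute concurrently. */
static uint64_t overlap_ns(exec_window a, exec_window b)
{
    uint64_t latest_start = a.start_ns > b.start_ns ? a.start_ns : b.start_ns;
    uint64_t earliest_end = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;
    return earliest_end > latest_start ? earliest_end - latest_start : 0;
}
```

Calling overlap_ns with the window of cnd and the window of the buffer read would return 0 for the serial behaviour described above, and a positive value if the copy actually overlapped with the kernel.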