I've seen in another thread how to overlap data transfers with kernel execution. Basically, you create two command queues, issue copies and kernels into both queues, and flush both queues. With GPU_ASYNC_MEM_COPY=2 set, you should then see a transfer and a kernel running at the same time (one from each queue). What are the exact requirements for this to work? I'm not seeing any overlap.