Hi Evren,
1. Yes, these two operations can be overlapped on certain hardware. With DMA support, a device can execute a kernel while performing an independent memory transfer. For more details, see AMD's OpenCL optimization guide.
2. AFAIK, on the AMD platform, each host-side queue executes its commands in order. However, certain devices have hardware support for handling commands from multiple queues simultaneously. Hence, you can use multiple command queues to submit independent tasks to the device at the same time.
Note: As per the OpenCL spec, support for out-of-order queues is not mandatory. So I guess that if you pass the out-of-order flag when creating a command queue, the implementation may ignore it if the platform does not support out-of-order execution. You can query CL_DEVICE_QUEUE_PROPERTIES with clGetDeviceInfo to see what the device reports.
Regards,