CUDA has no concept of a command queue, whereas in OpenCL it is one of the core components.
How do the two differ in this respect?
CUDA streams are the equivalent of OpenCL command queues.
CUDA can support multiple streams within a "CUDA context".
However, I believe a CUDA context is always tied to a single CUDA device (I am not sure whether this has changed since).
CUDA's concurrent memory copy (over PCIe) with kernel execution, and concurrent kernel execution, are both supported through streams. The concurrent operations must belong to the same CUDA context but be issued from different streams.
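To make the copy/kernel overlap concrete, here is a minimal sketch using the CUDA runtime API. The kernel name `scale`, the chunk size, and the choice of two streams are my own assumptions for illustration, and error checking is left out for brevity. Whether the operations really overlap depends on the device (for example, how many copy engines it has), so treat this as a sketch rather than a guaranteed recipe.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: scales each element in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int kStreams = 2;                 // assumed number of streams
    const int kChunk = 1 << 20;             // elements handled per stream
    const size_t bytes = kChunk * sizeof(float);

    float *host = nullptr, *dev = nullptr;
    // Pinned (page-locked) host memory lets cudaMemcpyAsync run asynchronously.
    cudaMallocHost(&host, kStreams * bytes);
    cudaMalloc(&dev, kStreams * bytes);
    for (int i = 0; i < kStreams * kChunk; ++i) host[i] = 1.0f;

    // All streams live in the same context (the runtime API's primary context).
    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Work inside one stream runs in issue order; work in different
    // streams is allowed to overlap (copy in one, kernel in the other).
    for (int s = 0; s < kStreams; ++s) {
        float *h = host + s * kChunk;
        float *d = dev + s * kChunk;
        cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(kChunk + 255) / 256, 256, 0, streams[s]>>>(d, kChunk, 2.0f);
        cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    cudaDeviceSynchronize();                // wait for every stream to finish
    printf("host[0] = %f\n", host[0]);

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```

In OpenCL terms, each `cudaStream_t` here plays roughly the role of one command queue, and the three calls in the loop correspond to enqueueing a write, a kernel, and a read on that queue.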
According to "Programming Massively Parallel Processors", CUDA streams are mainly for task parallelism, which also plays an important role in achieving performance goals alongside data parallelism.
Can we say that the concept of a command queue in OpenCL was conceived largely to support task parallelism?
By task parallelism, they mean things like concurrently executing two kernels, or concurrently performing a PCIe transfer along with kernel execution. This parallelism is at a very coarse (i.e. higher) level compared to data parallelism, which is at a very fine level.
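A small sketch may help separate the two levels. The kernels `fill` and `negate` below are hypothetical names I made up for illustration. Inside each kernel, many threads do the same thing to different elements (data parallelism); the two kernels themselves are independent tasks placed in different streams, so the device is permitted, though not obliged, to run them concurrently (task parallelism).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Data parallelism: every thread writes one element of the array.
__global__ void fill(float *out, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// A second, unrelated kernel (also data parallel internally).
__global__ void negate(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = -data[i];
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Task parallelism: two independent kernels on separate buffers,
    // issued to separate streams, so they may execute at the same time.
    fill<<<blocks, threads, 0, s0>>>(a, n, 3.0f);
    negate<<<blocks, threads, 0, s1>>>(b, n);

    cudaDeviceSynchronize();                // wait for both tasks

    float first = 0.0f;
    cudaMemcpy(&first, a, sizeof(float), cudaMemcpyDeviceToHost);
    printf("a[0] = %f\n", first);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

If both kernels were launched into the same stream instead, they would run one after the other in issue order, which is the behaviour you get from a single in-order command queue.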
Command queues can be thought of as different work channels to the device. You can look at them as "task parallelism", if that helps you.