A CUDA stream is the equivalent of an OpenCL command queue.
CUDA supports multiple streams within a single "CUDA context".
However, I believe a CUDA context is always tied to a single CUDA device (I'm not sure whether that has changed since).
CUDA's concurrent memory copy (over PCIe) with kernel execution, and concurrent kernel execution, are both expressed through streams. Operations that run concurrently must belong to the same CUDA context but be issued to different streams.
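As a rough illustration of the point above, here is a minimal sketch using the CUDA runtime API. It issues a copy, a kernel, and a copy back into each of two streams so that the hardware is free to overlap work from one stream with work from the other. The kernel `scale`, the function `run_two_stream_demo`, and the buffer sizes are my own names and choices for the example, not anything from a particular codebase. Whether anything actually overlaps depends on the device.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiplies each element by `factor`.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns true if both chunks come back scaled correctly.
bool run_two_stream_demo() {
    const int kStreams = 2;
    const int n = 1 << 20;                    // elements per stream (arbitrary)
    const size_t bytes = n * sizeof(float);

    cudaStream_t streams[kStreams] = {};
    float *host[kStreams] = {};
    float *dev[kStreams] = {};
    bool ok = true;

    for (int s = 0; s < kStreams; ++s) {
        ok = ok && cudaStreamCreate(&streams[s]) == cudaSuccess;
        // Pinned (page-locked) host memory lets cudaMemcpyAsync overlap with kernels.
        ok = ok && cudaMallocHost((void **)&host[s], bytes) == cudaSuccess;
        ok = ok && cudaMalloc((void **)&dev[s], bytes) == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) host[s][i] = 1.0f;
    }

    // Same context (the runtime's context for the current device), different streams.
    // Within one stream the three operations run in order; across streams they may overlap.
    for (int s = 0; ok && s < kStreams; ++s) {
        cudaMemcpyAsync(dev[s], host[s], bytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(dev[s], n, 2.0f);
        cudaMemcpyAsync(host[s], dev[s], bytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams, then check the results on the host.
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;
    for (int s = 0; s < kStreams; ++s) {
        for (int i = 0; ok && i < n; ++i) ok = (host[s][i] == 2.0f);
        if (dev[s]) cudaFree(dev[s]);
        if (host[s]) cudaFreeHost(host[s]);
        if (streams[s]) cudaStreamDestroy(streams[s]);
    }
    return ok;
}
```

Call `run_two_stream_demo()` from your own `main` and compile with `nvcc`. If you want to know in advance whether the device can overlap copies with kernels or run kernels concurrently, the `asyncEngineCount` and `concurrentKernels` fields of `cudaDeviceProp` (filled in by `cudaGetDeviceProperties`) report that.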