As mentioned in the document, a hardware queue can be thought of as a GPU entry point through which commands or tasks are submitted to the GPU. There are three main types of queue: compute, graphics, and copy. The graphics command processor handles graphics queues, the asynchronous compute engines (ACEs) handle compute queues, and the DMA engines handle copy queues. Each queue can dispatch work items without waiting for other tasks to complete, allowing independent command streams to be interleaved on the GPU’s Shader Engines and executed simultaneously. For more information, please refer to the section "HARDWARE DESIGN" in Asynchronous-Shaders-White-Paper-FINAL.pdf.
When an OpenCL queue is created, it is assigned to a hardware queue, which is usually a compute queue. The compute queue is selected according to the creation order of the OpenCL queue within its OpenCL context. The GPU_NUM_COMPUTE_RINGS environment variable can be used to limit the number of compute queues available for selection.
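To make the selection rule a little more concrete, here is a small toy model in Python. It is only a sketch and not the driver's actual code: the function name assign_compute_queues, its parameters, and the round-robin mapping are all assumptions made for illustration. The only things taken from the description above are that selection follows creation order within a context and that GPU_NUM_COMPUTE_RINGS can reduce the pool of compute queues.

```python
import os


def assign_compute_queues(num_opencl_queues, available_compute_queues, env=None):
    """Toy model: map OpenCL queues (in creation order) to hardware compute queues.

    Returns a list where entry i is the index of the hardware compute queue
    assumed to back the i-th OpenCL queue created in one context.
    """
    env = os.environ if env is None else env

    usable = available_compute_queues
    limit = env.get("GPU_NUM_COMPUTE_RINGS")
    if limit is not None:
        # The variable is modelled as a cap: it can shrink the pool of
        # compute queues but never grow it, and at least one queue remains.
        usable = max(1, min(usable, int(limit)))

    # ASSUMPTION: round-robin by creation order. The real driver policy
    # may differ; this only illustrates "selected by creation order".
    return [i % usable for i in range(num_opencl_queues)]


if __name__ == "__main__":
    # Hypothetical device exposing 8 compute queues, no limit set.
    print(assign_compute_queues(4, 8, env={}))
    # Same device with GPU_NUM_COMPUTE_RINGS=2.
    print(assign_compute_queues(4, 8, env={"GPU_NUM_COMPUTE_RINGS": "2"}))
```

Under this model, with the limit set to 2, the third and fourth OpenCL queues share hardware queues with the first and second, so their commands would no longer have separate entry points into the GPU.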
Thanks.