>> Max on device queues : 1
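This limit applies to device-side queues: it is the maximum number of device queues that can be created per context. You can query it using clGetDeviceInfo with the parameter CL_DEVICE_MAX_ON_DEVICE_QUEUES. (CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE is a different query; it returns the maximum size of a device queue in bytes.)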
Concurrent command execution across multiple host-side queues depends mainly on hardware capabilities, such as the number of asynchronous compute engines (ACEs) and hardware queues available on the device. If you are using a recent AMD GPU, I think it should have multiple ACEs and hardware queues.
As described in the "Command Queue" section of AMD_OpenCL_Programming_Optimization_Guide.pdf:
"A hardware queue can be thought of as a GPU entry point. The GPU can process kernels from several compute queues concurrently. All hardware queues ultimately share the same compute cores. The use of multiple hardware queues is beneficial when launching small kernels that do not fully saturate the GPU. "
"An OpenCL queue is assigned to a hardware queue on creation time. The hardware compute queues are selected according to the creation order within an OpenCL context. If the hardware supports K concurrent hardware queues, the Nth created OpenCL queue within a specific OpenCL context will be assigned to the (N mod K) hardware queue. The number of compute queues can be limited by specifying the GPU_NUM_COMPUTE_RINGS environment variable."
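To make the (N mod K) rule quoted above concrete, here is a minimal sketch in C. It is only an illustration of the arithmetic, not the runtime's actual code: hw_queue_for and print_mapping are hypothetical helpers (not part of any OpenCL or AMD API), and I am assuming the creation order N is counted from 0 within the context.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not an OpenCL API call.
 * n: creation order of the OpenCL queue within its context
 *    (assumption: counted from 0).
 * k: number of concurrent hardware queues the device supports.
 * Returns the hardware queue index given by the guide's (N mod K) rule. */
unsigned hw_queue_for(unsigned n, unsigned k)
{
    assert(k > 0); /* a device with zero hardware queues makes no sense here */
    return n % k;
}

/* Prints the mapping for the first `count` OpenCL queues of a context. */
void print_mapping(unsigned count, unsigned k)
{
    for (unsigned n = 0; n < count; ++n) {
        printf("OpenCL queue %u -> hardware queue %u\n",
               n, hw_queue_for(n, k));
    }
}
```

For example, with K = 2, calling print_mapping(4, 2) shows queues 0 and 2 landing on hardware queue 0 and queues 1 and 3 on hardware queue 1. So under this rule, creating more OpenCL queues than K just wraps around onto hardware queues that are already in use.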
Thanks.