I have a problem with an OpenCL program. It creates 128 command queues, but it fails while the command queues are being created. The code works fine if the number of command queues is less than or equal to 69. The program runs on an AMD GPU (Hawaii), which has AMD's Graphics Core Next (GCN) architecture. According to this document: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Asynchronous-Shaders-White-Paper-FINA... GCN supports up to 8 ACEs per GPU, and each ACE can manage up to 8 independent queues. Therefore, 8 ACEs x 8 queues = 64 queues. I don't know whether this is related to my problem or whether there is some other explanation. What do you think? Thank you so much.
What is the error code returned by clCreateCommandQueue?
The Optimization Guide is very clear on this: if the number of command queues exceeds the number of available hardware queues, the runtime assigns the extra ones to existing hardware queues in a round-robin fashion.
clCreateCommandQueue creates host-side queues, which are managed by the runtime. The runtime handles all the resource allocation needed to maintain them. As nibal pointed out, these host-side queues (or software queues) are then mapped to hardware compute queues in some round-robin fashion. So many such host-side queues (even from different contexts) can be mapped to a single hardware compute queue.
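To make the round-robin idea concrete, here is a minimal sketch in plain C. It is only a model of the mapping described above, not the runtime's actual algorithm; the function names and the simple modulo rule are my assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: software queue sw_index is dealt onto one of
 * hw_queue_count hardware queues in round-robin order. */
size_t hw_queue_for(size_t sw_index, size_t hw_queue_count)
{
    assert(hw_queue_count > 0); /* avoid modulo by zero */
    return sw_index % hw_queue_count;
}

/* Counts how many of the first sw_queue_count software queues
 * end up sharing hardware queue hw_index under that model. */
size_t sharers_of(size_t hw_index, size_t sw_queue_count, size_t hw_queue_count)
{
    size_t n = 0;
    for (size_t i = 0; i < sw_queue_count; ++i) {
        if (hw_queue_for(i, hw_queue_count) == hw_index) {
            ++n;
        }
    }
    return n;
}
```

Under this model, sharers_of(0, 128, 64) is 2: with 128 software queues on 64 hardware queues, each hardware queue simply serves two of them. So exceeding 64 queues should not, by itself, make creation fail.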
My guess is that the problem is due to a lack of the system resources that the runtime needs to create a host-side queue. In that case, the limit may vary depending on the system configuration. If possible, please try a different system configuration and observe the outcome.
Hi dipak. I have tested my code on a different system with a different GPU, and it fails at the same number of command queues. Is there some environment (or driver) variable or something like that which could be influencing this? Also, how could I change my system configuration to test this better? Thank you.
Just to point out that CL_OUT_OF_RESOURCES (-5) is different than CL_OUT_OF_HOST_MEMORY (-6).
The error indicates some kind of host memory problem. Maybe you should also check syslog for any memory issues.
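Since the two codes are easy to mix up, a small lookup table can help when logging the value written to errcode_ret. This is a sketch in plain C: the numeric values are the ones from the Khronos CL/cl.h header, repeated here so it builds without the OpenCL SDK, and the function names are my own, not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Error codes that clCreateCommandQueue may report, with the numeric
 * values from CL/cl.h (repeated here, not included from the header). */
struct queue_error { int code; const char *name; };

static const struct queue_error QUEUE_ERRORS[] = {
    {   0, "CL_SUCCESS" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    {  -6, "CL_OUT_OF_HOST_MEMORY" },
    { -30, "CL_INVALID_VALUE" },
    { -33, "CL_INVALID_DEVICE" },
    { -34, "CL_INVALID_CONTEXT" },
    { -35, "CL_INVALID_QUEUE_PROPERTIES" },
};
#define QUEUE_ERROR_COUNT (sizeof QUEUE_ERRORS / sizeof QUEUE_ERRORS[0])

/* Hypothetical helper: name for a code, or "UNKNOWN" if not in the table. */
const char *queue_error_name(int code)
{
    for (size_t i = 0; i < QUEUE_ERROR_COUNT; ++i) {
        if (QUEUE_ERRORS[i].code == code) {
            return QUEUE_ERRORS[i].name;
        }
    }
    return "UNKNOWN";
}

/* Hypothetical helper: code for a name, or 1 if the name is not in the
 * table (1 is safe as a sentinel because every code listed is <= 0). */
int queue_error_code(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < QUEUE_ERROR_COUNT; ++i) {
        if (strcmp(QUEUE_ERRORS[i].name, name) == 0) {
            return QUEUE_ERRORS[i].code;
        }
    }
    return 1;
}
```

Printing queue_error_name(err) together with the index of the queue that failed makes it obvious whether the runtime is reporting a device-side limit (-5) or a host allocation failure (-6).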
Thanks for your reply. It's interesting that the number is fixed irrespective of system configuration and GPU device. Could you please check this scenario: create multiple contexts for that device, and then create the command queues in each context? I just want to verify whether the limitation has anything to do with the context.
Is there some environment (or driver) variable or something like that which could be influencing this?
I'm not aware of any. I'll check.
Not sure, but please try modifying this parameter: GPU_MAX_COMMAND_QUEUES
[To see all the parameters: strings /usr/lib/libamdocl64.so | grep GPU]
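If you would rather set the variable from inside the program than from the shell, something like the sketch below might work. The assumptions are mine: that the runtime reads GPU_MAX_COMMAND_QUEUES from the process environment when the library initializes (so it must be set before the first OpenCL call), and that it accepts a plain decimal count. Neither is confirmed in this thread. The helper names are hypothetical, and setenv is POSIX, so this is not for Windows.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: sets GPU_MAX_COMMAND_QUEUES for this process.
 * Returns 0 on success, -1 on failure (same convention as setenv).
 * Assumption: the runtime only sees it if this runs before any OpenCL call. */
int set_max_command_queues(unsigned count)
{
    char value[16];
    snprintf(value, sizeof value, "%u", count);
    return setenv("GPU_MAX_COMMAND_QUEUES", value, 1 /* overwrite */);
}

/* Hypothetical helper: reads the value back, or returns -1 if it is unset. */
long get_max_command_queues(void)
{
    const char *value = getenv("GPU_MAX_COMMAND_QUEUES");
    return value ? strtol(value, NULL, 10) : -1;
}
```

For the case in this thread you would call set_max_command_queues(128) at the very start of main and then check whether queue creation still stops at the same count.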