
alazaro
Adept II

Maximum Number of Command Queues in AMD GPU?

Hello everyone,

I have a problem with an OpenCL program. It tries to create 128 command queues, but the queue creation fails. The code works fine if the number of command queues is less than or equal to 69. The program runs on an AMD GPU (Hawaii), which has AMD's Graphics Core Next (GCN) architecture. According to this document: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Asynchronous-Shaders-White-Paper-FINA...  GCN supports up to 8 ACEs per GPU, and each ACE can manage up to 8 independent queues. Therefore, 8 ACEs x 8 queues = 64 queues. I don't know whether this is related to my problem or whether there is another explanation. What do you think? Thank you so much.

20 Replies
nibal
Challenger

Hi alazaro,

What is the error code returned by clCreateCommandQueue?

The Optimization guide is very clear on this. If the number of queues exceeds the number of available hardware queues, it reassigns them to existing queues in a round-robin fashion.

Hi nibal, the error code returned by clCreateCommandQueue is -6. I think the -6 error code is CL_OUT_OF_HOST_MEMORY.
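
For reference, here is a minimal C sketch that maps the two status codes discussed in this thread to their symbolic names. The numeric values are the ones defined in the Khronos CL/cl.h header; they are repeated here only so the sketch compiles without an OpenCL SDK. The helpers cl_status_name and cl_status_matches are hypothetical names introduced for illustration and are not part of the OpenCL API.

```c
#include <assert.h>
#include <string.h>

/* Values as defined in the Khronos CL/cl.h header. They are repeated here
   so the sketch builds without an OpenCL SDK; real code should include
   the header instead. */
#define CL_SUCCESS              0
#define CL_OUT_OF_RESOURCES     (-5)
#define CL_OUT_OF_HOST_MEMORY   (-6)

/* Hypothetical helper: return the symbolic name of a status code. */
const char *cl_status_name(int status)
{
    switch (status) {
    case CL_SUCCESS:            return "CL_SUCCESS";
    case CL_OUT_OF_RESOURCES:   return "CL_OUT_OF_RESOURCES";
    case CL_OUT_OF_HOST_MEMORY: return "CL_OUT_OF_HOST_MEMORY";
    default:                    return "unknown status";
    }
}

/* Hypothetical helper: check whether a name someone quotes for a status
   code is the name that code really has. Returns 1 on match, 0 otherwise. */
int cl_status_matches(int status, const char *claimed_name)
{
    assert(claimed_name != NULL);
    return strcmp(cl_status_name(status), claimed_name) == 0;
}
```

Passing the value returned through the errcode_ret argument of clCreateCommandQueue to cl_status_name would give the name for -6 directly, which avoids mixing it up with the neighbouring code -5.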


Yep. That's error -6. If you are on *nix, confirm with top, and if memory really is exhausted, go and buy some more.

I've confirmed with top: almost no host memory is in use when the command queues are created.


Hi,

clCreateCommandQueue creates the host-side queues, which are managed by the runtime. The runtime handles all the resource allocation needed to maintain them. As nibal pointed out, these host-side queues (or software queues) are then mapped to hardware compute queues in a round-robin fashion. So many host-side queues (including ones from different contexts) can be mapped to a single hardware compute queue.
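
To make the mapping concrete, here is a small C sketch of one possible round-robin policy. It assumes that host-side queue i lands on hardware queue i modulo the number of hardware queues; the actual policy inside the AMD runtime is not documented in this thread, so treat this only as an illustration. The 8 x 8 figure is the one quoted from the white paper in the first post, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the first post: 8 ACEs, 8 queues per ACE. */
enum {
    ACES_PER_GPU   = 8,
    QUEUES_PER_ACE = 8,
    HW_QUEUES      = ACES_PER_GPU * QUEUES_PER_ACE  /* 64 */
};

/* Assumed policy (not confirmed by the runtime team): host queue i is
   served by hardware queue i mod num_hw. */
size_t hw_queue_for(size_t host_queue_index, size_t num_hw)
{
    assert(num_hw > 0);
    return host_queue_index % num_hw;
}

/* Count how many of the first num_host host queues share hardware queue hw. */
size_t host_queues_sharing(size_t hw, size_t num_host, size_t num_hw)
{
    size_t count = 0;
    for (size_t i = 0; i < num_host; ++i) {
        if (hw_queue_for(i, num_hw) == hw)
            ++count;
    }
    return count;
}
```

Under this assumption, 128 host-side queues on 64 hardware queues would simply put two host queues on each hardware queue, so the hardware queue count alone would not explain a creation failure.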

My guess is that the above problem is due to the unavailability of system resources that the runtime needs to create a host-side queue. In that case, the number may vary depending on the system configuration. If possible, please try it with a different system configuration and observe the outcome.

Regards,

Hi dipak. I have tested my code on a different system with a different GPU, and it fails at the same number of command queues. Is there some environment (or driver) variable or something like that which could be influencing this? Also, how could I change the configuration of my system to test it better? Thank you.


Thanks for your reply. It's interesting that the number is fixed irrespective of system configuration and GPU devices. Could you please check this scenario? - create multiple contexts for that device and then create the command queues for each context. Just want to verify whether the limitation has anything to do with the context or not.

Is there some environment (or driver) variable or something like that which could be influencing this?

I'm not aware of any. I'll check.


Not sure, but please try modifying this parameter: GPU_MAX_COMMAND_QUEUES

[ To see all the parameters: strings /usr/lib/libamdocl64.so | grep GPU]


Hi,

I've checked a scenario where 2 contexts are created, with 64 command queues for each context. 64 is less than 69 for each context, but the total across both contexts is higher than 69. The program still crashes once 69 command queues have been created overall. Additionally, I've set the parameter GPU_MAX_COMMAND_QUEUES with the command: export GPU_MAX_COMMAND_QUEUES=128 , but without success.
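
The experiment above boils down to a probe loop: keep creating queues until the first failure, then report the count and the status code. Below is a hedged C sketch of that loop. To keep it runnable without an OpenCL SDK, queue creation is hidden behind a hypothetical callback (create_queue_fn), and fake_create_queue is a stand-in with a caller-chosen cap. None of these names belong to the OpenCL API, and the cap is whatever the caller passes in, not a documented value.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_OK                  0
#define STATUS_OUT_OF_HOST_MEMORY  (-6)  /* value of CL_OUT_OF_HOST_MEMORY */

/* Hypothetical callback: try to create one more queue, return a status.
   In a real program it would wrap clCreateCommandQueue. */
typedef int (*create_queue_fn)(void *state);

/* Try to create up to `wanted` queues. Returns how many succeeded and,
   if first_error is not NULL, stores the first failing status there
   (STATUS_OK if every creation succeeded). */
size_t probe_queue_limit(create_queue_fn create, void *state,
                         size_t wanted, int *first_error)
{
    size_t created = 0;

    assert(create != NULL);
    if (first_error != NULL)
        *first_error = STATUS_OK;

    while (created < wanted) {
        int status = create(state);
        if (status != STATUS_OK) {
            if (first_error != NULL)
                *first_error = status;
            break;
        }
        ++created;
    }
    return created;
}

/* Stand-in runtime with a per-process cap chosen by the caller. */
struct fake_runtime {
    size_t live;  /* queues created so far */
    size_t cap;   /* assumed cap, for illustration only */
};

int fake_create_queue(void *state)
{
    struct fake_runtime *rt = state;
    if (rt->live >= rt->cap)
        return STATUS_OUT_OF_HOST_MEMORY;
    rt->live++;
    return STATUS_OK;
}
```

Because the counter lives in one shared state object, the sketch also mirrors the two-context observation: what matters is the total number of queues created, not how they are split between contexts.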


Yes, as I checked on multiple GPUs, the maximum number of command queues seems constant. Also, GPU_MAX_COMMAND_QUEUES indeed has no effect.

I guess someone from the runtime team may provide better insight in this regard. I'll check and get back to you.

Sorry for this delayed reply.

I've come to know that the maximum number of host-side command queues is indeed a hard limit, currently set to 70 per process (69 with zero-based counting). So the above observation was reasonable.

Regards,

dipak wrote:

Sorry for this delayed reply.

I've come to know that the maximum number of host-side command queues is indeed a hard limit, currently set to 70 per process (69 with zero-based counting). So the above observation was reasonable.

Regards,

Hmm. So the error code CL_OUT_OF_RESOURCES is correct in this case. We wrongly assumed it was memory, when in fact it was the number of queues 😞

Hi dipak !! thank you for your help as well. :-).


Hi dipak !! Don't worry. Thank you so much for your reply.


Hi dipak, excuse me for the delay in asking this. Do you know of any document where the per-process limit on command queues is stated? Your statement is true, but I am now writing an article and need a reference for it.


Sorry, I'm not aware of any. I got that information from the relevant team.

AFAIK, the limit is there because the OCL conformance test expects a predictable number of queues. The limit is implementation dependent and may change in the future. So I doubt that any such document is publicly available.

Regards,

Ok, Thank you so much.


Interesting.

Just to point out that CL_OUT_OF_RESOURCES (-5) is different from CL_OUT_OF_HOST_MEMORY (-6).

The error indicates some kind of host memory problem. Maybe you should also check syslog for any kind of memory issues.


Hi nibal, I've checked syslog but I haven't seen anything strange.


I wrote this before you tested it on a different system. I was looking for a hardware problem, but the chance of the same thing happening on 2 different systems is nil. Please ignore.
