Is there any rule of thumb about the number of command queues per GPU device?
Is it standard that each device has only one command queue? Can there be more, and in which contexts?
You can create any number of command queues to the same device in the same context.
Similarly, you can create as many contexts as you want -- with each context holding multiple queues to the same device.
On some platforms, multiple command queues on the same device let you overlap kernel execution with PCIe memory transfers.
Recently, AMD has uploaded a small sample showing how you can achieve this.
Check the link below:
Re: Asynchronous DMA + Kernel Execution using AMD GPUs
Thanks for the link to the example.
I have found in the example that they create multiple command queues on a device within one context. So I have an example for the first case, where any number of command queues are created to the same device in the same context.
Can you give an example of the second case, where you create many contexts -- each with multiple queues to the same device?
Technically speaking -- creating two contexts on the same device makes "memory sharing" difficult. Since "cl_mem" objects are tied to a context, transferring data between contexts requires multiple copies -- which is a tremendous waste of time.
So, if you have multiple contexts, they should be doing completely independent things that don't need to communicate with each other.
A simple example is a scenario where your application links against multiple libraries that each provide OpenCL acceleration. Perhaps one library uses OpenCL for image processing and the other does speech recognition with OpenCL -- just a contrived example; there could be better ones. Perhaps other Devgurus in this forum can unleash their imagination...
Multiple contexts are more likely to be seen when OpenCL-compatible devices from different vendors are present in a single system. For example, a machine with an AMD GPU + an NVIDIA GPU would require two contexts if you want to divide the work between those GPUs.