The HD5xxx series does not support executing different kernels concurrently. The HD7xxx series is the first that can do this, and even those cards can only handle 2 types of kernels. (Someone correct me if I'm wrong.) The thread scheduler of Fermi cards is more advanced: it can handle a lot more types of kernels and fill gaps of idle Compute Units with kernels.
FYI, HDxxxx cards all feature thread schedulers that can queue multiple types of kernels, but only one type can execute at any given time (two types on HD7xxx). It is good to know that if you dispatch kernels in the command queue and concurrently flush OpenGL commands, both the compute and display kernels are dispatched onto the device, but there is no prioritizing of kernels (not even on HD7xxx), and compute kernels are always executed before display kernels.
Thank you for the useful answer!
I missed two points from this:
1. When you say that 2 types of kernels can be executed concurrently, what do you mean? OpenGL and OpenCL kernels? Two different OpenCL kernels? Something else?
2. You said "there is no prioritizing of kernels (not even on HD7xxx), and compute kernels are always executed before display kernels". How should that be understood? Are compute kernels moved ahead in the command queue if there are OpenGL commands before them?
Is there any technical paper where I can find information on my issue?
Sorry if I wasn't clear enough.
1: By two types of kernels I mean any two kernels that don't share exactly the same kernel/shader code.
2: Inside the command queue, no. The command queue is a completely host-side mechanism, and enqueued NDRange commands will only overtake each other there if CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE is set on the command queue (currently only supported by the Intel SDK). The command queue and the thread scheduler of the device are two different queues, so to say. You have full control over the former, and practically none over the latter. If you look at the profiling timestamps of the event associated with a kernel (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END), SUBMIT - QUEUED is the time the kernel spent waiting in the command queue, START - SUBMIT is the time it spent on the PCI bus AND inside the device thread scheduler, and naturally END - START is the time it spent executing.
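To make the timing breakdown concrete, here is a rough sketch of the arithmetic. It assumes the four timestamps (in nanoseconds) have already been read from the kernel's event, e.g. with clGetEventProfilingInfo on a queue created with CL_QUEUE_PROFILING_ENABLE. The function name, the dictionary keys and the sample values are made up for illustration and are not part of any API.

```python
def kernel_timing_breakdown(queued_ns, submit_ns, start_ns, end_ns):
    """Split one kernel's profiling timestamps (nanoseconds) into three intervals.

    Hypothetical helper: the arguments stand for the values reported as
    CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END.
    """
    if not (queued_ns <= submit_ns <= start_ns <= end_ns):
        raise ValueError("expected QUEUED <= SUBMIT <= START <= END")
    return {
        # Time spent waiting in the host-side command queue.
        "host_queue_wait_ns": submit_ns - queued_ns,
        # Time spent on the bus and inside the device thread scheduler.
        "bus_and_device_scheduler_ns": start_ns - submit_ns,
        # Time spent actually executing on the device.
        "execution_ns": end_ns - start_ns,
    }


# Made-up timestamps, only to show the calculation.
example = kernel_timing_breakdown(1_000, 4_000, 9_500, 29_500)
for name, value in example.items():
    print(f"{name}: {value} ns")
```

Only the first interval is something you can influence from the host; the second lumps together transfer and device-side scheduling, and the timestamps alone cannot separate the two.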
About technical papers... I really don't know. I gathered all this information from various news portals and technical reviews. The only thing I'm not 100% sure about is whether HD7xxx can really only handle two sets of kernels at once, since the thread scheduler was redesigned and called ACE (Asynchronous Compute Engine), if I'm not mistaken about the abbreviation. It was said that it will be able to feed idle Compute Units a lot better than earlier generations, but somewhere else it was said that there are two ACEs, and that they operate on two different parts of the GPU (16 CUs for one, and 16 CUs for the other). If they are truly asynchronous and can handle multiple types of kernels (grouped by code), then it should be completely transparent (or irrelevant from a programming point of view) that there are two ACEs, and most likely it's only a necessity due to the fact that one ACE cannot handle more than 16 CUs efficiently. (Which is absolutely no problem if there can be multiple engines that share the same thread queue on the device.)