1. Concurrent kernel execution is already supported on AMD devices. You create multiple command queues for the same device and enqueue independent kernels across those queues.
2. Dynamic parallelism is expected to be supported with OpenCL 2.0.
Thank you for the reply, sudarshan.
So today on Hawaii I can create 64 queues (8 ACEs with 8 queues each) and have 64 different kernels running simultaneously?
Does this work under Linux? With the standard Radeon driver, or is a FirePro required? Is there any sample code that demonstrates this?
The only reference I can find to OpenCL 2.0 support is for the OpenCL 1.2 beta driver, but the list doesn't include Dynamic Parallelism. Is there any ETA on this under Linux?
I have never tried creating so many command queues for a single device, so I am not sure about it. If you do try it and have some insights to share, that would be a great help.
For concurrent kernel execution, there is sample code in AMD APP SDK 2.9 under AMD_APP_SDK\2.9\samples\opencl\cpp_cl\ConcurrentKernel.
Drivers for OpenCL 2.0 are not yet released.
There is no document for Hawaii, but the guide for the 79xx says that it has two "Asynchronous Compute Engine / Command Processor" units that can process command queues concurrently.
So that is two kernels at a time. You can ask the OpenCL runtime for more, but it will just dispatch the extra ones in some serial order. According to CodeXL, the 7850 can also execute two at a time (with the Radeon driver on Win7 x64). Maybe all GCN 1.0 parts have two command processors.
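As a rough illustration of what "two at a time, the rest in some serial order" means for total run time, here is a toy model. It is not the driver's real scheduler: it simply assumes each kernel, in submission order, goes to whichever engine becomes free first. The name `makespan` and the duration units are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model (assumption, not AMD's scheduler): kernels are taken in
// submission order and each one starts on the engine that frees up first.
// Returns the time at which the last kernel finishes.
int makespan(const std::vector<int>& durations, int engines) {
    assert(engines > 0);
    std::vector<int> free_at(engines, 0);  // time at which each engine is idle
    for (int d : durations) {
        auto slot = std::min_element(free_at.begin(), free_at.end());
        *slot += d;  // kernel occupies that engine for d time units
    }
    return *std::max_element(free_at.begin(), free_at.end());
}
```

With four kernels of 10 units each, `makespan({10, 10, 10, 10}, 2)` gives 20, while one engine gives 40 and four engines give 10. So creating more queues than the hardware can service is harmless in this model, but it does not buy any extra overlap.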
I have never checked whether concurrent execution works on Linux, but code with multiple queues runs correctly.
P.S. I think I heard somewhere that the 7790 (which may be GCN 1.1) can execute three at a time, but I may be wrong. That may also be the case for Hawaii, since it is something like GCN 1.1 too.