OpenCL Concurrent Kernel Execution

There have been several posts about OpenCL support for Concurrent Kernel Execution (CKE) on ATI cards. The consensus seems to be that the Radeon HD 5xxx hardware supports it, but the OpenCL driver does not yet.

Is there any indication of when, or if, this will be fixed in the drivers? ATI Stream SDK 2.3, perhaps?

I ask because, apart from speeding up normal SIMD kernels, CKE is what makes task-parallel computation worthwhile. Without it, tasks [queue.enqueueTask(), i.e. kernels with a work-group of size 1] get ZERO performance improvement on a system with a single OpenCL device: no two tasks can run in parallel, so they must run one after the other, even if the device has more than enough resources to run both kernels at once.
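To put rough numbers on that, here is a back-of-the-envelope timing model. It is only a sketch, not OpenCL API code: the function names serial_time_ms and concurrent_time_ms are made up for illustration, and the concurrent case assumes the device really can fit every task at the same time.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical model (not part of OpenCL): each entry is the runtime,
// in milliseconds, of one single-work-item task.

// Without CKE the device runs one kernel at a time, so the runtimes add up.
double serial_time_ms(const std::vector<double>& tasks) {
    return std::accumulate(tasks.begin(), tasks.end(), 0.0);
}

// With CKE, assuming the device has room for all tasks at once,
// the total is bounded by the longest single task.
double concurrent_time_ms(const std::vector<double>& tasks) {
    if (tasks.empty()) return 0.0;
    return *std::max_element(tasks.begin(), tasks.end());
}
```

For three tasks of 2, 3 and 5 ms, the serial model gives 10 ms while the concurrent model gives 5 ms. Real hardware will land somewhere in between, depending on how many tasks actually fit on the device together.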

Additionally, as far as I'm concerned, the ONLY benefit of NVIDIA over ATI is CKE. ATI consistently has better/faster hardware, and given that the hardware already supports CKE, implementing it in the driver is a no-brainer.

I need CKE!!