It has been mentioned a few times that the 58xx can handle concurrent kernels in the hardware; however, the SDK does not support this.
Are we going to see support for this at all?
If so, are we going to see support for this across contexts? (Nvidia's Fermi supports concurrent kernels, but only within the same context.)
Originally posted by: Jawed http://forum.beyond3d.com/showthread.php?p=1447397#post1447397
BTW, have you come across, or seen, any instances where an app got a good speedup due to concurrent kernel execution via Fermi?
I'm curious, since most multi-kernel applications have data dependencies, and I would imagine those dependencies would prevent concurrency.
I've not seen any results of experiments.
Then again, I've not seen any real results from Fermi so far.
Data dependency doesn't preclude concurrency. The basis of producer-consumer algorithms is some kind of inter-kernel buffering, usually a queue, but sometimes a pool. Real-time graphics is a producer-consumer algorithm: a software pipeline with vertex functions feeding into pixel functions, using buffers between each stage.
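To make that concrete, here is a minimal CPU-side sketch in Python of two dependent stages running concurrently through a bounded queue. It is only an analogy: threads stand in for kernels, and the names `producer`, `consumer`, `run_pipeline` and the `SENTINEL` end-of-stream marker are all made up for illustration, not taken from any GPU SDK.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker used only by this sketch


def producer(out_q, n_items):
    """Stage 1: stands in for a 'vertex' kernel emitting work items."""
    for i in range(n_items):
        out_q.put(i * 2)  # blocks while the inter-stage buffer is full
    out_q.put(SENTINEL)


def consumer(in_q, results):
    """Stage 2: stands in for a 'pixel' kernel consuming those items."""
    while True:
        item = in_q.get()  # blocks while the inter-stage buffer is empty
        if item is SENTINEL:
            break
        results.append(item + 1)


def run_pipeline(n_items=8, capacity=2):
    """Run both stages at once, linked by a small bounded queue."""
    buf = queue.Queue(maxsize=capacity)
    results = []
    t_prod = threading.Thread(target=producer, args=(buf, n_items))
    t_cons = threading.Thread(target=consumer, args=(buf, results))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return results
```

The consumer depends on every item the producer emits, yet both stages are live at the same time; the small `capacity` is what forces them to overlap rather than run back to back.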
GRAMPS is a nice example, too:
The concept of append and consume buffers in DirectCompute 5, part of D3D11, naturally fits in here. I've no idea if this is a basis for accessing the multi-kernel capability on ATI hardware - I've not spent any time looking at this stuff. Ultimately, mechanisms are required to monitor queue fullness and load-balance. Gets hairy fairly quickly.
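For anyone who hasn't met append/consume buffers, the idea can be emulated on the CPU roughly as follows. This is a sketch under assumptions, not the D3D11 API: the class `AppendConsumeBuffer` and its methods are hypothetical names, a lock stands in for the hidden atomic counter, and the full/empty checks and the `fullness` query are additions of this sketch.

```python
import threading


class AppendConsumeBuffer:
    """CPU emulation of a counter-backed append/consume buffer (illustrative only)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._count = 0
        self._lock = threading.Lock()  # stands in for an atomic hidden counter

    def append(self, value):
        """Claim the next free slot and write to it; return False if full."""
        with self._lock:
            if self._count == len(self._slots):
                return False
            self._slots[self._count] = value
            self._count += 1
            return True

    def consume(self):
        """Give back the most recently claimed slot; return None if empty."""
        with self._lock:
            if self._count == 0:
                return None
            self._count -= 1
            return self._slots[self._count]

    def fullness(self):
        """Fraction of slots in use, the kind of figure a load balancer would watch."""
        with self._lock:
            return self._count / len(self._slots)
```

A scheduler could poll `fullness()` to decide whether to give more resources to the producing or the consuming stage, which is where the hairy part starts.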
Seems a real pity to me that append/consume isn't part of OpenCL 1.1.