OpenCL

Journeyman III

Concurrent Kernels

It has been mentioned a few times that the 58xx can handle concurrent kernels in the hardware; however, the SDK does not support this.

Are we going to see support for this at all?

If so, are we going to see support for this across contexts? (Nvidia's Fermi supports concurrent kernels, but only within the same context.)

5 Replies

Journeyman III

At present, there are no plans to support concurrent kernels.

Journeyman III

Thanks.

Though this seems odd considering your major competitor supports this feature now.

Adept II

http://forum.beyond3d.com/showthread.php?p=1447397#post1447397

Journeyman III

Originally posted by: Jawed http://forum.beyond3d.com/showthread.php?p=1447397#post1447397

Thanks Jawed.

BTW, have you come across, or seen, any instances where an app got a good speedup due to concurrent kernel execution via Fermi?

I'm curious, since most multi-kernel applications have data dependencies, and I would imagine those dependencies would prevent concurrency.

Adept II

I've not seen any results of experiments.

Then again, I've not seen any real results from Fermi so far.

Data dependency doesn't rule out concurrency. Producer-consumer algorithms are built on some kind of inter-kernel buffering, usually a queue but sometimes a pool. Real-time graphics is a producer-consumer algorithm: a software pipeline with vertex functions feeding into pixel functions, using buffers between each pair of stages.
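
To make the producer-consumer point concrete, here is a minimal host-side sketch in Python. It is only an analogy: two threads stand in for two kernels, and a bounded `queue.Queue` stands in for the inter-kernel buffer. The names `vertex_stage`, `pixel_stage` and `run_pipeline`, and the toy arithmetic in each stage, are made up for illustration and are not any SDK's API.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def vertex_stage(vertices, out_q):
    """Producer: transform each input and push it into the inter-stage buffer."""
    for v in vertices:
        out_q.put(v * 2)  # blocks while the buffer is full
    out_q.put(_DONE)


def pixel_stage(in_q, results):
    """Consumer: pull items as they arrive, without waiting for the producer to finish."""
    while True:
        item = in_q.get()  # blocks while the buffer is empty
        if item is _DONE:
            break
        results.append(item + 1)


def run_pipeline(vertices, capacity=4):
    """Run both stages concurrently with a bounded buffer between them."""
    buf = queue.Queue(maxsize=capacity)
    results = []
    stages = [
        threading.Thread(target=vertex_stage, args=(vertices, buf)),
        threading.Thread(target=pixel_stage, args=(buf, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results
```

The consumer depends on the producer's data, yet both run at the same time. The dependency is per item, not per kernel, and the bounded buffer is what throttles whichever stage gets ahead.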

GRAMPS is a nice example, too:

http://graphics.stanford.edu/papers/gramps-tog/gramps-tog09.pdf

The concept of append and consume buffers in DirectCompute 5, part of D3D11, fits naturally here. I've no idea whether this is a basis for accessing the multi-kernel capability in ATI hardware; I've not spent any time looking at this stuff. Ultimately, mechanisms are required to monitor queue fullness and load-balance, and that gets hairy fairly quickly.
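
As a rough model of the idea, the sketch below imitates an append/consume buffer in plain Python: a fixed-size array plus a single counter that append increments and consume decrements. The class `AppendConsumeBuffer`, the helper `pick_stage` and the 0.75 threshold are hypothetical names and values chosen for illustration; a lock stands in for the atomic counter, and none of this is the D3D11 or OpenCL API.

```python
import threading


class AppendConsumeBuffer:
    """Fixed-capacity buffer driven by one counter (illustrative model only)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._count = 0
        self._lock = threading.Lock()  # stands in for an atomic counter

    def append(self, item):
        """Store item and bump the counter; return False if the buffer is full."""
        with self._lock:
            if self._count == len(self._slots):
                return False
            self._slots[self._count] = item
            self._count += 1
            return True

    def consume(self):
        """Drop the counter and return that slot; return None if the buffer is empty."""
        with self._lock:
            if self._count == 0:
                return None
            self._count -= 1
            return self._slots[self._count]

    def fullness(self):
        """Fraction of the buffer currently occupied, from 0.0 to 1.0."""
        with self._lock:
            return self._count / len(self._slots)


def pick_stage(buf, high_water=0.75):
    """Toy load-balancing rule: drain when mostly full, otherwise keep producing."""
    return "consumer" if buf.fullness() >= high_water else "producer"
```

Note that a counter-based buffer like this hands items back in no particular useful order (here, most recent first), so it behaves more like a pool than a queue. Even this toy rule needs a shared view of fullness, which hints at where the hairiness comes from once several stages compete for the same hardware.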

Seems a real pity to me that append/consume isn't part of OpenCL 1.1.

Jawed
