I just read about pipes, which were introduced with OpenCL 2.0.
I see a lot of use cases for this, such as streaming data through several kernels running in parallel.
But I wonder whether there is any chance that this will perform well on current hardware, or whether it is just a software abstraction.
I know that, for example, packet processing engines/CPUs spend a lot of silicon and extra features on this (e.g. to dequeue FIFO entries to several processing elements and then enqueue the results in the correct order again).
Any comments from AMD?