After AMD's announcement of Mantle yesterday and the performance gains claimed for this low-level API, I began wondering about OpenCL.
If you are not familiar with Mantle, AnandTech has a pretty good summary ("Understanding AMD’s Mantle: A Low-Level Graphics API For GCN"). What I got out of it is that performance inherently suffers in DirectX or OpenGL because of the significant overhead of writing to a generic device. By coding directly to the new AMD Hawaii GPU (5 TFLOPS, btw) with the Mantle API, developers can reportedly issue up to 9x as many draw calls. My question is whether anyone has an idea of what kind of overhead OpenCL introduces and what, if anything, we can do to get around it. If I could get a 9x or even 3x performance improvement by coding to a specific device (e.g. a high-end FirePro), I would be more than happy to do that for my most performance-intensive subroutines.