One of the popular features of CUDA is its interoperability with both D3D and OpenGL. Now, I myself have found transfers between system and graphics memory to be quite fast, but I have also realized that any heavy-duty calculation combined with copies back and forth from gfx memory, just to have the end result land in gfx memory anyhow, will produce a significant performance hit.
One of my last standing qualms about the API is the lack of OpenGL interoperability. I've always seen mention of the "OpenGL interaction extension", but I have no idea what it is or what it does, let alone how I would use it.
Nonetheless, I feel it is a crucial feature that has been overlooked for too long. When a programmer decides on hardware, the last thing we want is for them to switch to NVIDIA because their solution requires Linux and real-time rendering capabilities.
I'm sure OpenCL is taking a lot of focus away from the StreamSDK. However, since interoperability is available within its specification, would it be at all possible to see that code ported into CAL before the OpenCL release?
Otherwise, at this point I am very excited about OpenCL coming to fruition. Given my inability to focus on one project at a time, and the limited time I have to devote to any of them, I will be waiting until there is a solution to the memory boomerang scenario.