Still more wishes:
*Provide an FSAIL backend for LLVM, now that LLVM's PTX support seems mature and OpenCL support in Clang is evolving fast, so people can work on high-level tools that target FSAIL.
*Finally fix multi-GPU support on all OSes.
I also hope FSA brings these improvements over CAL:
*New in CUDA 4.0 is copying memory between "contexts" (cudaMemcpyPeer), even on a single device. Just add this to the FSA API for completeness, as CAL doesn't support it. It also exposes a "fast copy path" between devices in a single call.
*Expose and document all the features in FSA that OCL exposes. For example, OCL GL interop seems to go through some CAL GL interop, but that support isn't publicly exposed.
*I also hope DX interop is exposed for versions 9 through 11, similar to CUDA, as CAL only has DX9 and DX10 interop.
*New in CUDA 4.0 is the creation of a "writable texture" (called a surface in CUDA) from OpenGL or DX textures, using a flag when registering the texture resource for interop. In AMD parlance, this would mean allowing a graphics texture to be mapped to a UAV through interop. This lets compute write to a "graphics API" texture directly.