I've looked at some comparisons between C++ AMP and OpenCL, and OpenCL is what I want to develop for, for many reasons. Mainly, it is cross-platform and not as hardware-specific as AMP or CUDA. Well-written OpenCL code also seems to perform better than the equivalent AMP code, while CUDA is a non-starter because it is Nvidia-only.
Anyway, my question is how to properly develop and release OpenCL applications. I noticed there are Intel, AMD, ARM, and Nvidia SDKs. For now, I'm mostly concerned with developing for AMD and Intel APUs; I'm not very interested in GPUs. That is mostly because the type of programs I would write require very good shared memory performance, and APUs have it even if their cores are fewer or slower. I believe the APU will be the true successor to the math co-processor that we have had since the days of the 386, what we call an FPU today.
My concern is: what is the difference between the AMD and Intel OpenCL SDKs? Will the same OpenCL code (minus specific optimizations) run on both SDKs? I'm planning to compile my OpenCL applications separately as DLLs and call them from a VC++ application after profiling the target system. The idea would be to use a binary produced by the AMD SDK for an AMD CPU/APU and one produced by the Intel SDK for an Intel CPU/APU. The two "projects" would share much of the same source code, while having differences in their headers and kernel classes. I just want to confirm that this approach is reasonable, or find out whether it has been documented.
I would also be very interested in any books or literature on this topic. If there is anything else I should be aware of, please let me know. Thank you.
There is an ICD mechanism in OpenCL.dll that loads the vendor-specific DLL. You link against OpenCL.dll, which is the same for all vendors. You also don't need two vendor-specific binaries: you can decide at runtime which optimizations to use and ship a single binary. Or do you mean kernel binaries? Those are indeed vendor-specific, or even device-specific.
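To illustrate the "decide at runtime, ship a single binary" idea, here is a minimal C++ sketch. It assumes the vendor string has already been read with clGetPlatformInfo and CL_PLATFORM_VENDOR; the enum, the function names, and the USE_AMD_PATH / USE_INTEL_PATH macros are hypothetical names made up for this example, and the vendor substrings are what the AMD and Intel platforms typically report, so check them on your own machines. The sketch itself does not call OpenCL, so it compiles without any SDK installed.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical optimization paths; the names are illustrative only.
enum class KernelPath { Generic, AmdTuned, IntelTuned };

// Lower-case a copy of the string so vendor matching is case-insensitive.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Pick a path from the platform vendor string. In a real application the
// string would come from clGetPlatformInfo(..., CL_PLATFORM_VENDOR, ...).
KernelPath select_kernel_path(const std::string& vendor) {
    const std::string v = to_lower(vendor);
    if (v.find("advanced micro devices") != std::string::npos ||
        v.find("amd") != std::string::npos) {
        return KernelPath::AmdTuned;
    }
    if (v.find("intel") != std::string::npos) {
        return KernelPath::IntelTuned;
    }
    return KernelPath::Generic;  // unknown vendor: use the portable kernels
}

// Map a path to an options string for clBuildProgram. The macro names are
// assumptions; the kernel source would test them with #ifdef.
std::string build_options_for(KernelPath path) {
    switch (path) {
        case KernelPath::AmdTuned:   return "-DUSE_AMD_PATH=1";
        case KernelPath::IntelTuned: return "-DUSE_INTEL_PATH=1";
        default:                     return "";
    }
}
```

The host program would call select_kernel_path once per platform and pass the resulting options string to clBuildProgram, so the same kernel source is compiled for whichever vendor is present and the executable stays vendor-neutral.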
The ICD extension provided by the Khronos Group defines an installable client driver (ICD) that acts as an interface to the various vendor-specific OpenCL implementations. Using this extension, an OpenCL function call can be directed to a vendor-specific implementation. More details can be found at http://www.khronos.org/registry/cl/extensions/khr/cl_khr_icd.txt.
This extension can be used for cross-platform compilation, build, and distribution of OpenCL code.
Very nice info, and bleakwise, that is the same thing I believe will be the future. Let me also share my experience so far.
Some months ago I was toying with the same idea: I was trying to make an application that called small OpenCL kernels (sort this thing, find that thing) to increase performance.
I only have a desktop with a discrete GPU. Even though the kernels themselves ran many times faster on the GPU, the latency of copying data between the CPU and the GPU killed any gains, so running the code on the CPU was faster overall,
and now I'm waiting to buy an APU to continue working with that.
So my only recommendation is to profile the thing, and if you can avoid any copies of data, all the better. I think you can already do this on AMD; I'm not sure whether you can on Intel. Anyway, good luck and happy coding.
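As a rough way to reason about the copy overhead described above, here is a small C++ sketch of the break-even check. The struct and function names are hypothetical, and all four timings are numbers you would measure yourself with a profiler; nothing here is taken from a vendor SDK.

```cpp
#include <cassert>

// Timings measured by the caller, all in milliseconds (hypothetical struct).
struct OffloadEstimate {
    double cpu_ms;       // time to run the task entirely on the CPU
    double device_ms;    // kernel execution time on the GPU or APU
    double copy_in_ms;   // host-to-device transfer time
    double copy_out_ms;  // device-to-host transfer time
};

// Total time for the offloaded path: both transfers plus kernel execution.
double offload_total_ms(const OffloadEstimate& e) {
    return e.copy_in_ms + e.device_ms + e.copy_out_ms;
}

// Offloading pays off only if the whole round trip beats the CPU time.
bool offload_is_worthwhile(const OffloadEstimate& e) {
    return offload_total_ms(e) < e.cpu_ms;
}
```

With made-up numbers, a kernel that takes 1 ms on the device against 10 ms on the CPU still loses if each copy costs 6 ms (13 ms in total), while the same kernel with zero copy time wins easily. That is the case a shared-memory APU is hoped to get closer to.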