I just tried the AMD APP KernelAnalyzer with the OpenCL compiler option "-x clc++", and I noticed that something I had hoped would be overloaded wasn't. In OpenCL C, as defined in "The OpenCL Programming Language 1.2. Rev15 Khronos 2011," vloadn and vstoren appear to be provided as a set of overloaded functions somewhere in the OpenCL C compiler, possibly something like the following:
template <typename gentype4, typename gentype> __kernel void vstore4(gentype4 data, size_t offset, const __global gentype *p);
I'd like to suggest that the OpenCL C++ Compiler extend the overloading further to something like the following:
template <typename gentypen, typename gentype> __kernel void vstoren(gentypen data, size_t offset, const __global gentype *p);
template __attribute__((mangled_name(vstore4))) __kernel void vstoren(gentype4 data, size_t offset, const __global gentype *p);
At the moment I don't see anything like that available.
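To make the idea a bit more concrete, here is a rough sketch in plain host-side C++ (not OpenCL C++, so no __global or __kernel qualifiers). It only illustrates the overloading pattern I have in mind: one generic vstoren template that works out the width from the vector type, with vstore4 as a thin forwarding name. The float2 and float4 structs and the vec_traits helper are hypothetical stand-ins I made up for illustration, not the real built-in types or anything from the AMD compiler. The addressing follows the spec's description of vstoren writing to p + (offset * n).

```cpp
#include <cstddef>

// Hypothetical stand-ins for OpenCL vector types (not the real built-ins).
struct float2 { float s[2]; };
struct float4 { float s[4]; };

// Hypothetical trait: maps a vector type to its element type and width n.
template <typename V> struct vec_traits;
template <> struct vec_traits<float2> {
    using elem = float;
    static constexpr std::size_t n = 2;
};
template <> struct vec_traits<float4> {
    using elem = float;
    static constexpr std::size_t n = 4;
};

// Generic store: writes the n components of data starting at p + offset * n.
// The destination pointer is non-const because it is written to.
template <typename V>
void vstoren(const V& data, std::size_t offset,
             typename vec_traits<V>::elem* p) {
    constexpr std::size_t n = vec_traits<V>::n;
    for (std::size_t i = 0; i < n; ++i) {
        p[offset * n + i] = data.s[i];
    }
}

// The fixed-width name simply forwards to the generic version.
inline void vstore4(const float4& data, std::size_t offset, float* p) {
    vstoren(data, offset, p);
}
```

With something like this, generic code could call vstoren(v, i, buf) without knowing whether v has 2 or 4 components, which is the convenience I was hoping the "-x clc++" mode would offer.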
Cheers!