I have just run my OpenCL kernel on both Windows 7 and Kubuntu 12.10. It is built as part of an autotools project that was developed under Linux and then ported to Windows using MinGW-w64 + MSYS. My platform is a Samsung 535U3C laptop with an A6-4455M (Trinity APU; the GPU part is an HD 7500G). I measure the kernel time from the kernel's event using the OpenCL API clGetEventProfilingInfo. However, the same code shows a huge performance difference: the kernel time measured under Kubuntu is about half of that measured under Windows 7 + MinGW-w64, i.e. the kernel runs about twice as fast on Linux.
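A minimal sketch of how the four timestamps reported for an event can be split into stages, so that the two systems can be compared stage by stage rather than by a single number. It assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE and that the event has completed before it is queried. The names event_times, event_breakdown and breakdown_event are hypothetical, and the OpenCL calls appear only in comments so that the arithmetic compiles without the OpenCL headers.

```c
#include <stdint.h>

/* Timestamps in nanoseconds, as they would be filled in by e.g.
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_QUEUED,
 *                           sizeof(cl_ulong), &t.queued, NULL);
 * and likewise for CL_PROFILING_COMMAND_SUBMIT, _START and _END.
 * (cl_ulong is a 64-bit unsigned integer, so uint64_t is used here.) */
typedef struct {
    uint64_t queued;
    uint64_t submit;
    uint64_t start;
    uint64_t end;
} event_times;

/* Per-stage durations in milliseconds. */
typedef struct {
    double queue_ms;   /* QUEUED -> SUBMIT: time spent in the host-side queue */
    double launch_ms;  /* SUBMIT -> START:  time until the device starts      */
    double exec_ms;    /* START  -> END:    time the kernel actually runs     */
} event_breakdown;

/* Difference of two nanosecond timestamps in milliseconds;
 * returns 0 if the timestamps are out of order. */
double ns_diff_to_ms(uint64_t later, uint64_t earlier)
{
    if (later < earlier) {
        return 0.0;
    }
    return (double)(later - earlier) / 1.0e6;
}

/* Split one event's timestamps into the three stages above. */
event_breakdown breakdown_event(event_times t)
{
    event_breakdown b;
    b.queue_ms  = ns_diff_to_ms(t.submit, t.queued);
    b.launch_ms = ns_diff_to_ms(t.start,  t.submit);
    b.exec_ms   = ns_diff_to_ms(t.end,    t.start);
    return b;
}
```

If only exec_ms differs between Windows and Kubuntu, that would point at the kernel execution itself (for example the driver or the OpenCL compiler); if queue_ms or launch_ms differs, it would point at submission overhead instead.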
What I want to do is eliminate the memory transfer between the CPU and GPU on this integrated chip, i.e. zero copy. Unfortunately, according to Table 4.2 of AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide4.pdf, this feature is only available under Windows.
So now I am in an awkward situation: even if the copy time between the CPU and GPU can be reduced to zero, I am stuck with a kernel that is two times slower under Windows!
Any ideas why the kernel would be so much slower under Windows?