I have a kernel that runs in 1.7 ms when I create the buffers the default way (mem flags CL_MEM_READ_ONLY / CL_MEM_WRITE_ONLY) and call the write/read buffer functions. In that case the whole function takes 5.3 ms.
If I create the input buffer with the CL_MEM_USE_HOST_PTR flag and pass it as a kernel argument, but still create the output buffer the default way and call the read buffer function, the kernel takes 2.3 ms and the whole function takes 4.3 ms.
In the last case I create both buffers with CL_MEM_USE_HOST_PTR. The kernel then takes 4.1 ms and the whole function takes 4.5 ms.
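To make the trade-off visible, here is a small sketch that only restates the three measurements above and subtracts kernel time from whole-function time. The case labels and the name `non_kernel_time` are illustrative, not part of any API; "non-kernel time" here is assumed to cover transfers plus any other host-side work inside the timed function.

```python
# Timings from the three cases above, in milliseconds: (kernel time, whole-function time).
# The labels are hypothetical names chosen for this sketch.
CASES = {
    "default buffers + write/read buffer calls": (1.7, 5.3),
    "input CL_MEM_USE_HOST_PTR, output default + read": (2.3, 4.3),
    "both buffers CL_MEM_USE_HOST_PTR": (4.1, 4.5),
}


def non_kernel_time(kernel_ms: float, total_ms: float) -> float:
    """Whole-function time minus kernel time, rounded to 0.1 ms."""
    return round(total_ms - kernel_ms, 1)


if __name__ == "__main__":
    for name, (kernel, total) in CASES.items():
        print(f"{name}: kernel {kernel} ms, outside kernel {non_kernel_time(kernel, total)} ms")
```

Read this way, the time spent outside the kernel drops from 3.6 ms to 0.4 ms, while the kernel itself grows from 1.7 ms to 4.1 ms. The total therefore improves by only 0.8 ms.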
I definitely don't understand what's going on. Can anyone explain how this works? Am I right that this APU has a shared (generic) address space for the GPU and CPU? If so, why does the kernel's execution time increase so much?