I am running a modified templatec.cpp together with the APP profiler.
It calls my own version of runkernels, which writes a small data area, runs a kernel and reads back 8000 bytes.
All the cl_mem addresses, data areas and arguments have been set up.
This is running on an HD6850 with APP.
Calling the runkernels code 1300 times gives me, per iteration, a write buffer time of around 0.08 milliseconds, a kernel time of around 0.3 milliseconds and a read buffer time of around 0.16 milliseconds.
Yet the 1300 calls take 54.0 seconds in total!
Is it not possible to use OpenCL with quick kernels?