EDIT: Solved this by increasing the problem size significantly.
I am trying to run kernels on the CPU and GPU concurrently, but I cannot get them to overlap. As you can see in the image below, both kernels are enqueued at approximately the same time (where the line is) into separate command queues.
However, the GPU kernel always executes only after the CPU kernel has finished. The GPU never seems to start executing straight away; there is always a delay. Is there a way around this?
I should also add that the profiling was done using sprofile on Linux with APP SDK v2.4.
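To put a number on whether the two kernels actually overlapped, rather than reading it off the profiler timeline, the start and end timestamps of each kernel can be compared directly. The sketch below is only an illustration: `kernel_span` and `overlap_ns` are hypothetical helper names, and it assumes the timestamps (for example the values reported for `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` by `clGetEventProfilingInfo`) are in nanoseconds and share a common time base, which is not guaranteed across a CPU device and a GPU device.

```c
#include <stdint.h>

/* Start/end of one kernel execution in nanoseconds.
   Hypothetical helper type, not part of OpenCL; the values are assumed
   to come from event profiling and to share one time base. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_span;

/* Returns how many nanoseconds both kernels were running at the same
   time, or 0 if one finished before (or exactly when) the other began. */
static uint64_t overlap_ns(kernel_span a, kernel_span b)
{
    uint64_t latest_start = a.start_ns > b.start_ns ? a.start_ns : b.start_ns;
    uint64_t earliest_end = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;
    return earliest_end > latest_start ? earliest_end - latest_start : 0;
}
```

With made-up spans of (0, 100) for the CPU kernel and (60, 150) for the GPU kernel, `overlap_ns` returns 40; a GPU span that begins at 100 or later gives 0, which is the situation described above.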