Profiling time for blocking and non-blocking execution

Question asked by lenjyco on Jul 19, 2013

Hi all,


I'm new to OpenCL development and I want to understand some of its mechanics.


I have a simple matrix multiplication kernel, and I want to see the impact of the blocking option on the clEnqueue* calls.

So I run the computation twice: once with blocking writes and reads, and once with non-blocking ones.


When I look at the profiling times, the blocking version shows the timestamps in sequential order (enqueue, submit, kernel start, kernel end), but in the non-blocking version the kernel appears to execute before it is queued and submitted.
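
To make the ordering check concrete, here is a minimal sketch of how the four timestamps could be compared. It is only an illustration: the `profile_times` struct and the helper names are hypothetical, and plain `uint64_t` values stand in for the `cl_ulong` counters so the snippet compiles without an OpenCL SDK. In a real host program the values would presumably come from `clGetEventProfilingInfo` on the kernel's event, using `CL_PROFILING_COMMAND_QUEUED`, `CL_PROFILING_COMMAND_SUBMIT`, `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END`, on a queue created with `CL_QUEUE_PROFILING_ENABLE`, and read only after the event has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the four profiling counters (nanoseconds).
 * uint64_t is used here as a stand-in for OpenCL's cl_ulong. */
typedef struct {
    uint64_t queued; /* command placed in the host queue   */
    uint64_t submit; /* command handed to the device       */
    uint64_t start;  /* kernel began executing             */
    uint64_t end;    /* kernel finished executing          */
} profile_times;

/* Returns true when the counters follow queued <= submit <= start <= end. */
bool profile_is_ordered(const profile_times *t)
{
    return t->queued <= t->submit &&
           t->submit <= t->start &&
           t->start  <= t->end;
}

/* Kernel execution time in nanoseconds; only meaningful if end >= start. */
uint64_t profile_kernel_ns(const profile_times *t)
{
    return t->end - t->start;
}

/* Prints the raw counters and whether they are in the expected order. */
void profile_print(const char *label, const profile_times *t)
{
    printf("%s: queued=%llu submit=%llu start=%llu end=%llu (%s)\n",
           label,
           (unsigned long long)t->queued,
           (unsigned long long)t->submit,
           (unsigned long long)t->start,
           (unsigned long long)t->end,
           profile_is_ordered(t) ? "in order" : "OUT OF ORDER");
}
```

Calling `profile_print("non-blocking", &t)` for each run would show directly which counter breaks the expected sequence.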


Can someone explain this behaviour to me? Thank you very much.