OpenCL API execution time is too long

Question asked by obara on May 15, 2014
My system calls the following OpenCL APIs very frequently (10,000 times or more).


clEnqueueTask(command_queue, kernel, 0, NULL, &event);  /* about 1 us per call */

clWaitForEvents(1, &event);                             /* about 100 us per call */


__kernel void add(__global int* A, __global float* B, __global float* C)
{
  *C = *A + *B;
}



But there is a serious problem: the OpenCL API execution time is too long.

For example, each clEnqueueTask call takes about 1 us, and the clWaitForEvents call that follows takes about 100 us.
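
To put those per-call figures in perspective, here is a rough sketch in plain C that adds up the host-side cost over all iterations. The helper name total_overhead_us and its parameters are hypothetical and only for illustration; they are not part of the OpenCL API, and the sketch assumes every call costs the same as the figures quoted above.

```c
#include <assert.h>

/* Hypothetical helper (not part of OpenCL): total host-side time, in
   microseconds, spent in the enqueue and wait calls over `calls` iterations.
   Assumes each iteration costs enqueue_us + wait_us, as in the figures above. */
double total_overhead_us(long calls, double enqueue_us, double wait_us)
{
    assert(calls >= 0);  /* a negative call count makes no sense */
    return (double)calls * (enqueue_us + wait_us);
}
```

With the numbers from this post, total_overhead_us(10000, 1.0, 100.0) gives 1010000 us, so about 1.01 s goes to the two API calls alone, almost all of it in clWaitForEvents.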



How can I reduce the API execution time?