I have narrowed the code down further to the smallest piece that reproduces this
problem. There is no need to run a kernel either.
volatile int global_var = 0; // volatile so the spin loop re-reads it on every iteration
void CL_CALLBACK call_back(cl_event event, cl_int status, void * args)
{
    global_var = 1;
}
clEnqueueReadBuffer(data_queue, CL_FALSE,....,0, 0, &read_event); // perform some non-blocking read
clSetEventCallback(read_event,..., call_back, args); // set a callback to run once the read finishes
begin_callback_time = rtclock();
while(!global_var); // spin until global_var is set to 1
total_callback_time = rtclock() - begin_callback_time;
The above code is enough to produce a time of 20 ms, although the actual data transfer takes only 2 ms. So I see this issue only when I try to synchronize on the callback (with the "while" loop).
If I don't synchronize, the delay does not show up. (I want to synchronize because I run this code in
a loop and I don't want to start the next iteration before this iteration's data has been copied out.)
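In case anyone wants to try the same flag-and-spin pattern without an OpenCL device, here is a rough stand-in sketch. It is not my actual code: a plain pthread plays the role of the runtime thread that fires the callback, and the names timed_spin_wait, fake_transfer and now_seconds are made up for illustration. I am also assuming POSIX clock_gettime/nanosleep and C11 atomics are available. It only shows the synchronization structure I am timing and may well not show the 20 ms delay.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Flag set by the "callback"; atomic so the spinning thread is
 * guaranteed to observe the store made by the other thread. */
static atomic_int transfer_done;

/* Monotonic time in seconds (hypothetical stand-in for rtclock()). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for the runtime: sleeps for the duration of the
 * "transfer", then does what call_back() does in the snippet above. */
static void *fake_transfer(void *arg)
{
    long transfer_ns = *(long *)arg;
    struct timespec delay = { transfer_ns / 1000000000L,
                              transfer_ns % 1000000000L };
    nanosleep(&delay, NULL);
    atomic_store(&transfer_done, 1);
    return NULL;
}

/* Starts a fake transfer, spins until its callback has run, and
 * returns the elapsed wall-clock time in seconds. */
double timed_spin_wait(long transfer_ns)
{
    pthread_t worker;
    atomic_store(&transfer_done, 0);

    double begin = now_seconds();
    int rc = pthread_create(&worker, NULL, fake_transfer, &transfer_ns);
    assert(rc == 0);

    while (!atomic_load(&transfer_done))
        ;   /* spin, as in the while(!global_var) loop */

    double elapsed = now_seconds() - begin;
    pthread_join(worker, NULL);
    return elapsed;
}
```

Calling timed_spin_wait(2000000L) models a 2 ms transfer. Comparing the returned time with the requested transfer time gives the overhead of the spin-wait itself on a given machine.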
Any thoughts on what might be the reason?