How to get OpenCL to start calculations without blocking (flush and finish)?

Discussion created by dravisher on Mar 22, 2010
Latest reply on Mar 26, 2010 by gaurav.garg
Both flush and finish seem to block, and without them OpenCL is lazy

Hi. I am writing a quantum Monte Carlo algorithm in OpenCL, and part of it uses the blocking method for statistical analysis. Basically, each of my kernels returns a bunch of values, and I need to write these values to disk as I go. My current method is outlined in the code below (the full code is ~700 lines, and I don't think the rest has any bearing on what is going on here).

What I'm trying to do is write the previous result to file while the next result is being calculated on the OpenCL device. However, it seems that OpenCL does not start the calculations until either queue.flush() or queue.finish() is called. Initially I thought that queue.flush() would prod OpenCL into action and then return immediately, but that does not seem to be the case. It seems that flush() is blocking, or at least so close to blocking that it makes no difference.

Does anyone have a suggestion as to how I might achieve what I'm trying to do here? Right now I am timing the IO part and the finish() part, in addition to timing the whole code block, and my results show that the time used for finish() does not change if I move it above the file IO part. The file IO part takes almost as long as the calculation part, so there should have been a noticeable difference in execution time if OpenCL were calculating in the background before finish() was called on the queue.
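To make the timing argument above concrete, here is a minimal sketch of the arithmetic behind it. It is not part of the original code: the names `expectedIterationTime` and `overlapFraction` are hypothetical helpers, and the model assumes the only two costs per iteration are the file IO and the device calculation. If the two run back to back, the iteration should take roughly their sum; if the device really computes in the background, it should take roughly the larger of the two.

```cpp
#include <algorithm>

// Hypothetical helper (not from the original code): expected wall-clock
// time of one loop iteration under the two extreme scenarios.
struct IterationEstimate {
    double serial;      // IO and calculation run one after the other
    double overlapped;  // IO fully hidden behind the calculation (or vice versa)
};

IterationEstimate expectedIterationTime(double ioSeconds, double calcSeconds) {
    IterationEstimate e;
    e.serial = ioSeconds + calcSeconds;
    e.overlapped = std::max(ioSeconds, calcSeconds);
    return e;
}

// Hypothetical helper: fraction of the possible saving that was actually
// observed. 0.0 means no overlap at all, 1.0 means perfect overlap.
double overlapFraction(double ioSeconds, double calcSeconds, double measuredTotal) {
    const IterationEstimate e = expectedIterationTime(ioSeconds, calcSeconds);
    const double possibleSaving = e.serial - e.overlapped;  // the shorter of the two phases
    if (possibleSaving <= 0.0) {
        return 0.0;  // one phase takes no time, so there is nothing to overlap
    }
    const double fraction = (e.serial - measuredTotal) / possibleSaving;
    // Clamp so that measurement noise cannot push the result outside [0, 1].
    return std::max(0.0, std::min(1.0, fraction));
}
```

As a rough usage guide, and assuming the timers in the post measure what their names suggest: pass the accumulated IO time as `ioSeconds`, the finish() time measured with finish() placed directly after the enqueue as `calcSeconds`, and the total time of the block as `measuredTotal`. A result near 0 would match the behaviour described here, where moving finish() makes no difference.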

Adding queue.flush() between the enqueue call and the file IO part is basically equivalent to putting queue.finish() in the same place, as far as execution times are concerned.

Timer2.Start();
err = queue.enqueueNDRangeKernel(MC_Kernel, cl::NullRange, cl::NDRange(StandardSimulations), cl::NDRange(StandardWorkgroupSize), NULL, NULL);
checkErr(err, "CommandQueue::enqueueNDRangeKernel()");

// Write previous run to file while OpenCL device is calculating next run
Timer.Start();
if (i != 0)
    Write_Blocking_File(BlockingDataTemp, NumAcceptedLocal * StandardMCCycles, i - 1);
IOTime += Timer.End();

Timer.Start();
err = queue.finish();
checkErr(err, "queue.finish()");
if (i != 0) {
    WaitForCalcTime += Timer.End();
    IOplusCalc += Timer2.End();
}