I understand that usage of clFinish() is not advisable for performance reasons.
I have a few kernels that execute in order, with the output of one kernel being the input to the next. For one of these kernels (and only that one), I do not get the correct result from the OpenCL app under investigation unless I call clFinish() immediately after enqueuing it for execution.
I create the command queue with default settings, i.e. in-order execution. I am running Mac OS X Snow Leopard and I have an ATI GPU. All reads from and writes to the GPU are blocking.
Why do I need to force the execution of one particular kernel? What might be going wrong when I do not do a clFinish()? Any suggestions for debugging?
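In case it helps frame suggestions: one check I can run is to read back each kernel's output buffer and compare it against a CPU reference, to see at which stage the results first diverge. Below is a minimal sketch of that comparison, assuming float buffers. The helper name `first_mismatch` and the `tol` parameter are placeholders of mine, not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debugging helper (not an OpenCL API): returns the index of
   the first element where the buffer read back from the GPU differs from
   the CPU reference by more than tol, or -1 if they agree everywhere. */
long first_mismatch(const float *gpu, const float *ref, size_t n, float tol)
{
    assert(gpu != NULL && ref != NULL);
    for (size_t i = 0; i < n; ++i) {
        float d = gpu[i] - ref[i];
        if (d < 0.0f)
            d = -d;
        /* The negated comparison also flags NaN in the GPU output. */
        if (!(d <= tol))
            return (long)i;
    }
    return -1;
}
```

I would call this after a blocking clEnqueueReadBuffer of each kernel's output, with and without the clFinish(), to see whether the first differing index changes. One caveat: on an in-order queue the blocking read itself waits for earlier commands, so it may hide the problem the same way clFinish() does.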
Thanks for any help, suggestions, and discussion.