Hello, All

I have several questions regarding "best practices" for using events.

Let's consider the following workflow:

kernel A1 -> kernel A2 (kernel A2 depends on the result of kernel A1)

kernel B1 -> kernel B2 (kernel B2 depends on the result of kernel B1)

The A kernels and the B kernels are not connected in any way.

And this code example:

    for (i = 0...N) {
        setparams(A1, A1params, i);
        setparams(A2, A2params, i);
        runkernel(A1, i);
        runkernel(A2, i);
        setparams(B1, B1params, i);
        setparams(B2, B2params, i);
        runkernel(B1, i);
        runkernel(B2, i);
    }
    clFinish(queue);
    dump_results(A2buf, B2buf);

I've added "i" as a parameter to setparams() and runkernel() to indicate that, in the general case, kernel parameters and kernel configuration may differ depending on the iteration number.

Now I have several assumptions (please tell me if I am wrong):

Assuming *In-Order* queue:

1) We do not need events at all. All kernels will be executed sequentially, in exactly the order in which they were added to the queue. Even if we used events to add dependencies of the form A1[i] -> A2[i] -> A1[i+1] -> ... (same for B1, B2), no kernels would be executed in parallel, even where that would be possible (A1, A2 are completely independent of B1, B2).

2) Instead of clFinish() I could attach one event to the last enqueued kernel (B2 in the Nth iteration) and use:

    clFlush(queue);
    clWaitForEvents(1, &B2_Nth_iter_event);

to achieve the same effect.

Assuming *Out-Of-Order* queue:

1) I would definitely need events to explicitly express the dependencies A1->A2 and B1->B2. But all kernels would still be executed one at a time, only without a guaranteed order: each one becomes eligible as soon as the dependencies specified by its events are satisfied.

2) The number of events will be proportional to the number of iterations because of the A1[i] -> A2[i] -> A1[i+1] -> A2[i+1] -> ... -> A1[i+k] -> A2[i+k] dependency chain (and likewise for B1, B2).

3) Unlike with the in-order queue, if I replace clFinish() with clFlush() I would need to wait for the last 2 events: one for the A1->A2 chain and one for the B1->B2 chain.

Also, one question about reusing events (imagine the same scenario, but slightly transposed):

    for (i = 0...N) {
        setparams(A1, A1params, i);
        setparams(B1, B1params, i);
        runkernel(A1, i);
        runkernel(B1, i);
    }
    for (i = 0...N) {
        setparams(A2, A2params, i);
        setparams(B2, B2params, i);
        runkernel(A2, i);
        runkernel(B2, i);
    }

Suppose I want to proceed with events. So I have 2*N events for the first loop + 2*N for the second loop.

Now suppose I want to reuse the events from the first loop.

Is it necessary to call clReleaseEvent on them before reusing them in the second loop? (I assume YES :-) )

In other words, when passing a cl_event to clEnqueueNDRangeKernel(..., &ev), ev should NOT be an already initialized event, right?

Also, is there much overhead if I have e.g. N=15, so about 30 events?

Thanks in advance.

executions even on the same queue if it's possible. The condition is that the consecutive kernels do not utilize all compute units (CUs) and have no dependencies between them.