Hello everyone!
My code behaves unstably. What is wrong with it?
for(int i=0;i<numIter;i++){
    //
    status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
    ASSERT_CL(status);
    status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
    ASSERT_CL(status);
    if(i%16==0){
        status = clFlush(commandQueue);
        ASSERT_CL(status);
        spinForEventsComplete( 1, &ndrEvt );
        //status = clWaitForEvents(1, &ndrEvt);
        //ASSERT_CL(status);
    }
}
status = clFlush(commandQueue);
ASSERT_CL(status);
spinForEventsComplete( 1, &ndrEvt );
Thank you!
Thanks to everyone for the advice and answers. I now have stable code, but the problem turned out to be outside this code.
Current version is:
cl_event ndrEvt = 0;
for(int i=0;i<numIter-1;i++){ // numIter = 1000
    status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
    ASSERT_CL(status);
    status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,NULL);
    ASSERT_CL(status);
    if(i>0 && i%256==0){ // just
        status = clFlush(commandQueue);
        ASSERT_CL(status);
    } // if
} // for i
status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
ASSERT_CL(status);
status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
ASSERT_CL(status);
status = clFlush(commandQueue);
ASSERT_CL(status);
spinForEventsComplete( 1, &ndrEvt );
Your wait condition for the event is not clear to me. You launch multiple commands (16 x 2 = 32 in total) before each clFlush() call and wait only for the last one. You also overwrite the event object "ndrEvt" in each iteration, as below. Is there a reason or an assumption behind this?
status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
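To make the count above concrete, here is a small sketch in plain C that only models the loop structure from the question (two enqueue calls per iteration, a flush whenever i % 16 == 0). It calls no OpenCL API; the function name commands_before_flush and its parameters are made up for illustration. It suggests that the 32-command figure holds for every flush except the first one, which happens at i == 0 after only 2 commands.

```c
#include <assert.h>

/* Hypothetical helper, not part of OpenCL: returns how many commands have
   been enqueued since the previous flush at the moment the nth flush
   (0-based) is issued, for a loop that enqueues per_iter commands per
   iteration and flushes when i % period == 0. */
int commands_before_flush(int per_iter, int period, int nth_flush)
{
    assert(per_iter > 0 && period > 0 && nth_flush >= 0);

    int pending = 0;   /* commands enqueued since the last flush */
    int flushes = 0;   /* flushes issued so far */

    for (int i = 0; ; i++) {
        pending += per_iter;            /* enqueue calls of this iteration */
        if (i % period == 0) {          /* flush condition from the question */
            if (flushes == nth_flush)
                return pending;
            flushes++;
            pending = 0;                /* a new batch starts after the flush */
        }
    }
}
```

With the values from the question, commands_before_flush(2, 16, 0) gives 2 and commands_before_flush(2, 16, 1) gives 32.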
This code is from the ImageBandwidth.cpp SDK example:
cl_event ev = 0;
for(int i=0; i < nKLaunches; i++)
{
    ret = clEnqueueNDRangeKernel( queue,
                                  kernel,
                                  2,
                                  global_work_offset,
                                  global_work_size,
                                  local_work_size,
                                  0, NULL, &ev );
    ASSERT_CL_RETURN( ret );
}
clFlush( queue );
spinForEventsComplete( 1, &ev );
I think this creates a memory leak, because you keep a reference only to the last event.
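As a rough illustration of that leak, here is a toy model in plain C that needs no OpenCL runtime. Everything in it is a hypothetical stand-in: mock_enqueue plays the role of an enqueue call with a non-NULL event argument (which hands back a new event the caller owns), and mock_release plays the role of clReleaseEvent. It is only a sketch of the handle bookkeeping, not of how any real implementation tracks events.

```c
#include <assert.h>

/* Toy runtime state; all names are hypothetical, none is an OpenCL API. */
typedef struct {
    int live;     /* events created and not yet released */
    int next_id;  /* id handed out for the next event */
} mock_runtime;

/* Models an enqueue with a non-NULL event argument: a new event is
   created and its handle overwrites whatever *ev held before. */
void mock_enqueue(mock_runtime *rt, int *ev)
{
    rt->live++;
    *ev = rt->next_id++;
}

/* Models releasing one event handle. */
void mock_release(mock_runtime *rt, int ev)
{
    assert(ev >= 0 && rt->live > 0);
    rt->live--;
}

/* Pattern from the sample: one handle reused for every launch, and only
   the final event released. Returns the number of events left live. */
int launch_overwriting(int launches)
{
    mock_runtime rt = {0, 0};
    int ev = -1;                       /* -1 means "no event yet" */
    for (int i = 0; i < launches; i++)
        mock_enqueue(&rt, &ev);        /* previous handle is lost here */
    if (ev >= 0)
        mock_release(&rt, ev);
    return rt.live;
}

/* Same loop, but the old handle is released before it is overwritten. */
int launch_releasing(int launches)
{
    mock_runtime rt = {0, 0};
    int ev = -1;
    for (int i = 0; i < launches; i++) {
        if (ev >= 0)
            mock_release(&rt, ev);     /* drop the old event first */
        mock_enqueue(&rt, &ev);
    }
    if (ev >= 0)
        mock_release(&rt, ev);
    return rt.live;
}
```

In this model launch_overwriting(16) leaves 15 events unreleased, while launch_releasing(16) leaves none. With a single launch both variants leave none.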
If you want to enqueue all the kernel commands into the same queue, you can avoid using event objects altogether. Host queues are in-order by default, so you don't need any explicit synchronization between the commands. For example:
for(int i=0; i < nKLaunches; i++)
{
    ret = clEnqueueNDRangeKernel( queue, kernel_1, ..., 0, NULL, NULL );
    ASSERT_CL_RETURN( ret );
    ret = clEnqueueNDRangeKernel( queue, kernel_2, ..., 0, NULL, NULL );
    ASSERT_CL_RETURN( ret );
    . . . // other clEnqueue<>() calls
}
clFinish( queue ); // Blocks until all previously queued commands have finished
BTW, thanks for pointing to that sample code. I agree with nou that it can create a memory leak when nKLaunches > 1 (by default, nKLaunches = 1). I've passed this on to the team concerned.
Regards,