Archives Discussions

papercompute · 05-08-2015

Hello everyone!

My code is unstable. What is wrong with it?

for(int i=0;i<numIter;i++){
    status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
    ASSERT_CL(status);
    status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
    ASSERT_CL(status);
    if(i%16==0){
        status = clFlush(commandQueue);
        ASSERT_CL(status);
        spinForEventsComplete( 1, &ndrEvt );
        //status = clWaitForEvents(1, &ndrEvt);
        //ASSERT_CL(status);
    }
} // for i
status = clFlush(commandQueue);
ASSERT_CL(status);
spinForEventsComplete( 1, &ndrEvt );

Thank you!


dipak · 05-11-2015

I'm not clear on your wait condition for the event. You launch multiple commands (16 x 2 = 32 in total) before each clFlush() call but wait only for the last one. You also overwrite the event object "ndrEvt" in every iteration, as below. Is there a reason or assumption behind this?

status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);

papercompute · 05-11-2015

This code comes from the ImageBandwidth.cpp SDK example:

cl_event ev = 0;
for(int i=0; i < nKLaunches; i++)
{
    ret = clEnqueueNDRangeKernel( queue,
                                  kernel,
                                  2,
                                  global_work_offset,
                                  global_work_size,
                                  local_work_size,
                                  0, NULL, &ev );
    ASSERT_CL_RETURN( ret );
}
clFlush( queue );
spinForEventsComplete( 1, &ev );

nou · 05-11-2015

I think this creates a memory leak, because you keep a reference only to the last event.

dipak · 05-12-2015

If you want to enqueue all the kernel commands into the same queue, you can avoid using event objects altogether. Host command queues are in-order by default (unless created with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE), so each command finishes before the next one starts and no explicit synchronization is needed between them. For example:

for(int i=0; i < nKLaunches; i++)
{
    ret = clEnqueueNDRangeKernel( queue, kernel_1, ..., 0, NULL, NULL );
    ASSERT_CL_RETURN( ret );
    ret = clEnqueueNDRangeKernel( queue, kernel_2, ..., 0, NULL, NULL );
    ASSERT_CL_RETURN( ret );
    ... // other clEnqueue<>() calls
}
clFinish( queue ); // blocks until all previously queued commands have finished

BTW, thanks for pointing out that sample code. I agree with nou that it can leak memory when nKLaunches > 1 (by default nKLaunches = 1). I've passed this on to the relevant team.

Regards,

papercompute · 05-12-2015

Thanks to everyone for the advice and answers. I now have stable code; the problem turned out to be elsewhere.

The current version is:

cl_event ndrEvt = 0;
for(int i=0;i<numIter-1;i++){ // numIter = 1000
    status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
    ASSERT_CL(status);
    status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,NULL);
    ASSERT_CL(status);
    if(i>0 && i%256==0){ // just
        status = clFlush(commandQueue);
        ASSERT_CL(status);
    } // if
} // for i
status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
ASSERT_CL(status);
status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
ASSERT_CL(status);
status = clFlush(commandQueue);
ASSERT_CL(status);
spinForEventsComplete( 1, &ndrEvt );
status = clReleaseEvent(ndrEvt); // release the only event this code creates
ASSERT_CL(status);


What is right order to queue multiple kernels in loop