papercompute
Adept I

What is the right order to queue multiple kernels in a loop?

Hello everyone!

My code is unstable. What is wrong with it?

    for(int i=0;i<numIter;i++){
        //
        status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
        ASSERT_CL(status);
        status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
        ASSERT_CL(status);
        if(i%16==0){
            status = clFlush(commandQueue);
            ASSERT_CL(status);
            spinForEventsComplete( 1, &ndrEvt );
            //status = clWaitForEvents(1, &ndrEvt);
            //ASSERT_CL(status);
        }
    }
    status = clFlush(commandQueue);
    ASSERT_CL(status);
    spinForEventsComplete( 1, &ndrEvt );

Thank you!

1 Solution
papercompute
Adept I

Thanks to everyone for the advice and answers. My code is now stable, but the actual problem was outside this snippet.

Current version is:

    cl_event ndrEvt = 0;

    for(int i=0;i<numIter-1;i++){ // numIter = 1000
        status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
        ASSERT_CL(status);
        status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,NULL);
        ASSERT_CL(status);
        if(i>0 && i%256==0){ // just
            status = clFlush(commandQueue);
            ASSERT_CL(status);
        } // if
    } // for i

    status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
    ASSERT_CL(status);
    status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
    ASSERT_CL(status);
    status = clFlush(commandQueue);
    ASSERT_CL(status);
    spinForEventsComplete( 1, &ndrEvt );

5 Replies
dipak
Big Boss

The wait condition for your event is not clear to me. You launch multiple commands (16 x 2 = 32 in total) before each clFlush() call and wait only for the last one. You also overwrite the event object "ndrEvt" in each iteration, as shown below. Is there a reason or assumption behind this?

    status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
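
To make the batch arithmetic concrete, here is a small self-contained C sketch that does not call OpenCL at all. It replays the loop condition `i%16==0` from the question and counts how many kernel commands have been enqueued before each wait. The helper name `batch_sizes` and the iteration count of 40 used in the note below are assumptions picked only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: replays a loop that enqueues `per_iter` commands per
   iteration and waits whenever i % period == 0 (period must be > 0).
   Stores the number of commands enqueued before each wait in `out`
   (at most `cap` entries) and returns the number of waits inside the loop. */
static size_t batch_sizes(int num_iter, int per_iter, int period,
                          int *out, size_t cap)
{
    size_t waits = 0;
    int pending = 0;                 /* commands enqueued since the last wait */

    for (int i = 0; i < num_iter; i++) {
        pending += per_iter;         /* kernel1 + kernel2 in the question */
        if (i % period == 0) {       /* same test as in the question's loop */
            if (waits < cap)
                out[waits] = pending;
            waits++;
            pending = 0;
        }
    }
    return waits;
}
```

With 40 iterations and 2 commands per iteration, the waits inside the loop see batches of 2, 32 and 32 commands. The first wait fires at i = 0 after a single pair of kernels, and the remaining 14 commands are covered only by the flush and wait after the loop.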
0 Likes

This code is from the ImageBandwidth.cpp SDK example:

     cl_event     ev = 0;

     for(int i=0; i < nKLaunches; i++)
     {
        ret = clEnqueueNDRangeKernel( queue,
                                      kernel,
                                      2,
                                      global_work_offset,
                                      global_work_size,
                                      local_work_size,
                                      0, NULL, &ev );
        ASSERT_CL_RETURN( ret );
     }

     clFlush( queue );
     spinForEventsComplete( 1, &ev );

0 Likes

I think this creates a memory leak, as you keep a reference only to the last event.
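
A minimal sketch of why this leaks, written as a plain C simulation rather than real OpenCL: every enqueue that is handed an event pointer returns a new event holding one reference, and the application has to drop that reference itself (in OpenCL, with clReleaseEvent). The names `mock_event`, `mock_enqueue`, `mock_release`, `leaky_loop` and `releasing_loop` are hypothetical stand-ins, not part of any SDK.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 64

/* Hypothetical stand-in for cl_event: an object with a reference count. */
typedef struct { int refcount; } mock_event;

static mock_event pool[MAX_EVENTS];
static size_t next_slot = 0;   /* next unused pool entry */
static int live_events = 0;    /* events created but not yet fully released */

static void reset_pool(void) { next_slot = 0; live_events = 0; }

/* Stand-in for an enqueue call given a non-NULL event pointer:
   it always creates a NEW event with one reference and overwrites *ev_out. */
static void mock_enqueue(mock_event **ev_out)
{
    assert(next_slot < MAX_EVENTS);
    mock_event *ev = &pool[next_slot++];
    ev->refcount = 1;
    live_events++;
    *ev_out = ev;
}

/* Stand-in for clReleaseEvent: drops one reference. */
static void mock_release(mock_event *ev)
{
    if (--ev->refcount == 0)
        live_events--;
}

/* Pattern from the sample: the handle is overwritten on every launch,
   so only the last event can still be released. Returns the leak count. */
static int leaky_loop(int launches)
{
    mock_event *ev = NULL;
    reset_pool();
    for (int i = 0; i < launches; i++)
        mock_enqueue(&ev);
    if (ev != NULL)
        mock_release(ev);          /* done after waiting on the last event */
    return live_events;
}

/* One possible fix: release the previous event before overwriting it. */
static int releasing_loop(int launches)
{
    mock_event *ev = NULL;
    reset_pool();
    for (int i = 0; i < launches; i++) {
        if (ev != NULL)
            mock_release(ev);
        mock_enqueue(&ev);
    }
    if (ev != NULL)
        mock_release(ev);
    return live_events;
}
```

In this model `leaky_loop(4)` leaves 3 events unreleased, while `releasing_loop(4)` leaves none. With a single launch neither version leaks.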

0 Likes

If you enqueue all the kernel commands into the same queue, you can avoid event objects altogether. Host command queues are in-order by default, so the commands execute in the order they were enqueued and no explicit synchronization between them is needed. For example:

     for(int i=0; i < nKLaunches; i++)
     {
        ret = clEnqueueNDRangeKernel( queue, kernel_1, ..., 0, NULL, NULL );
        ASSERT_CL_RETURN( ret );
        ret = clEnqueueNDRangeKernel( queue, kernel_2, ..., 0, NULL, NULL );
        ASSERT_CL_RETURN( ret );
        . . . // other clEnqueue<>() calls
     }

     clFinish( queue ); // blocks until all previously queued commands have finished

BTW, thanks for pointing to that sample code. I agree with nou that it can create a memory leak when nKLaunches > 1 (by default, nKLaunches = 1). I've passed this on to the relevant team.

Regards,
