5 Replies Latest reply on May 12, 2015 7:32 AM by papercompute

    What is right order to queue multiple kernels in loop

    papercompute

      Hello everyone!

      My code behaves unstably. What is wrong with it?

       

          for(int i=0;i<numIter;i++){

          //

          status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);

          ASSERT_CL(status);

          status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);

          ASSERT_CL(status);

          if(i%16==0){

          status = clFlush(commandQueue);

          ASSERT_CL(status);

          spinForEventsComplete( 1, &ndrEvt );

          //status = clWaitForEvents(1, &ndrEvt);

          //ASSERT_CL(status);

          }

          }

          status = clFlush(commandQueue);

          ASSERT_CL(status);

          spinForEventsComplete( 1, &ndrEvt );

       

      Thank you!

        • Re: What is right order to queue multiple kernels in loop
          dipak

          Your wait condition for the event is not clear to me. You launch multiple commands (16 x 2 = 32 in total) before each clFlush() call but wait only for the last one. You also overwrite the event object "ndrEvt" in every iteration, as shown below. Is there a reason or assumption behind this?

           

          status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
            • Re: What is right order to queue multiple kernels in loop
              papercompute

              This code is from the ImageBandwidth.cpp SDK example:

                   cl_event     ev = 0;

                

                   for(int i=0; i < nKLaunches; i++)

                   {

                      ret = clEnqueueNDRangeKernel( queue,

                                                    kernel,

                                                    2,

                                                    global_work_offset,

                                                    global_work_size,

                                                    local_work_size,

                                                    0, NULL, &ev );

                      ASSERT_CL_RETURN( ret );

                   }

               

               

                   clFlush( queue );

                   spinForEventsComplete( 1, &ev );

                • Re: What is right order to queue multiple kernels in loop
                  nou

                  I think this creates a memory leak, as you keep a reference only to the last event.
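                  To make the leak concrete, here is a toy model in plain C. It is only a sketch: toy_event, toy_enqueue and toy_release are hypothetical stand-ins, not OpenCL API, so it builds without an SDK. It assumes what real OpenCL does: every enqueue call that is given a non-NULL event pointer returns a new event object, and the caller owns a reference to it until clReleaseEvent is called.

```c
#include <assert.h>

/* Toy model only: none of these names are OpenCL API. */
typedef int toy_event;        /* hypothetical stand-in for cl_event */
static int live_events = 0;   /* events created but not yet released */

/* Stand-in for an enqueue call given a non-NULL event pointer:
   each call creates a new event and overwrites *out with its handle. */
static void toy_enqueue(toy_event *out) {
    static int next_id = 1;
    live_events++;
    *out = next_id++;
}

/* Stand-in for clReleaseEvent: drops the caller's reference.
   A zero handle means "no event yet" and is ignored. */
static void toy_release(toy_event ev) {
    if (ev != 0) {
        assert(live_events > 0);
        live_events--;
    }
}

/* Pattern from the sample: one variable reused for every launch,
   and only the last handle released. Returns the events left over. */
int leaked_after_overwrite(int launches) {
    toy_event ev = 0;
    live_events = 0;
    for (int i = 0; i < launches; i++)
        toy_enqueue(&ev);
    toy_release(ev);
    return live_events;
}

/* Same loop, but the old handle is released before it is overwritten. */
int leaked_after_release_each(int launches) {
    toy_event ev = 0;
    live_events = 0;
    for (int i = 0; i < launches; i++) {
        toy_release(ev);
        toy_enqueue(&ev);
    }
    toy_release(ev);
    return live_events;
}
```

                  In this model, 16 launches leave 15 events unreleased in the first loop and none in the second. In real code the analogous fix is to call clReleaseEvent on the old handle before reusing the variable, or to pass NULL as the event argument for every launch except the one you actually wait on.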

                  • Re: What is right order to queue multiple kernels in loop
                    dipak

                    If you enqueue all the kernel commands into the same queue, you can avoid event objects altogether. Host command queues are in-order by default, so commands in the same queue execute in the order they were enqueued and you don't need any explicit synchronization between them. For example:

                     

                         for(int i=0; i < nKLaunches; i++)

                         {

                            ret = clEnqueueNDRangeKernel( queue, kernel_1, ..., 0, NULL, NULL );

                            ASSERT_CL_RETURN( ret );

                     

                            ret = clEnqueueNDRangeKernel( queue, kernel_2, ..., 0, NULL, NULL );

                            ASSERT_CL_RETURN( ret );

                     

                             . . . // other clEnqueue<>() calls

                         }

                         clFinish( queue ); // Blocks until all previously queued commands have finished

                     

                     

                    BTW, thanks for pointing to that sample code. I agree with nou that it can create a memory leak when nKLaunches > 1 (the default is nKLaunches = 1). I've passed this on to the team concerned.

                     

                    Regards,

                • Re: What is right order to queue multiple kernels in loop
                  papercompute

                  Thanks, everyone, for the advice and answers. My code is now stable, but the problem turned out to be outside this loop.

                  The current version is:

                   

                      cl_event ndrEvt = 0;

                      for(int i=0;i<numIter-1;i++){ // numIter = 1000

                      status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);

                      ASSERT_CL(status);

                      status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,NULL);

                      ASSERT_CL(status);

                      if(i>0 && i%256==0){ // just

                      status = clFlush(commandQueue);

                      ASSERT_CL(status);

                      } // if

                      } // for i

                      status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);

                      ASSERT_CL(status);

                      status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);

                      ASSERT_CL(status);

                      status = clFlush(commandQueue);

                      ASSERT_CL(status);

                      spinForEventsComplete( 1, &ndrEvt );
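                  As a quick check of how the batching in this version works out, the small plain-C sketch below replays only the loop condition. The function names are hypothetical and it makes no OpenCL calls; it just counts how often the in-loop clFlush branch would be taken and how many kernel launches are enqueued overall.

```c
#include <assert.h>

/* Counts how often the in-loop flush branch is taken for a loop of the form
   for (i = 0; i < num_iter - 1; i++) { ...; if (i > 0 && i % interval == 0) flush; } */
int count_in_loop_flushes(int num_iter, int interval) {
    assert(interval > 0);   /* guard against modulo by zero */
    int flushes = 0;
    for (int i = 0; i < num_iter - 1; i++) {
        if (i > 0 && i % interval == 0)
            flushes++;
    }
    return flushes;
}

/* Total kernel launches: two per loop pass plus the final pair after the loop. */
int total_kernel_launches(int num_iter) {
    return 2 * (num_iter - 1) + 2;
}
```

                  With numIter = 1000 and an interval of 256, the branch is taken at i = 256, 512 and 768, so the queue is flushed three times inside the loop and once more after it, for 2000 kernel launches in total.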