papercompute
Adept I

What is right order to queue multiple kernels in loop

Hello everyone!

My code is unstable. What is wrong with it?

    for(int i=0;i<numIter;i++){
        //
        status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
        ASSERT_CL(status);
        status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
        ASSERT_CL(status);
        if(i%16==0){
            status = clFlush(commandQueue);
            ASSERT_CL(status);
            spinForEventsComplete( 1, &ndrEvt );
            //status = clWaitForEvents(1, &ndrEvt);
            //ASSERT_CL(status);
        }
    }
    status = clFlush(commandQueue);
    ASSERT_CL(status);
    spinForEventsComplete( 1, &ndrEvt );

Thank you!

Accepted Solutions
papercompute
Adept I

Re: What is right order to queue multiple kernels in loop

Thanks to everyone for the advice and answers. My code is now stable, but the problem turned out to be outside this loop.

Current version is:

    cl_event ndrEvt = 0;

    for(int i=0;i<numIter-1;i++){ // numIter = 1000
        status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
        ASSERT_CL(status);
        status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,NULL);
        ASSERT_CL(status);
        if(i>0 && i%256==0){ // just
            status = clFlush(commandQueue);
            ASSERT_CL(status);
        } // if
    } // for i

    status = clEnqueueNDRangeKernel(commandQueue,kernel1,2,NULL,globalThreads,localThreads,0,NULL,NULL);
    ASSERT_CL(status);
    status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
    ASSERT_CL(status);
    status = clFlush(commandQueue);
    ASSERT_CL(status);
    spinForEventsComplete( 1, &ndrEvt );
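
To make the submission pattern of this loop concrete, here is a small sketch that only counts what the loop would submit. It makes no OpenCL calls; `submit_counts` and `count_submissions` are hypothetical names introduced for illustration, and the function just mirrors the control flow shown above.

```c
#include <assert.h>

/* Tally of what the posted loop submits; no OpenCL calls are made here. */
typedef struct {
    int kernel_enqueues;  /* number of clEnqueueNDRangeKernel calls */
    int flushes;          /* number of clFlush calls, including the final one */
} submit_counts;

/* Hypothetical helper: mirrors the control flow of the loop in the post. */
submit_counts count_submissions(int num_iter, int flush_interval)
{
    assert(num_iter >= 1 && flush_interval > 0);

    submit_counts c = {0, 0};
    for (int i = 0; i < num_iter - 1; i++) {
        c.kernel_enqueues += 2;                 /* kernel1, then kernel2 */
        if (i > 0 && i % flush_interval == 0)
            c.flushes++;                        /* intermediate clFlush */
    }
    c.kernel_enqueues += 2;   /* final pair; only this kernel2 returns ndrEvt */
    c.flushes++;              /* final clFlush before the wait */
    return c;
}
```

With `numIter = 1000` and an interval of 256, `count_submissions(1000, 256)` gives 2000 kernel enqueues and 4 flushes (at i = 256, 512, 768, plus the final one). Only one event object is ever created, so nothing is overwritten inside the loop.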

5 Replies
dipak
Staff

Re: What is right order to queue multiple kernels in loop

Your wait condition for the event is not clear to me. You launch multiple commands (16 x 2 = 32 in total) before each clFlush() call and wait only for the last one. You also overwrite the event object "ndrEvt" in every iteration, as below. Is there a reason or assumption behind this?

    status = clEnqueueNDRangeKernel(commandQueue,kernel2,2,NULL,globalThreads,localThreads,0,NULL,&ndrEvt);
papercompute
Adept I

Re: What is right order to queue multiple kernels in loop

This code is from the ImageBandwidth.cpp SDK example:

    cl_event     ev = 0;

    for(int i=0; i < nKLaunches; i++)
    {
        ret = clEnqueueNDRangeKernel( queue,
                                      kernel,
                                      2,
                                      global_work_offset,
                                      global_work_size,
                                      local_work_size,
                                      0, NULL, &ev );
        ASSERT_CL_RETURN( ret );
    }
    clFlush( queue );
    spinForEventsComplete( 1, &ev );

nou
Exemplar

Re: What is right order to queue multiple kernels in loop

I think this creates a memory leak, since you keep a reference only to the last event.
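
The leak can be illustrated with a toy model of event reference counting. This is a sketch, not OpenCL code: `toy_event`, `toy_enqueue`, `toy_release`, `leaky_loop` and `fixed_loop` are hypothetical names. The only assumption taken from OpenCL is that every event returned by an enqueue call holds a reference that the application must drop with `clReleaseEvent`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an OpenCL event: only the reference count is modeled. */
typedef struct { int refcount; } toy_event;

#define TOY_POOL_SIZE 64
static toy_event pool[TOY_POOL_SIZE];
static int pool_used = 0;

/* Stands in for an enqueue call that hands back a new event (refcount 1). */
toy_event *toy_enqueue(void)
{
    assert(pool_used < TOY_POOL_SIZE);
    toy_event *ev = &pool[pool_used++];
    ev->refcount = 1;
    return ev;
}

/* Stands in for clReleaseEvent: drops the application's reference. */
void toy_release(toy_event *ev)
{
    if (ev != NULL && ev->refcount > 0)
        ev->refcount--;
}

/* Number of events that still hold a reference, i.e. were never released. */
int live_events(void)
{
    int live = 0;
    for (int i = 0; i < pool_used; i++)
        if (pool[i].refcount > 0)
            live++;
    return live;
}

/* Pattern from the sample: the handle is overwritten on every launch. */
int leaky_loop(int launches)
{
    pool_used = 0;
    toy_event *ev = NULL;
    for (int i = 0; i < launches; i++)
        ev = toy_enqueue();          /* previous handle is lost here */
    toy_release(ev);                 /* only the last event is released */
    return live_events();
}

/* One possible fix: release the previous event before overwriting it. */
int fixed_loop(int launches)
{
    pool_used = 0;
    toy_event *ev = NULL;
    for (int i = 0; i < launches; i++) {
        toy_release(ev);
        ev = toy_enqueue();
    }
    toy_release(ev);
    return live_events();
}
```

In this model `leaky_loop(4)` leaves 3 events unreleased, while `fixed_loop(4)` leaves none. With a single launch both variants are clean, which matches the leak appearing only when there is more than one launch.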

dipak
Staff

Re: What is right order to queue multiple kernels in loop

If you enqueue all the kernel commands into the same queue, you can avoid event objects altogether. Host queues are in-order by default, so commands execute in the order they were enqueued and you don't need any explicit synchronization between them. For example:

    for(int i=0; i < nKLaunches; i++)
    {
        ret = clEnqueueNDRangeKernel( queue, kernel_1, ..., 0, NULL, NULL );
        ASSERT_CL_RETURN( ret );
        ret = clEnqueueNDRangeKernel( queue, kernel_2, ..., 0, NULL, NULL );
        ASSERT_CL_RETURN( ret );
        . . . // other clEnqueue<>() calls
    }
    clFinish( queue ); // blocks until all previously queued commands have finished

By the way, thanks for pointing to that sample code. I agree with nou that it can leak memory when nKLaunches > 1 (by default, nKLaunches = 1). I've passed this on to the team concerned.

Regards,
