AnsweredAssumed Answered

OpenCL 2.0 Device command queue keeps filling up and halting execution

Question asked by pmorgan4801 on Jun 6, 2020
Latest reply on Jun 6, 2020 by hardcoregames™

I am utilizing OpenCL’s enqueue_kernel() function to enqueue kernels dynamically from the GPU to reduce unnecessary host interactions. Here is a simplified example of what I am trying to do in the kernels:

kernel void kernelA(args)
{
   //This kernel is the one that is enqueued from the host, with only one work item. This kernel
   //could be considered the "master" kernel that controls the logic of when to enqueue tasks
   //First, it checks if a condition is met, then it enqueues kernelB

   if (some condition)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(some amount,       256), ^{kernelB(args);});
   }
   else
   {
      //do other things
   }
}

kernel void kernelB(args)
{
   //Do some stuff

   //Only enqueue the next kernel with the first work item. I do this because the things
   //occurring in kernelC rely on the things that kernelB does, so it must take place after kernelB is completed,
   //hence, the CLK_ENQUEUE_FLAGS_WAIT_KERNEL
   if (get_global_id(0) == 0)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(some amount,       256), ^{kernelC(args);});
   }
}

kernel void kernelC(args)
{
   //Do some stuff. This one in particular is one step in a sorting algorithm

   //This kernel will enqueue kernelD if a condition is met, otherwise it will
   //return to kernelA
   if (get_global_id(0) == 0 && other requirements)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(1, 1),       ^{kernelD(args);});
   }
   else if (get_global_id(0) == 0)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(1, 1),       ^{kernelA(args);});
   }
}

kernel void kernelD(args)
{

   //Do some stuff

   //Finally, if some condition is met, enqueue kernelC again. What this will do is it will
   //bounce back and forth between kernelC and kernelD until the condition is
   //no longer met. If it isn't met, go back to kernelA
   if (some condition)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(some amount,       256), ^{kernelC(args);});
   }
   else
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(1, 1),       ^{kernelA(args);});
   }
}

So that is the general flow of the program, and it works perfectly and does exactly as I intended it to do, in the exact order I intended it to do it in, except for one issue. In certain cases when the workload is very high, a random one of the enqueue_kernel()s will fail to enqueue and halt the program. This happens because the device queue is full, and it cannot fit another task into it. But I cannot for the life of me figure out why this is, even after extensive research.

 

I thought that once a task in the queue (a kernel for instance) is finished, it would free up that spot in the queue. I am only enqueuing a single kernel at a time, so my queue should really only reach a max of like 1 or 2 tasks at a time. But this program will literally fill up the entire 262,144 byte size of the device command queue, and stop functioning. Is there some way to clear the queue or something?

 

I would greatly appreciate some potential insight as to why this is happening if anyone has any ideas. I am sort of stuck and cannot continue until I get past this issue.

 

Thank you in advance!

 

(BTW I am running on a Radeon RX 590 card, and am using the AMD APP SDK 3.0 to use with OpenCL 2.0)

Outcomes