PC Drivers & Software

pmorgan4801 · ‎06-06-2020

I am utilizing OpenCL’s enqueue_kernel() function to enqueue kernels dynamically from the GPU to reduce unnecessary host interactions. Here is a simplified example of what I am trying to do in the kernels:

kernel void kernelA(args)
{
   //This kernel is the one that is enqueued from the host, with only one work item. This kernel
   //could be considered the "master" kernel that controls the logic of when to enqueue tasks
   //First, it checks if a condition is met, then it enqueues kernelB
   if (some condition)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(some amount,       256), ^{kernelB(args);});
   }
   else
   {
      //do other things
   }
}
kernel void kernelB(args)
{
   //Do some stuff
   //Only enqueue the next kernel with the first work item. I do this because the things
   //occurring in kernelC rely on the things that kernelB does, so it must take place after kernelB is completed,
   //hence, the CLK_ENQUEUE_FLAGS_WAIT_KERNEL
   if (get_global_id(0) == 0)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(some amount,       256), ^{kernelC(args);});
   }
}
kernel void kernelC(args)
{
   //Do some stuff. This one in particular is one step in a sorting algorithm
   //This kernel will enqueue kernelD if a condition is met, otherwise it will
   //return to kernelA
   if (get_global_id(0) == 0 && other requirements)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(1, 1),       ^{kernelD(args);});
   }
   else if (get_global_id(0) == 0)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(1, 1),       ^{kernelA(args);});
   }
}
kernel void kernelD(args)
{
   //Do some stuff
   //Finally, if some condition is met, enqueue kernelC again. What this will do is it will
   //bounce back and forth between kernelC and kernelD until the condition is
   //no longer met. If it isn't met, go back to kernelA
   if (some condition)
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(some amount,       256), ^{kernelC(args);});
   }
   else
   {
      enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange_1D(1, 1),       ^{kernelA(args);});
   }
}

So that is the general flow of the program, and it works perfectly and does exactly as I intended it to do, in the exact order I intended it to do it in, except for one issue. In certain cases when the workload is very high, a random one of the enqueue_kernel()s will fail to enqueue and halt the program. This happens because the device queue is full, and it cannot fit another task into it. But I cannot for the life of me figure out why this is, even after extensive research.

I thought that once a task in the queue (a kernel for instance) is finished, it would free up that spot in the queue. I am only enqueuing a single kernel at a time, so my queue should really only reach a max of like 1 or 2 tasks at a time. But this program will literally fill up the entire 262,144 byte size of the device command queue, and stop functioning. Is there some way to clear the queue or something?

I would greatly appreciate some potential insight as to why this is happening if anyone has any ideas. I am sort of stuck and cannot continue until I get past this issue.

Thank you in advance!

(BTW I am running on a Radeon RX 590 card, and am using the AMD APP SDK 3.0 to use with OpenCL 2.0)

hardcoregames_ · ‎06-06-2020

you are the first here to use CL 2.0, while most still use 1.2 to be more widely compatible

queue managent is hard, you have to be sure it has been handled and put a timeout on it so that it is serviced or abandoned

PC Drivers & Software

OpenCL 2.0 Device command queue keeps filling up and halting execution