OpenCL

OpenCL on-device queue: strange behaviour

OS version: Windows 10 x64 2004.

Driver version: Crimson 20.9.2

GPU: Vega 64

The OpenCL spec says that, to create a device queue, the following queue properties must be set:

CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE

And about CL_QUEUE_ON_DEVICE_DEFAULT, it says:

The application must create the default device queue if any kernels containing calls to get_default_queue are
enqueued. There can only be one default device queue for each device within a context.
clCreateCommandQueueWithProperties with CL_QUEUE_PROPERTIES set to CL_QUEUE_ON_DEVICE |
CL_QUEUE_ON_DEVICE_DEFAULT will return the default device queue that has already been created and
increment its retain count by 1.

The clEnqueueNDRangeKernel call for the vectorAdd kernel fails with the CL_OUT_OF_RESOURCES status code if I create the queue with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE but without CL_QUEUE_ON_DEVICE_DEFAULT. It succeeds if I add CL_QUEUE_ON_DEVICE_DEFAULT to the queue properties. I don't call get_default_queue in my kernel code, so according to the spec a non-default device queue should be enough. But it is not.
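For reference, the two configurations compared above differ only in one bit of the properties value. The following is a minimal sketch of how the zero-terminated properties list could be built, not the code from the attached cpp file. It assumes the constant values from the Khronos `CL/cl.h` header, copied into a hypothetical `queue_props_sketch` namespace so that it compiles without the OpenCL SDK; the helper name `makeDeviceQueueProperties` is also made up for illustration.

```cpp
#include <array>
#include <cstdint>

namespace queue_props_sketch {

// Stand-ins for definitions in <CL/cl.h> (assumed values, copied so the
// sketch builds without the SDK). Real host code should include the header
// and use the CL_* names directly.
using QueueProperty = std::uint64_t;                 // cl_queue_properties is a 64-bit unsigned type
constexpr QueueProperty kQueueProperties = 0x1093;   // CL_QUEUE_PROPERTIES
constexpr QueueProperty kOutOfOrder      = 1u << 0;  // CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
constexpr QueueProperty kOnDevice        = 1u << 2;  // CL_QUEUE_ON_DEVICE
constexpr QueueProperty kOnDeviceDefault = 1u << 3;  // CL_QUEUE_ON_DEVICE_DEFAULT

// Builds the {name, value, 0} list that would be passed as the `properties`
// argument of clCreateCommandQueueWithProperties.
inline std::array<QueueProperty, 3> makeDeviceQueueProperties(bool setDefaultDeviceQueue) {
    QueueProperty flags = kOutOfOrder | kOnDevice;
    if (setDefaultDeviceQueue) {
        flags |= kOnDeviceDefault;  // the only difference between the two runs
    }
    return std::array<QueueProperty, 3>{{kQueueProperties, flags, 0}};
}

}  // namespace queue_props_sketch
```

In real host code the resulting array's `data()` pointer would be passed, together with the context and device, to `clCreateCommandQueueWithProperties`.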

In practice, clinfo reports this:

Max on device queues:                          1

So maybe having device queues other than the default one doesn't make sense right now. But I want to know whether the current behavior is wrong or not.

I'll attach the test.cl file and a cpp file that can be used to reproduce this behavior. Change the value of the setDefaultDeviceQueue variable from false to true and vice versa to see the difference in behavior with and without CL_QUEUE_ON_DEVICE_DEFAULT.

OpenCL file code (test.cl):

__attribute__((reqd_work_group_size(64, 1, 1)))
__kernel void vectorAddChild(__global int *A, __global int *B, __global int *C)
{
  int gid = get_global_id(0);
  C[gid] = A[gid] + B[gid];
}

__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void vectorAdd(__global int *A, __global int *B, __global int *C, queue_t q)
{
  clk_event_t finishKernelEvent, enqueueMarkerEvent;

  ndrange_t ndrange = ndrange_1D(64, 64);

  void (^vectorAddChildBLK)(void) = ^{vectorAddChild(A, B, C);};

  enqueue_kernel(
    q,
    CLK_ENQUEUE_FLAGS_NO_WAIT,
    ndrange,
    0,
    NULL,
    &finishKernelEvent,
    vectorAddChildBLK
  );
 
  enqueue_marker(q, 1, &finishKernelEvent, &enqueueMarkerEvent);
 
  release_event(finishKernelEvent);
  release_event(enqueueMarkerEvent);
}

1 Solution
german
Staff

Currently the AMD OCL runtime can't detect whether a kernel really uses get_default_queue(), so it always sets up the default queue. Thus the app must create it.
Unfortunately, the OCL conformance tests always use CL_QUEUE_ON_DEVICE_DEFAULT for the first queue and exercise the path without CL_QUEUE_ON_DEVICE_DEFAULT only if the OCL device reports more than one device queue. Internal tests also specified CL_QUEUE_ON_DEVICE_DEFAULT during queue creation.
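A possible application-side workaround that follows from this answer is to make the first device queue the default one. The sketch below is only an illustration under that assumption and is not official AMD guidance. The namespace `default_queue_sketch`, the helper `flagsForNextDeviceQueue`, and its `existingDeviceQueues` parameter are hypothetical, and the flag values are assumed to match the Khronos `CL/cl.h` header.

```cpp
#include <cstdint>

namespace default_queue_sketch {

// Assumed copies of the queue property bits from <CL/cl.h>.
using QueueFlags = std::uint64_t;
constexpr QueueFlags kOutOfOrder      = 1u << 0;  // CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
constexpr QueueFlags kOnDevice        = 1u << 2;  // CL_QUEUE_ON_DEVICE
constexpr QueueFlags kOnDeviceDefault = 1u << 3;  // CL_QUEUE_ON_DEVICE_DEFAULT

// existingDeviceQueues: number of device queues the application has already
// created for this device in this context (tracked by the application).
inline QueueFlags flagsForNextDeviceQueue(unsigned existingDeviceQueues) {
    QueueFlags flags = kOutOfOrder | kOnDevice;
    if (existingDeviceQueues == 0) {
        // The runtime always sets up the default queue, so the first device
        // queue is created as the default one even if no kernel calls
        // get_default_queue().
        flags |= kOnDeviceDefault;
    }
    return flags;
}

}  // namespace default_queue_sketch
```

On a device that reports a maximum of 1 on-device queue, as in the clinfo output quoted in the question, only the `existingDeviceQueues == 0` branch would ever be taken.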

3 Replies
dipak
Big Boss

Thank you for reporting this and providing the reproducible test-case.

I was able to reproduce it. Let me check with the OpenCL team. I'll get back to you soon.

Thanks.

Thank you for a detailed explanation.
