clReleaseCommandQueue hang in Windows driver (no events)

Discussion created by skuto on Sep 18, 2017
Latest reply on Oct 13, 2017 by skuto

Some of my users are seeing hangs in the AMD OpenCL driver, for example with driver version 17.7.2 on an AMD RX 480:


Platform version: OpenCL 2.0 AMD-APP (2442.8)
Platform profile: FULL_PROFILE
Platform name:    AMD Accelerated Parallel Processing
Platform vendor:  Advanced Micro Devices, Inc.

Device ID:     2
Device name:   Ellesmere
Device type:   GPU
Device vendor: Advanced Micro Devices, Inc.
Device driver: 2442.8
Device speed:  1266 MHz
Device cores:  36 CU
Device score:  1120


One of the thread stacks shows a call into clReleaseCommandQueue that hangs indefinitely. I've found several reports of this problem when there are events in the queue, but in this case there are none. My code doesn't use events, and when this hang was triggered no commands or kernels had been sent to the device at all; the only OpenCL activity was creating OpenCL objects (a command queue per thread on the default context) from multiple threads and releasing them again. The problem happens intermittently. The same code runs correctly with Intel's and NVIDIA's OpenCL drivers.


Assuming the Khronos C++ OpenCL bindings, the source and calling sequence are roughly:


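// Reconstructed sketch, assuming the Khronos C++ bindings (CL/cl2.hpp).
#define CL_HPP_TARGET_OPENCL_VERSION 200
#include <CL/cl2.hpp>

#include <chrono>
#include <cstdlib>
#include <thread>
#include <vector>

// Per-thread OpenCL state: one command queue per worker thread.
struct ThreadData {
    bool m_is_initialized{false};
    cl::CommandQueue m_commandqueue;
};

// Destroyed at thread exit: ~ThreadData -> ~cl::CommandQueue -> clReleaseCommandQueue.
thread_local ThreadData opencl_thread_data;

// Originally OpenCL::thread_init(); shown here as a free function.
void thread_init() {
    if (!opencl_thread_data.m_is_initialized) {
        // The remaining constructor arguments were elided in the original;
        // the default device is assumed here.
        opencl_thread_data.m_commandqueue =
            cl::CommandQueue(cl::Context::getDefault(), cl::Device::getDefault());
        opencl_thread_data.m_is_initialized = true;
    }
    // Random delay so the threads overlap unpredictably (interpreted as milliseconds).
    std::this_thread::sleep_for(std::chrono::milliseconds(std::rand() % 1000));
}

int main() {
    // t1 ... t8: several threads creating their queues concurrently.
    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i)
        workers.emplace_back(thread_init);  // pass the function itself, don't call it

    for (;;) {
        std::thread t9(thread_init);
        t9.join();  // intermittently never returns
    }

    for (auto& t : workers)  // not reached: the loop above never exits
        t.join();
}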


It then intermittently hangs in t9.join(): when t9 exits, the thread-local ThreadData destructor runs the cl::CommandQueue destructor, which calls clReleaseCommandQueue, and that call never returns.