skuto
Journeyman III

clReleaseCommandQueue hang in Windows driver (no events)

Some of my users are seeing hangs in the AMD OpenCL drivers, for example driver version 17.7.2 with an AMD RX 480.

Platform version: OpenCL 2.0 AMD-APP (2442.8)
Platform profile: FULL_PROFILE
Platform name:    AMD Accelerated Parallel Processing
Platform vendor:  Advanced Micro Devices, Inc.
Device ID:     2
Device name:   Ellesmere
Device type:   GPU
Device vendor: Advanced Micro Devices, Inc.
Device driver: 2442.8
Device speed:  1266 MHz
Device cores:  36 CU
Device score:  1120

https://sjeng.org/ftp/work/amd_stack.png

The stack on the bottom right hangs indefinitely. I've found several references to this problem for the case where there are events in the queue, but in this case there are none. My code doesn't use events, and in fact when this hang was triggered no commands or kernels had been sent to the device; the only OpenCL activity was creating OpenCL contexts from multiple threads and freeing them again. The problem happens intermittently. The same code runs correctly with Intel's and NVIDIA's OpenCL drivers.

In terms of source and calling sequence, assuming Khronos C++ OpenCL headers, it's roughly doing:

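class ThreadData {
public:  // public so that thread_init() can reach the members
    bool m_is_initialized{false};
    cl::CommandQueue m_commandqueue;
};

thread_local ThreadData opencl_thread_data;

void OpenCL::thread_init() {
    if (!opencl_thread_data.m_is_initialized) {
        opencl_thread_data.m_commandqueue = cl::CommandQueue(cl::Context::getDefault(),
                                                             cl::Device::getDefault());
        opencl_thread_data.m_is_initialized = true;
    }
    // Sleep for a random time (assuming milliseconds were intended).
    std::this_thread::sleep_for(std::chrono::milliseconds(rand() % 1000));
}

int main(void) {
    // Pass the function itself; writing thread_init() would call it on the main thread.
    // This assumes thread_init is a static member of OpenCL.
    std::thread t1(&OpenCL::thread_init);
    ...
    std::thread t8(&OpenCL::thread_init);
    for (;;) {
        std::thread t9(&OpenCL::thread_init);
        t9.join();  // the hang shows up here
    }
}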

It then hangs in t9.join() because the TLS destructor for ThreadData calls the cl::CommandQueue destructor, which in turn calls clReleaseCommandQueue, and that call never returns.
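To illustrate why the hang surfaces in join() rather than somewhere in user code, here is a minimal sketch that needs no OpenCL at all. FakeQueue is a hypothetical stand-in for cl::CommandQueue (it only counts constructions and destructions), and the function and counter names are made up for this example. It shows that a thread_local object is destroyed on the exiting thread, and that join() does not return until that destructor has finished, so a destructor that blocks would block join() as well.

```cpp
#include <atomic>
#include <thread>

// Counters for the hypothetical stand-in type below.
std::atomic<int> g_created{0};
std::atomic<int> g_released{0};

// Stand-in for cl::CommandQueue: the destructor is where the real code
// would end up in clReleaseCommandQueue.
struct FakeQueue {
    FakeQueue() { ++g_created; }
    ~FakeQueue() { ++g_released; }
};

struct ThreadData {
    bool is_initialized = false;
    FakeQueue queue;  // destroyed when the owning thread exits
};

thread_local ThreadData tls_data;

void thread_init() {
    // First use on a thread constructs that thread's tls_data.
    if (!tls_data.is_initialized) {
        tls_data.is_initialized = true;
    }
}

// Spawns n short-lived threads one after another and returns how many
// FakeQueue destructors ran by the time the last join() returned.
int releases_after_short_lived_threads(int n) {
    const int before = g_released.load();
    for (int i = 0; i < n; ++i) {
        std::thread t(thread_init);
        t.join();  // returns only after t's thread_local destructors ran
    }
    return g_released.load() - before;
}
```

Calling releases_after_short_lived_threads(5) from main should report one release per thread, which matches the pattern in the report: every short-lived thread triggers one release on its way out.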

6 Replies
skuto
Journeyman III

In addition to this problem, the above code skeleton can also cause issues in the drivers for Kaveri/Spectre and (reportedly) Vega, but (reportedly) not Polaris. The cl::CommandQueue allocation in thread_init blocks in the t9 thread, while *another* thread in the amdocl.dll driver crashes attempting a Low-Fragmentation Heap Free call.

I'm able to work around the latter problem by using a thread pool for all OpenCL work, so the driver never gets to see a clReleaseCommandQueue call until program exit. Generally speaking, though, command queue creation and deletion seem to have several threading-related bugs in the current Windows drivers.
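For anyone wanting to try the same workaround, below is a rough sketch of the thread-pool idea, not the actual code from my application. PooledQueue is a hypothetical stand-in for cl::CommandQueue, and WorkerPool, submit, wait_idle and the counters are names invented for this example. Each worker creates its queue once and keeps it until the pool is destroyed, so nothing is released while work is still being submitted.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::atomic<int> g_pool_created{0};
std::atomic<int> g_pool_released{0};

// Hypothetical stand-in for cl::CommandQueue; only counts lifetimes.
struct PooledQueue {
    PooledQueue() { ++g_pool_created; }
    ~PooledQueue() { ++g_pool_released; }
};

class WorkerPool {
public:
    explicit WorkerPool(int n) {
        for (int i = 0; i < n; ++i)
            m_workers.emplace_back([this] { run(); });
    }
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& w : m_workers) w.join();  // queues are released only here
    }
    void submit(std::function<void(PooledQueue&)> task) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(task));
        }
        m_cv.notify_one();
    }
    // Blocks until every submitted task has finished.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_idle_cv.wait(lock, [this] { return m_tasks.empty() && m_busy == 0; });
    }

private:
    void run() {
        PooledQueue queue;  // one per worker, reused for every task
        for (;;) {
            std::function<void(PooledQueue&)> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                if (m_tasks.empty()) return;  // stop requested, nothing left
                task = std::move(m_tasks.front());
                m_tasks.pop();
                ++m_busy;
            }
            task(queue);
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                --m_busy;
            }
            m_idle_cv.notify_all();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::condition_variable m_idle_cv;
    std::queue<std::function<void(PooledQueue&)>> m_tasks;
    int m_busy = 0;
    bool m_stop = false;
    std::vector<std::thread> m_workers;
};

// Runs `tasks` no-op tasks on `workers` threads and returns the number of
// queue releases observed while the pool was still alive.
int releases_while_pool_alive(int workers, int tasks) {
    const int before = g_pool_released.load();
    WorkerPool pool(workers);
    for (int i = 0; i < tasks; ++i)
        pool.submit([](PooledQueue&) {});
    pool.wait_idle();
    return g_pool_released.load() - before;
}
```

In a real program the pool would live for the whole run (for example as an object created at the start of main), which is what pushes every release to program exit.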


Thanks for reporting it. Please share a reproducible test case so that the concerned team can run it and validate the problem.

Regards,


A testcase can be constructed by simply adding #include <CL/cl2.hpp> (OpenCL C++ headers) and #include <thread> to the code given - but I'm not 100% sure it's enough for a reliable repro.

Alternatively, you can compare

https://www.sjeng.org/dl/setupLeela0110b5.exe

https://www.sjeng.org/dl/setupLeela0110b6.exe

The first one will crash quickly on AMD Spectre/Kaveri systems (and possibly others - I can't verify on hardware I don't have). The difference is that the second one will never let clReleaseCommandQueue be called during program execution, thus avoiding the apparent threading bug in the driver.

Note that the original report includes the stack trace through the AMD driver, together with the exact version, so it should be possible for you to see the offending calling sequence. Unfortunately the AMD symbol servers don't provide the driver symbols to the public.


Thanks for providing the repro. We'll check and get back to you shortly.

Regards,


It seems that both links point to the same file. Please check.

Also, I was under the impression that the issue could be reproduced simply by running those .exe files. However, the executables install a game, and I'm not sure how to reproduce the error using it.

A testcase can be constructed by simply adding #include <CL/cl2.hpp> (OpenCL C++ headers) and #include <thread> to the code given

In that case, it would be helpful if you could modify the code snippet accordingly and share a complete test case or executable that produces the hang.

Regards,


Not sure what happened to the links. The forum displays both links correctly, but the second one points to the wrong file! b5 should crash, b6 should work. You should be able to copy-paste the URLs or just change the 5 to a 6. In any case, for reproducing you only need b5, which is the one both links point to.

Sorry, forgot to include instructions: run the OpenCL version of the game, File->New Game, Board Size 19x19->OK and click randomly on the board for a bit. You'll get a hang instantly in b5.

I'll see if I can make a standalone repro case using just the code given; I need to get to the Kaveri system first.
