myfunnyusername
Journeyman III

Problems with OpenCL since driver 13.11 beta 9.5

Hi everyone,

Our application has been using OpenCL for computations for quite some time. For other reasons we have always used NVIDIA GPUs so far, but since their OpenCL performance kept getting worse, we are in the middle of switching to AMD. This worked out quite nicely until we noticed some problems after switching to the 13.12 driver. At first we suspected changes we had made to our code rather than the driver. In the meantime, however, we have narrowed it down to the driver: version 13.11 beta 9.4 works fine, while versions 13.11 beta 9.5, 13.12 and 14.1 beta 1.6 do not.

Let me try to describe the problem: when starting our application we set up the OpenCL framework (context, device(s), command queue, compiling kernels). This always works just fine. Later we actually start using the GPU via OpenCL, i.e. we write to the GPU's memory, enqueue kernels and read back our results. Since 13.11 beta 9.5 this does not always work! Sometimes it does; sometimes the blocking read of the GPU's memory simply never finishes, the calling thread on the CPU hangs and we have to kill our application.

However, reading back the results does not seem to be the problem. When monitoring the GPU with GPU-Z we normally see an increase in clock frequency, memory usage and GPU load. If the problem occurs, nothing changes at all. It seems that not a single command we fill the queue with is ever executed. We have no idea what's happening here.

Again: the same application on the same system runs fine with 13.11 beta 9.4. Using a later driver version sometimes leads to the described problem. You can have 10 successful runs and the next one fails without warning.

Observed on multiple systems with different specs, which all use a Radeon R9 280X and run on Windows 7 x64.

Any help is very much appreciated! Thanks.

4 Replies
pinform
Staff

Thanks for reporting this. Could you provide a minimal C/C++ test case (including the host code) that we can use to reproduce the behavior?


The application is rather bulky, but I will try to extract the relevant parts. Do you know whether anything related to OpenCL changed between 13.11 Beta 9.4 and Beta 9.5? I could not find any information on this in the release notes.


Sorry this took so long, but other things got in the way. Here's the good thing though: it is not the driver but a nasty bug on our side which I finally found today.

Background: we focused our analysis on AMD because the bug does not show up on NVIDIA cards, and here's why. When using clEnqueueReadBuffer or clEnqueueWriteBuffer, you can specify whether the operation should be blocking or non-blocking. We forgot to issue a blocking write at the right point, which led to the problem described above. The NVIDIA driver doesn't seem to care about this parameter and simply always issues a blocking read/write. While this is not according to the standard and has already annoyed us elsewhere, it did prevent the problem. I would like to know whether this behaviour of the NVIDIA driver is by design ("play it safe") or just a bug. Guess I will ask them.
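For anyone who finds this thread later, here is a rough sketch of the general hazard with non-blocking writes. It is a toy model in plain C++, not our actual code and not the OpenCL API: `ToyQueue`, `enqueue_write`, `finish` and `upload_then_reuse` are made-up names, and the "device" is just a `std::vector`. It also does not reproduce the hang we saw; it only shows that a non-blocking write may read the host memory after the call has returned, so the host must not touch that memory until the transfer is known to be complete. In real OpenCL the choice corresponds to passing `CL_TRUE` or `CL_FALSE` as the `blocking_write` argument of clEnqueueWriteBuffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-in for an in-order command queue (hypothetical, NOT OpenCL).
class ToyQueue {
public:
    // Models a buffer write. With blocking == true the host data is copied
    // before the call returns. With blocking == false only the pointer is
    // recorded, and the copy runs later inside finish().
    void enqueue_write(std::vector<int>& device, const int* host,
                       std::size_t count, bool blocking) {
        pending_.push_back([&device, host, count] {
            device.assign(host, host + count);
        });
        if (blocking) {
            finish();
        }
    }

    // Runs every pending command in submission order.
    void finish() {
        for (auto& command : pending_) {
            command();
        }
        pending_.clear();
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Uploads {1, 2, 3, 4}, then immediately reuses the host buffer,
// and returns what ended up on the toy "device".
std::vector<int> upload_then_reuse(bool blocking) {
    ToyQueue queue;
    std::vector<int> device;
    std::vector<int> host = {1, 2, 3, 4};

    queue.enqueue_write(device, host.data(), host.size(), blocking);

    // The host overwrites its buffer right after the call returns.
    // std::fill keeps the allocation, so the recorded pointer stays valid.
    std::fill(host.begin(), host.end(), 0);

    queue.finish();
    return device;
}
```

In this model `upload_then_reuse(true)` returns {1, 2, 3, 4}, while `upload_then_reuse(false)` returns {0, 0, 0, 0}, because the copy only happens after the host has already overwritten its buffer. A driver that treats every write as blocking would hide exactly this kind of mistake.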

Thanks and have a nice day!


Glad to note that you found/fixed the issue. Careful with those reads and writes!
