cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

lrdxgm
Journeyman III

Bug? CL_MEM_COPY_HOST_PTR is not thread safe

Hello,

I have encountered something what I believe is a bug in AMD's OpenCL implementation of clCreateBuffer using CL_MEM_COPY_HOST_PTR flag.

Here is a minimal code that makes the AMD OpenCL runtime crash:

#include <CL/cl.hpp>

#include <ppl.h>

cl::CommandQueue queue = cl::CommandQueue::getDefault();

void run() {

    cl_int value = 0;

    cl::Buffer buff(CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value);

    queue.enqueueReadBuffer(buff, CL_TRUE, 0, sizeof(cl_int), &value);

};

int main() {

    for(int i = 0; i < 10000; ++i) {

        concurrency::parallel_invoke(&run, &run, &run, &run, &run, &run, &run, &run, &run, &run);

        queue.finish();

    }

}

Compile and run, the program crashes with various errors like access violation or trying to double-free memory.

Instead of PPL, one can use TBB's parallel_invoke with the same effect.

I have a Radeon R9 290 + Opteron 6234 machine with Windows 8.1 64 bit and Microsoft Visual Studio 2013 Update 3 (same with Update 4 what's just been released; but I think the compiler version is irrelevant). I currently use the 14.11.1 beta driver, but have tried a range of different driver including (if I remember correctly) 14.4 WHQL, 14.9 WHQL, 14.9.1 beta, 14.9.2 beta.

Removing the CL_MEM_COPY_HOST_PTR flag (and using enqueueFillBuffer or enqueueWriteBuffer to initialize) makes this problem go away. Changing the main loop to call the 'run' function sequentially also makes crashes disappear.

I have tested the same code with Intel's runtime using a Core i7-3667U laptop, where the code executes properly using the HD 4000.

Workaround is actually easy, use clEnqueueFill/WriteBuffer instead, but the error is insanely hard to detect: In original code base where I encountered this problem there are no compiler warnings (with /W3) and no AMD, Intel or Microsoft static or dynamic analysis tool found any warning, and the crash happens in a random position inside amdocl.dll, and because the lack of a PDB file, even the stack trace is wrong/useless.

I would like to ask to:

a, Fix the crash

b, Provide a warning by CodeXL's analyzer/debugger (or any relevant tool) for using CL_MEM_COPY_HOST_PTR if it is not considered best practice.

0 Likes
6 Replies
dipak
Big Boss

My apologies for this delayed reply.

When I ran the above code, I observed the same behaviour as you mentioned. However, and interestingly, after replacing some of the OpenCL C++ wrapper API to C API (as shown below), the code worked fine.

#include <CL/cl.hpp>

#include <ppl.h>

cl::CommandQueue queue = cl::CommandQueue::getDefault();

cl::Context context = cl::Context::getDefault();

void run() {

  cl_int value = 0;

  //cl::Buffer buff(CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value);

  cl_mem buff = clCreateBuffer(context(), CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value, NULL);

  //queue.enqueueReadBuffer(buff, CL_TRUE, 0, sizeof(cl_int), &value);

  clEnqueueReadBuffer(queue(), buff, CL_TRUE, 0, sizeof(cl_int), &value, 0, NULL, NULL);

};

// main is same as yours

As I suspect that the issue may be related to implementation of C++ wrapper API, not the actual OpenCL C API.Can you please check and let me know your findings?

Regards,

0 Likes

cl::Buffer() call which doesn't have context specified call internally this method Context context = Context::getDefault(err); I had several crash on nVidia platform when I specified the buffer without context. I think this part of the C++ binding where you can create OpenCL objects without context can lead to very sloppy code.

0 Likes

You missed a quite important step: freeing the cl::Buffer:

cl::CommandQueue queue = cl::CommandQueue::getDefault();

cl::Context context = cl::Context::getDefault();

void run() {

  cl_int value = 0;

  cl_mem buff = clCreateBuffer(context(), CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value, NULL);

  clEnqueueReadBuffer(queue(), buff, CL_TRUE, 0, sizeof(cl_int), &value, 0, NULL, NULL);

  clReleaseMemObject(buff);

};

Same crash - the buffer should not be 'used' at the time of release, because the read is blocking.

Nou: I agree, I normally use the 'normal' versions of the C++ bindings, this is just a minimal example to reproduce the crash.

0 Likes

Yes, its crashing after adding the clReleaseMemObject(buff). However, I don't see any reason for this segfault.

Another point is, it is working fine for following case:

cl_mem buff = clCreateBuffer(context(), CL_MEM_ALLOC_HOST_PTR|CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value, NULL);

Maybe an expert from runtime team would be able to explain this behaviour. I'll forward it to concerned team and get back to you as soon as get any reply.

Regards,

0 Likes

Update:

I've filed an internal bug report against this issue. If get any update, I'll let you know.

Regards.

0 Likes

Update:

The issue has been resolved in the latest catalyst build (internal). That version is not yet publicly available. Hope you'll get it with next catalyst release. Thanks for your patience.

Regards,

0 Likes