6 Replies Latest reply on May 21, 2015 4:44 AM by dipak

    Bug? CL_MEM_COPY_HOST_PTR is not thread safe

    lrdxgm

      Hello,

       

      I have encountered something what I believe is a bug in AMD's OpenCL implementation of clCreateBuffer using CL_MEM_COPY_HOST_PTR flag.

      Here is a minimal code that makes the AMD OpenCL runtime crash:

       

      #include <CL/cl.hpp>

      #include <ppl.h>

       

      cl::CommandQueue queue = cl::CommandQueue::getDefault();

       

      void run() {

          cl_int value = 0;

          cl::Buffer buff(CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value);

          queue.enqueueReadBuffer(buff, CL_TRUE, 0, sizeof(cl_int), &value);

      };

       

      int main() {

          for(int i = 0; i < 10000; ++i) {

              concurrency::parallel_invoke(&run, &run, &run, &run, &run, &run, &run, &run, &run, &run);

              queue.finish();

          }

      }

       

      Compile and run, the program crashes with various errors like access violation or trying to double-free memory.

      Instead of PPL, one can use TBB's parallel_invoke with the same effect.

       

      I have a Radeon R9 290 + Opteron 6234 machine with Windows 8.1 64 bit and Microsoft Visual Studio 2013 Update 3 (same with Update 4 what's just been released; but I think the compiler version is irrelevant). I currently use the 14.11.1 beta driver, but have tried a range of different driver including (if I remember correctly) 14.4 WHQL, 14.9 WHQL, 14.9.1 beta, 14.9.2 beta.

       

      Removing the CL_MEM_COPY_HOST_PTR flag (and using enqueueFillBuffer or enqueueWriteBuffer to initialize) makes this problem go away. Changing the main loop to call the 'run' function sequentially also makes crashes disappear.

      I have tested the same code with Intel's runtime using a Core i7-3667U laptop, where the code executes properly using the HD 4000.

       

      Workaround is actually easy, use clEnqueueFill/WriteBuffer instead, but the error is insanely hard to detect: In original code base where I encountered this problem there are no compiler warnings (with /W3) and no AMD, Intel or Microsoft static or dynamic analysis tool found any warning, and the crash happens in a random position inside amdocl.dll, and because the lack of a PDB file, even the stack trace is wrong/useless.

       

      I would like to ask to:

      a, Fix the crash

      b, Provide a warning by CodeXL's analyzer/debugger (or any relevant tool) for using CL_MEM_COPY_HOST_PTR if it is not considered best practice.

        • Re: Bug? CL_MEM_COPY_HOST_PTR is not thread safe
          dipak

          My apologies for this delayed reply.

          When I ran the above code, I observed the same behaviour as you mentioned. However, and interestingly, after replacing some of the OpenCL C++ wrapper API to C API (as shown below), the code worked fine.

           

          #include <CL/cl.hpp>

          #include <ppl.h>

           

          cl::CommandQueue queue = cl::CommandQueue::getDefault();

          cl::Context context = cl::Context::getDefault();

           

          void run() {

            cl_int value = 0;

            //cl::Buffer buff(CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value);

            cl_mem buff = clCreateBuffer(context(), CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value, NULL);

           

            //queue.enqueueReadBuffer(buff, CL_TRUE, 0, sizeof(cl_int), &value);

            clEnqueueReadBuffer(queue(), buff, CL_TRUE, 0, sizeof(cl_int), &value, 0, NULL, NULL);

          };

           

          // main is same as yours

           

          As I suspect that the issue may be related to implementation of C++ wrapper API, not the actual OpenCL C API.Can you please check and let me know your findings?

           

          Regards,

            • Re: Bug? CL_MEM_COPY_HOST_PTR is not thread safe
              nou

              cl::Buffer() call which doesn't have context specified call internally this method Context context = Context::getDefault(err); I had several crash on nVidia platform when I specified the buffer without context. I think this part of the C++ binding where you can create OpenCL objects without context can lead to very sloppy code.

              • Re: Bug? CL_MEM_COPY_HOST_PTR is not thread safe
                lrdxgm

                You missed a quite important step: freeing the cl::Buffer:

                 

                cl::CommandQueue queue = cl::CommandQueue::getDefault();

                cl::Context context = cl::Context::getDefault();

                 

                void run() {

                  cl_int value = 0;

                  cl_mem buff = clCreateBuffer(context(), CL_MEM_COPY_HOST_PTR, sizeof(cl_int), &value, NULL);

                  clEnqueueReadBuffer(queue(), buff, CL_TRUE, 0, sizeof(cl_int), &value, 0, NULL, NULL);

                  clReleaseMemObject(buff);

                };

                 

                Same crash - the buffer should not be 'used' at the time of release, because the read is blocking.

                 

                Nou: I agree, I normally use the 'normal' versions of the C++ bindings, this is just a minimal example to reproduce the crash.