When a command that waits for a cl::Event is enqueued in a cl::CommandQueue, any subsequent call to clReleaseCommandQueue (as made by ~cl::CommandQueue()) causes the application to hang forever, even if the retain count of the command queue is still greater than 0.
The bug may be caused by clReleaseCommandQueue performing an implicit finish() on the command queue instead of the implicit flush() that the OpenCL standard describes.
Please see the attached code for an example.
//
//  main.cpp
//  amdtest
//
//  Created by Jan-Gerd Tenberge (janten@gmail.com) on 28.11.11.
//  Copyright (c) 2011 Westfälische Wilhelms-Universität Münster.
//  All rights reserved.
//
//  This example shows a possible bug in the AMD APP SDK where the
//  destructor of a cl::CommandQueue halts for an infinite time if
//  any OpenCL command waiting for a cl::Event is waiting on the queue.

#include <iostream>
#include <vector>
#include <unistd.h> // sleep()
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <boost/function.hpp>

#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>

void setStatusComplete(cl::UserEvent event, cl::Buffer buffer, cl::CommandQueue queue);

int main (int argc, const char * argv[])
{
    boost::thread* threadp = NULL;

    try {
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);

        cl_context_properties props[] = {
            CL_CONTEXT_PLATFORM,
            (cl_context_properties)platforms[0](),
            0
        };

        /*
         * We had no AMD GPU for testing, it is therefore unknown whether the bug
         * affects only CPUs or all devices supported by AMD APP's OpenCL implementation.
         */
        cl::Context context(CL_DEVICE_TYPE_CPU, props);
        std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
        cl::CommandQueue queue(context, devices[0]);

        int i = 10;
        cl_uint memSize = sizeof(int);
        cl::Buffer input(context, CL_MEM_READ_WRITE, memSize);

        std::vector<cl::Event> eventWaitList;
        cl::UserEvent dataReceipt(queue.getInfo<CL_QUEUE_CONTEXT>());
        eventWaitList.push_back(dataReceipt);

        std::cout << "Waiting for dataReceipt to be of status CL_COMPLETE" << std::endl;

        // Write Buffer after setStatusComplete has finished in a different thread
        queue.enqueueWriteBuffer(input, CL_FALSE, 0, sizeof(int), &i, &eventWaitList, NULL);

        {
            /*
             * Since the bug is triggered by clReleaseCommandQueue run
             * from ~cl::CommandQueue(), this is sufficient to trigger it.
             */
            cl::CommandQueue q2 = queue;

            /* Launch setStatusComplete in another thread.
             *
             * The method shown here is a stub. In real-world usage setStatusComplete
             * receives data over a network connection and calls event.setStatus(CL_COMPLETE);
             * as soon as all data has been retrieved and is ready for upload to the device.
             *
             * boost::bind() will try to copy the object queue, calling the CommandQueue destructor,
             * this will cause the application to hang. The actual execution of the thread one line below
             * will not be reached, causing a deadlock since setStatus(CL_COMPLETE) will never be
             * called on dataReceipt.
             *
             * Expected result: The thread should be started, triggering the status change of dataReceipt
             * after three seconds. This should in turn cause the WriteBuffer command to be executed.
             * The expected result can be observed by using the NVIDIA SDK.
             */
            // boost::function0<void> func = boost::bind(&setStatusComplete, dataReceipt, input, queue);
            // threadp = new boost::thread(func);
        }
        /*
         * Implicit call of ~cl::CommandQueue(); triggers infinite wait here
         * if cl::CommandQueue q2 = queue; is used.
         * Possible cause: clReleaseCommandQueue performs implicit finish()
         * instead of flush().
         */
    } catch (cl::Error& err) {
        std::cout << "Error " << err.err() << " " << err.what() << std::endl;
    }

    if (threadp) {
        threadp->join();
    }
    delete threadp;

    return 0;
}

void setStatusComplete(cl::UserEvent event, cl::Buffer buffer, cl::CommandQueue queue)
{
    std::cout << "Setting dataReceipt to CL_COMPLETE in 3 seconds" << std::endl;
    sleep(3);

    // This should trigger the execution of the WriteBuffer command
    // enqueued in the main method.
    event.setStatus(CL_COMPLETE);
}
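To make the suspected cause easier to reason about without an OpenCL installation, here is a minimal sketch that models it in plain C++. It is only a simulation under my own assumptions and says nothing about how the AMD runtime is actually implemented. The names ReleasePolicy, model_release and release_hangs are hypothetical and not part of any OpenCL API. A std::promise stands in for the user event, and "release" either returns immediately (flush-like) or blocks until the event is signalled (finish-like).

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical model, not OpenCL: how "release" treats a pending command.
enum class ReleasePolicy { Flush, Finish };

// Models releasing a queue that still holds one command gated on `gate`
// (the stand-in for a cl::UserEvent in the wait list).
void model_release(ReleasePolicy policy, std::shared_future<void> gate)
{
    if (policy == ReleasePolicy::Finish) {
        // Finish-like: block until the gated command could have completed.
        gate.wait();
    }
    // Flush-like: the command is submitted, nothing to wait for.
}

// Returns true if the modelled release is still blocked after `timeout`
// while the user event has not been signalled yet.
bool release_hangs(ReleasePolicy policy,
                   std::chrono::milliseconds timeout = std::chrono::milliseconds(200))
{
    assert(timeout.count() > 0);

    std::promise<void> user_event;
    std::shared_future<void> gate = user_event.get_future().share();

    // Run the release on a worker thread so a hang can be observed.
    std::future<void> released =
        std::async(std::launch::async, model_release, policy, gate);

    bool hung = released.wait_for(timeout) == std::future_status::timeout;

    // Signal the event afterwards so the worker can exit in either case.
    user_event.set_value();
    released.get();
    return hung;
}
```

Calling release_hangs(ReleasePolicy::Finish) reports a hang, while release_hangs(ReleasePolicy::Flush) does not. In the real program the code that signals the event is only reached after the release returns, so with finish-like behavior the wait never ends.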
I can confirm this behavior, on the AMD platform only. It looks like each clReleaseCommandQueue call triggers a clFinish, which results in undefined behavior when it is called from a cl_event callback.
I was not able to reproduce it on either the Apple or the NVIDIA platform.
Thank you for the feedback, we are looking into this issue.
This issue has already been addressed internally and a fix will be available in the next releases of the runtime.
We cherish this kind of feedback.