3 Replies Latest reply on Mar 29, 2011 8:41 PM by mejlango

    CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST

    mejlango

       

      Hi,

       

      I would like to ask what causes this error. The documentation says it is returned:

      if the read and write operations are blocking and the execution status of any of the events in event_wait_list is a negative integer value. 

      But that didn't help me find out why my program ends with this error.

      Everything worked without errors until I passed a bigger buffer to the kernel. After that, the result of the kernel seems to be correct, but clWaitForEvents(...) returns error code -14. I also get a warning with the message "Display driver AMD stopped responding and has successfully recovered."


      Global memory size: 268435456
      Constant buffer size: 65536
      Max number of constant args: 8
      Local memory type: Global
      Local memory size: 16384
      Kernel Preferred work group size multiple: 32

       

      thanks for help



      cl_mem m_matrix = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, 5971968, matrix, &errCode); //5,9MB
      cl_mem m_cnt = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int), cnt, &errCode);

      clSetKernelArg(clManager->GetKernels()[0], 0, sizeof(m_matrix), (void*)&m_matrix);
      clSetKernelArg(clManager->GetKernels()[0], 1, sizeof(int), (void*)&arg1);
      clSetKernelArg(clManager->GetKernels()[0], 2, sizeof(int), (void*)&arg2);
      clSetKernelArg(clManager->GetKernels()[0], 3, sizeof(int), (void*)&arg3);
      clSetKernelArg(clManager->GetKernels()[0], 4, sizeof(m_cnt), (void*)&m_cnt);

      cl_event event1 = NULL;
      cl_event event2 = NULL;

      cl_uint NDRange = 1;
      size_t* globalThreads = new size_t[NDRange];
      globalThreads[0] = (((2*_nvars-1)/64)+1)*64; //ceil
      size_t* localThreads = new size_t[NDRange];
      localThreads[0] = 64;

      cl_int status = clEnqueueNDRangeKernel(commandQueue, kernel, NDRange, NULL, globalThreads, localThreads, 0, NULL, &event1); //status = 0
      status = clWaitForEvents(1, &event1); //status = 0
      status = clEnqueueReadBuffer(commandQueue, m_matrix, CL_TRUE, 0, 5971968, matrix, 0, NULL, &event2); //status = 0
      status = clWaitForEvents(1, &event2); //status = -14

      if(status != CL_SUCCESS)
      {
        return -1;
      }
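For anyone reading the status comments in the code above: a raw -14 is easier to act on when it is logged by name. Here is a minimal sketch of such a lookup. The function name clErrorName is hypothetical (it is not part of OpenCL), and the numeric values are copied from CL/cl.h only so that the sketch compiles without the OpenCL headers; it covers just a handful of codes.

```cpp
#include <string>

// Hypothetical helper, not an OpenCL API. The numeric values mirror the
// constants defined in CL/cl.h and are repeated here only so this sketch
// builds without the OpenCL headers.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -14: return "CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST";
        case -36: return "CL_INVALID_COMMAND_QUEUE";
        case -58: return "CL_INVALID_EVENT";
        default:  return "unknown OpenCL error (" + std::to_string(code) + ")";
    }
}
```

In the code above it could be called right after each clWaitForEvents, for example by printing clErrorName(status) before returning -1.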

        • CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
          himanshu.gautam

          "Display driver AMD stopped responding and has successfully recovered"

          This warning appears when the kernel's execution time exceeds the watchdog timer limit set by Windows. Windows then resets the GPU, because it needs the GPU to refresh the screen.

          You can find details about how to tackle this in the AMD APP SDK documentation.
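One common way to stay under the watchdog limit, besides changing registry settings, is to split one long kernel launch into several shorter ones. The following is only a sketch of the host-side bookkeeping, under the assumption that the kernel's run time grows with the number of work-items. The names roundUp, Chunk and splitRange are hypothetical, and the chunk size that keeps each launch short enough has to be found by experiment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of `multiple`. For n > 0 this gives the
// same result as the ((n - 1) / 64 + 1) * 64 expression used in the question.
std::size_t roundUp(std::size_t n, std::size_t multiple) {
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

// One sub-launch: the first global index it covers and how many work-items.
struct Chunk {
    std::size_t offset;
    std::size_t size;
};

// Split a 1-D global range into chunks of at most maxChunk work-items.
// Every chunk size is a multiple of localSize, as clEnqueueNDRangeKernel
// requires when an explicit local size is passed.
std::vector<Chunk> splitRange(std::size_t globalSize,
                              std::size_t localSize,
                              std::size_t maxChunk) {
    assert(localSize > 0 && maxChunk >= localSize);
    const std::size_t total = roundUp(globalSize, localSize);
    const std::size_t step = (maxChunk / localSize) * localSize;
    std::vector<Chunk> chunks;
    for (std::size_t off = 0; off < total; off += step) {
        chunks.push_back({off, std::min(step, total - off)});
    }
    return chunks;
}
```

Each chunk would then be enqueued as its own clEnqueueNDRangeKernel call with chunk.size as the global size, waiting for it to finish (for example with clFinish) before enqueuing the next. Note that in OpenCL 1.0 the global_work_offset argument must be NULL, so on that version chunk.offset would have to be passed to the kernel as an extra argument and added to get_global_id(0) there; from OpenCL 1.1 on it can be passed as global_work_offset instead.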

          • CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
            mejlango

             

            thanks for your reply,

             

            I tried setting the registry as described in the documentation, which says:

            Under Windows Vista, to prevent long programs from causing a dialog to be displayed indicating that the display driver has stopped responding, disable the Vista Timeout Detection and Recovery (TDR) feature, which is trying to detect hangs in graphics hardware. To do this, use regedit.exe to create the following REG_DWORD entry in the registry, and set its value to 0: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrLevel
            This avoids the constant polling by the driver and the kernel to prevent long work units from monopolizing the device. (To restore default functionality, set the TdrLevel to 3.) Note that Microsoft strongly discourages disabling this feature, and only recommends doing so for debugging purposes. Do so at your own risk. 

            More info on the web:  http://msdn.microsoft.com/en-us/windows/hardware/gg487368.aspx

            But it didn't solve the problem.

            I think that when the GPU is reset after running for too long (exceeding the default TdrDelay of 2 seconds), clEnqueueReadBuffer fails for this reason:

            The operating system resets the appropriate state of the graphics stack. The Video Memory Manager component of the graphics stack purges all allocations from video memory.  (see the web page above). So the memory is no longer available.

            I tried setting the registry keys from these sources on different GPUs, but with no success. The message "Display driver AMD stopped responding and has successfully recovered" still appears.

            I used:

            TdrLevel = 0

            TdrDelay = 32 (DEC)

            TdrDdiDelay = 32 (DEC)

             

            ATI HD 4500

            Name: ATI RV710
            Driver version: CAL 1.4.900
            Version: OpenCL 1.0 ATI-Stream-v2.3 (451)

            Windows 7 Professional

            thanks for any help

             

             

              • CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
                mejlango

                Hi,

                I solved this problem by setting the registry key TdrDebugMode = 1, as described on the website from my previous post. This setting causes any timeout to be ignored. It is only a temporary solution, but it works for me.

                (I tried it on two different GPUs, but it worked only on the first one. Maybe I did something wrong on the second one.)

                Maybe it helps somebody with a similar problem.