3 Replies Latest reply on Jan 7, 2015 4:43 AM by dipak

    Troubleshooting a driver hang

    maxdz8

      Hello!

      I am having some serious trouble with a kernel I'm testing. As I'm testing this, I have a small framework to help me check validity and performance. The work dispatch is as follows:

      (I dream of a day the syntax highlighting will work with tabs BTW)

          void EnqueueTests(cl_command_queue q, bool performanceTests = true) {
              std::cout.flush();
              std::cout<<"Beginning tests: "<<from<<":"<<func<<std::endl;
              std::cout.flush();
              asizei workOff[] = { 0, 0, 0 };
              asizei testIndex = 0;
              { // do one call outside benchmarking loop so deferred resources get loaded / created.
                  cl_uint dim = GetWorkDimensionality(0, 0);
                  std::vector<asizei> workSize(GetGlobalWorkSize(0, 0));
                  std::vector<asizei> groupSize(GetGroupSize(0, 0));
                  cl_int err = clEnqueueNDRangeKernel(q, kern, dim, workOff, workSize.data(), groupSize.data(), 0, NULL, NULL);
                  if(err != CL_SUCCESS) throw std::string("NDRange error: " + std::to_string(err));
              }
      
      
              clFinish(q);
              if(!performanceTests) return;
              for(asizei s = 0; s < GetNumLocalSizes(); s++) {
                  using std::chrono::time_point;
                  using std::chrono::high_resolution_clock;
                  time_point<high_resolution_clock> start(high_resolution_clock::now());
                  for(asizei loop = 0; loop < repetitions; loop++) {
                      cl_uint dim = GetWorkDimensionality(s, testIndex);
                      std::vector<asizei> workSize(GetGlobalWorkSize(s, testIndex));
                      std::vector<asizei> groupSize(GetGroupSize(s, testIndex++));
                      cl_int err = clEnqueueNDRangeKernel(q, kern, dim, workOff, workSize.data(), groupSize.data(), 0, NULL, NULL);
                      if(err != CL_SUCCESS) throw std::string("NDRange error: " + std::to_string(err));
                  }
                  clFinish(q);
                  auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(high_resolution_clock::now() - start);
                  auto average = double(elapsed.count()) / repetitions;
                  std::cout<<"  t="<<aulong(average)<<" ms average"<<std::endl; std::cout.flush();
              }
          }
      

      When I do validity testing, I exit on line 17 (the early return when performanceTests is false). I have manually checked that the results are correct for some cases (I'm still thinking about how to test this systematically, but this is an intermediate result anyway).

       

      The problem is: if I let the loop run and enqueue some tests I end up hanging the driver.

      This particular kernel consumes the following arguments:

      1. global uint *header: basically shared across all WIs (it should really be constant, but declaring it so apparently makes my queues fail with CL_OUT_OF_RESOURCES);
      2. global uchar *output: each WI writes 256 bytes sequentially. The writes are strided for ease of testing ATM;
      3. const uint outerIterations;
      4. global uchar *buff_a, global uchar *buff_b: temporary scratchpads. They are both initialized at start based on the values from header. The latter is also mangled pretty heavily in an inner loop.
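
      Since every WI writes a fixed 256 bytes, the minimum size of output follows directly from the global work size, and an undersized buffer is one thing worth ruling out on the host. A minimal sketch, assuming exactly 256 bytes per WI with no padding; RequiredOutputBytes and OutputFits are hypothetical helpers, not part of the framework above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: every work item writes exactly 256 bytes, as described in the list above.
constexpr std::size_t kBytesPerWorkItem = 256;

// Hypothetical helper: minimum size in bytes of the output buffer for a
// given global work size (one entry per dimension).
std::size_t RequiredOutputBytes(const std::vector<std::size_t> &globalWorkSize) {
    if(globalWorkSize.empty()) return 0; // no dimensions, nothing is dispatched
    std::size_t workItems = 1;
    for(std::size_t extent : globalWorkSize) {
        assert(extent > 0); // a zero extent is not a valid global work size
        workItems *= extent;
    }
    return workItems * kBytesPerWorkItem;
}

// True when a buffer of allocatedBytes can hold the output of every work item.
bool OutputFits(std::size_t allocatedBytes, const std::vector<std::size_t> &globalWorkSize) {
    return allocatedBytes >= RequiredOutputBytes(globalWorkSize);
}
```

      The strided layout changes where each WI writes, not the total, so the same bound should hold as long as the stride pattern stays inside workItems * 256 bytes.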

      The driver hangs at line 29 (the clFinish(q) after the inner loop). It never returns from that clFinish(q) call. I wonder if this is due to the buffers being read and written so heavily, perhaps some kind of race condition or resource conflict? It happens even with repetitions=1.
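
      One way to narrow down which dispatch never completes is to log before each one and wait right after it, so the last line printed names the offender. A minimal sketch: RunSerialized and the enqueueAndWait callable are hypothetical, and in the framework above the callable would presumably wrap clEnqueueNDRangeKernel followed by clFinish:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>

// Hypothetical helper: runs count dispatches one at a time.
// enqueueAndWait(i) is assumed to enqueue dispatch i, block until it
// completes, and return an OpenCL-style error code (0 meaning success).
// Returns the index of the first failing dispatch, or count if all succeed.
std::size_t RunSerialized(std::size_t count,
                          const std::function<int(std::size_t)> &enqueueAndWait) {
    assert(static_cast<bool>(enqueueAndWait)); // a callable is required
    for(std::size_t i = 0; i < count; i++) {
        // std::endl flushes, so if the call below hangs this is the last line shown.
        std::cout<<"dispatch "<<i<<" starting"<<std::endl;
        const int err = enqueueAndWait(i);
        if(err != 0) {
            std::cout<<"dispatch "<<i<<" failed: "<<err<<std::endl;
            return i;
        }
    }
    return count;
}
```

      Waiting after every dispatch distorts the timings, so this only makes sense as a debugging pass, not as a replacement for the benchmarking loop.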

       

      Hints appreciated.