maxdz8
Elite

Troubleshooting a driver hang

Hello!

I am having serious trouble with a kernel I'm testing. I have a small framework that helps me check validity and performance. The work dispatch is as follows:

(I dream of a day the syntax highlighting will work with tabs BTW)


    void EnqueueTests(cl_command_queue q, bool performanceTests = true) {
        std::cout.flush();
        std::cout<<"Beginning tests: "<<from<<":"<<func<<std::endl;
        std::cout.flush();
        asizei workOff[] = { 0, 0, 0 };
        asizei testIndex = 0;
        { // do one call outside benchmarking loop so deferred resources get loaded / created.
            cl_uint dim = GetWorkDimensionality(0, 0);
            std::vector<asizei> workSize(GetGlobalWorkSize(0, 0));
            std::vector<asizei> groupSize(GetGroupSize(0, 0));
            cl_int err = clEnqueueNDRangeKernel(q, kern, dim, workOff, workSize.data(), groupSize.data(), 0, NULL, NULL);
            if(err != CL_SUCCESS) throw std::string("NDRange error: " + std::to_string(err));
        }

        clFinish(q);
        if(!performanceTests) return;
        for(asizei s = 0; s < GetNumLocalSizes(); s++) {
            using std::chrono::time_point;
            using std::chrono::high_resolution_clock;
            time_point<high_resolution_clock> start(high_resolution_clock::now());
            for(asizei loop = 0; loop < repetitions; loop++) {
                cl_uint dim = GetWorkDimensionality(s, testIndex);
                std::vector<asizei> workSize(GetGlobalWorkSize(s, testIndex));
                std::vector<asizei> groupSize(GetGroupSize(s, testIndex++));
                cl_int err = clEnqueueNDRangeKernel(q, kern, dim, workOff, workSize.data(), groupSize.data(), 0, NULL, NULL);
                if(err != CL_SUCCESS) throw std::string("NDRange error: " + std::to_string(err));
            }
            clFinish(q);
            auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(high_resolution_clock::now() - start);
            auto average = double(elapsed.count()) / repetitions;
            std::cout<<"  t="<<aulong(average)<<" ms average"<<std::endl; std::cout.flush();
        }
    }


When I do validity testing I exit at line 17 (the early return taken when performanceTests is false). I have manually checked that the results are correct for some cases (I'm still thinking about how to test this systematically, but this is an intermediate result anyway).
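
One way the systematic check could look is a byte-by-byte comparison of the buffer read back from the device against a CPU reference. This is only a sketch under assumptions: it presumes each work-item owns one contiguous 256-byte slot of the output, which may not match the strided layout actually used, and Mismatch and FindFirstMismatch are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: every work-item owns one contiguous 256-byte slot in the output buffer.
constexpr std::size_t kBytesPerWorkItem = 256;

struct Mismatch {
    bool found;            // false when the two buffers agree
    std::size_t workItem;  // index of the first work-item whose slot differs
    std::size_t byte;      // offset of the first differing byte inside that slot
};

// Compares the device output (already read back to the host) against a CPU reference.
Mismatch FindFirstMismatch(const std::vector<std::uint8_t>& device,
                           const std::vector<std::uint8_t>& reference) {
    const std::size_t n = device.size() < reference.size() ? device.size() : reference.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (device[i] != reference[i]) {
            return {true, i / kBytesPerWorkItem, i % kBytesPerWorkItem};
        }
    }
    // A size difference counts as a mismatch at the first missing byte.
    if (device.size() != reference.size()) {
        return {true, n / kBytesPerWorkItem, n % kBytesPerWorkItem};
    }
    return {false, 0, 0};
}
```

Reporting the work-item index and byte offset, rather than a plain pass/fail, makes it easier to tell a wrong stride from a wrong value.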

The problem is: if I let the loop run and enqueue some tests I end up hanging the driver.

This particular kernel consumes the following arguments:

  1. global uint *header: basically shared across all WIs (it should really be constant, but it isn't, because that apparently causes my queues to fail with CL_OUT_OF_RESOURCES);
  2. global uchar *output: each WI writes 256 bytes sequentially. The writes are strided for ease of testing ATM;
  3. const uint outerIterations;
  4. global uchar *buff_a, global uchar *buff_b: temporary scratchpads. Both are initialized at start based on the values from header. The latter is also mangled pretty heavily in an inner loop.

The driver hangs at line 29 (the clFinish(q) after the inner loop): the call never returns. I wonder whether this is due to the buffers being so I/O heavy, or perhaps to some kind of race condition or resource conflict? It happens even with repetitions=1.
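
For narrowing down a wait that never returns, one option is to replace the blocking wait with a polling loop that gives up after a deadline, so the test framework can at least report which dispatch stalled. The sketch below is generic C++ and WaitWithTimeout is a hypothetical helper. With OpenCL, the isDone callback could wrap clGetEventInfo with CL_EVENT_COMMAND_EXECUTION_STATUS on an event returned by the last clEnqueueNDRangeKernel, after a clFlush(q) to submit the work. If the driver itself is wedged, the status query may block as well, so this is a diagnostic aid rather than a fix.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `isDone` until it returns true or `timeout` expires.
// Returns true if the work completed in time, false if the deadline passed first.
bool WaitWithTimeout(const std::function<bool()>& isDone,
                     std::chrono::milliseconds timeout,
                     std::chrono::milliseconds pollInterval = std::chrono::milliseconds(1)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!isDone()) {
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(pollInterval);  // do not spin at full speed
    }
    return true;
}
```

A false return after a generous timeout would point at the kernel or driver not making progress, as opposed to the host code waiting on the wrong queue.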

Hints appreciated.


dipak
Staff

Re: Troubleshooting a driver hang

Has your issue been resolved with the latest Omega driver? If not, please provide a reproducible test case so that I can forward it to the concerned team.

Regards,

maxdz8
Elite

Re: Troubleshooting a driver hang (accepted solution)

Hello Dipak, thank you for your reply.

It's good to see AMD is keeping an eye on this board so closely.

I haven't been able to reproduce it on Omega. I still get driver hangs sometimes, but I usually find they are caused by some other mistake of mine. Fine enough for me!

dipak
Staff

Re: Troubleshooting a driver hang

Thanks for this feedback.