Hello!
I am having some serious trouble with a kernel I'm testing. To test it, I use a small framework that helps me check validity and performance. The work dispatch is as follows:
(I dream of a day the syntax highlighting will work with tabs BTW)
void EnqueueTests(cl_command_queue q, bool performanceTests = true) {
    std::cout.flush();
    std::cout<<"Beginning tests: "<<from<<":"<<func<<std::endl;
    std::cout.flush();
    asizei workOff[] = { 0, 0, 0 };
    asizei testIndex = 0;
    { // do one call outside benchmarking loop so deferred resources get loaded / created.
        cl_uint dim = GetWorkDimensionality(0, 0);
        std::vector<asizei> workSize(GetGlobalWorkSize(0, 0));
        std::vector<asizei> groupSize(GetGroupSize(0, 0));
        cl_int err = clEnqueueNDRangeKernel(q, kern, dim, workOff, workSize.data(), groupSize.data(), 0, NULL, NULL);
        if(err != CL_SUCCESS) throw std::string("NDRange error: " + std::to_string(err));
    }
    clFinish(q);
    if(!performanceTests) return;
    for(asizei s = 0; s < GetNumLocalSizes(); s++) {
        using std::chrono::time_point;
        using std::chrono::high_resolution_clock;
        time_point<high_resolution_clock> start(high_resolution_clock::now());
        for(asizei loop = 0; loop < repetitions; loop++) {
            cl_uint dim = GetWorkDimensionality(s, testIndex);
            std::vector<asizei> workSize(GetGlobalWorkSize(s, testIndex));
            std::vector<asizei> groupSize(GetGroupSize(s, testIndex++));
            cl_int err = clEnqueueNDRangeKernel(q, kern, dim, workOff, workSize.data(), groupSize.data(), 0, NULL, NULL);
            if(err != CL_SUCCESS) throw std::string("NDRange error: " + std::to_string(err));
        }
        clFinish(q);
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(high_resolution_clock::now() - start);
        auto average = double(elapsed.count()) / repetitions;
        std::cout<<" t="<<aulong(average)<<" ms average"<<std::endl; std::cout.flush();
    }
}
When I do validity testing I exit at the early return, the if(!performanceTests) return; statement. I have manually checked that the results are correct for some cases (I'm still thinking about how to test this systematically, but this is an intermediate result anyway).
The problem is: if I let the loop run and enqueue some tests I end up hanging the driver.
This particular kernel consumes the following arguments:
The driver hangs at the clFinish(q) call that follows the inner repetition loop: the call never returns. I wonder if this is due to the buffers seeing heavy I/O, perhaps some kind of race condition or resource conflict? It happens even with repetitions=1.
Hints appreciated.
Has your issue been resolved with the latest Omega driver? If not, please provide a reproducible test case so that I can forward it to the concerned team.
Regards,
Hello Dipak, thank you for your reply.
It's good to see AMD is keeping an eye on this board so closely.
I haven't been able to reproduce it on Omega. I still get driver hangs sometimes, but I usually find they are caused by some other error I made. Fine enough for me!
Thanks for this feedback.