Otterz

Detecting device resets

Discussion created by Otterz on Feb 23, 2011
Latest reply on Feb 24, 2011 by Otterz

Hi,

Is it possible to detect whether the display driver has been reset? I am using Windows 7 with TDR (Timeout Detection and Recovery) enabled, and I do not want to rely on that setting being changed. In my app, I would like to know when Windows has reset the device.

What I am observing is that when a TDR occurs, the long-running batch of kernels gets killed and the device is reset. event.wait() then returns CL_SUCCESS for the killed batch, and my app merrily goes on submitting more NDRange kernels (with enqueue returning no error), with all subsequent event.wait() calls also returning no errors.

But when I reach the point where I want to call queue.finish()/flush(), it hangs indefinitely.
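One way to at least notice the hang described above is to run the blocking call on a helper thread and give up after a timeout. The following is only a minimal sketch under assumptions: `finishes_within` is a hypothetical helper name (not part of OpenCL), the timeout value is arbitrary, and the callable passed in stands for whatever blocking call is suspected of hanging, such as queue.finish(). It does not recover the device; it only lets the app decide to abort.

```cpp
#include <chrono>
#include <exception>
#include <future>
#include <memory>
#include <thread>

// Hypothetical helper: runs `blocking_call` on a detached helper thread and
// waits at most `timeout` for it to return (normally or by throwing).
// Returns true if it returned in time, false if it is presumed hung.
// On timeout the helper thread is left running, so the caller should treat
// the queue as unusable and shut down rather than carry on.
template <typename Fn>
bool finishes_within(Fn blocking_call, std::chrono::milliseconds timeout) {
    // The promise is shared so it stays alive even if we stop waiting.
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> result = done->get_future();

    std::thread([done, blocking_call]() mutable {
        try {
            blocking_call();
            done->set_value();
        } catch (...) {
            // An exception also counts as "returned"; store it, don't terminate.
            done->set_exception(std::current_exception());
        }
    }).detach();

    return result.wait_for(timeout) == std::future_status::ready;
}
```

In the app this might be called as `finishes_within([&] { queue.finish(); }, std::chrono::seconds(10))`, exiting with an error if it returns false. Whether a reset device can be detected any earlier than this is exactly the open question of this thread.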

Looking at the API, I do not see the correct way to make the program abort when a batch of kernels gets killed.

I have tried registering a callback when creating the context, but that callback does not seem to be called (or perhaps I coded it incorrectly):

void CL_CALLBACK contextCallback(
    const char *errinfo,       // Pointer to an error string
    const void *private_info,  // Binary data .. not sure how to use it
    size_t cb,                 // Amount of above data
    void *user_data)           // User supplied data???
{
    std::cout << "Context callback called with error message:" << std::endl
              << errinfo << std::endl;
    exit(EXIT_FAILURE);
}

cl::Context context(
    CL_DEVICE_TYPE_GPU,  // Create context for a GPU
    cprops,
    NULL,
    &contextCallback,
    &err);
checkErr(err, "Context::Context()");
