Archives Discussions

Raistmer
Adept II

4 instances of the same app using Brook lead to driver restarts

while 1 instance runs OK

AFAIK a long kernel can lead to a driver restart if it executes for more than 2 seconds under Vista.
None of the kernels in my app takes that long. Moreover, I increased the driver restart limit via the registry to 15 seconds.
Still, running 4 copies of the app leads to driver restarts from time to time.
Why can't the Brook runtime do correct scheduling to avoid making the driver unavailable to the OS watchdog timer for too long?
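(For reference: the watchdog in question is Vista's TDR (Timeout Detection and Recovery) mechanism. Raising the limit to 15 seconds via the registry, as mentioned above, would look roughly like this, assuming the standard "TdrDelay" value under the GraphicsDrivers key - a DWORD in seconds, taking effect after a reboot:)

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000f
```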
0 Likes
10 Replies
Gipsel
Adept I

Originally posted by: Raistmer AFAIK a long kernel can lead to a driver restart if it executes for more than 2 seconds under Vista. None of the kernels in my app takes that long. Moreover, I increased the driver restart limit via the registry to 15 seconds. Still, running 4 copies of the app leads to driver restarts from time to time. Why can't the Brook runtime do correct scheduling to avoid making the driver unavailable to the OS watchdog timer for too long?


Are all four instances using the same GPU? For such cases I use a mutex per GPU, so that only one kernel at a time can be started per GPU (the waiting kernels are then served in round-robin fashion).

0 Likes

Yes, all apps use the same GPU.
Thanks for the suggestion.
I thought such serialization was done at the driver level... or at least in the Brook/CAL runtime %)
0 Likes

@Gipsel
Do you use a named mutex with the MW optimized GPU app?
Is it possible to use that mutex to serialize GPU access between a few BOINC apps?
If yes, what is its name?
0 Likes

Originally posted by: Raistmer @Gipsel Do you use a named mutex with the MW optimized GPU app? Is it possible to use that mutex to serialize GPU access between a few BOINC apps? If yes, what is its name?


The same mutex names are already used at MW@home as well as Collatz@home. And yes, running Collatz and MW on the same GPU works, although the current Collatz app uses much smaller execution domains than the MW one (and doesn't get the multi-GPU stuff right), so the GPU time is not evenly split between the two apps. But that will hopefully change with the next version, where I have more influence than on the current one.

But it may become superfluous with the modified client versions from Crunch3r (the modifications are already in the official development versions), as the client then starts only a single instance per GPU and tells the app with a command line parameter ("--device #") which GPU to use (I guess it's the same behaviour as with CUDA).

Nevertheless, this is the code fragment which constructs the mutex names (as mentioned, one exists per GPU). The mutex names are "Global\\Milkyway_ATI_GPU_App_Mutex#", with "#" being the device number of the used GPU (the "which_device" variable in the code below).

char mutex_name[64];
strcpy(mutex_name, "Global\\Milkyway_ATI_GPU_App_Mutex");
[..]
itoa(which_device, &(mutex_name[strlen(mutex_name)]), 10); // construct the mutex name for the chosen GPU
GPU_mutex = CreateMutex(&GPU_secatt, false, mutex_name);   // creates the named mutex, or opens it if it already exists, but never acquires it directly
if (GPU_mutex == NULL) // if that fails
{
    GPU_mutex = OpenMutex(MUTEX_MODIFY_STATE, false, mutex_name); // try again with fewer rights
    if (GPU_mutex == NULL)
    {
        cerr << "Couldn't obtain mutex for GPU access!" << endl << flush;
        return(1);
    }
}

// kernel calls are enclosed in the following construct
WaitForSingleObject(GPU_mutex, INFINITE); // acquire the mutex (waiting for the GPU to become available), forever if necessary
GPU_time_s = dtime();
[.. kernel calls ..]
GPU_time += dtime() - GPU_time_s;
ReleaseMutex(GPU_mutex);

0 Likes

Ok, thank you very much for the info!
In the current state of AP (only the FFA is ported to the GPU), launching a single app per GPU would be a waste of the GPU, so I'd better use the mutexes.
But it would be great if AP played nicely with other GPU apps.
0 Likes

"&GPU_secatt"
Do you use some specific access rights? Just NULL will not go?
0 Likes

Originally posted by: Raistmer "&GPU_secatt" Do you use some specific access rights? Just NULL will not go?


I don't use specific rights at the moment, but I was thinking about it, because the default access rights don't allow another user to access the same mutex. That means one can't test an application standalone while another instance is launched by the BOINC client. But it doesn't matter on a normal system.

GPU_secatt.lpSecurityDescriptor = NULL;
GPU_secatt.bInheritHandle = false;
GPU_secatt.nLength = sizeof(GPU_secatt);

0 Likes

Thanks again!
Currently testing with 3 AP + 1 MW running - will see if the driver restarts have ended.

EDIT: much more stable now, only a single driver restart so far
0 Likes

Originally posted by: Raistmer

EDIT: much more stable now, only single driver restart so far



Are you testing with Vista or WinXP? I would really like to know if the stability problems with newer drivers under XP are gone with the SDK 1.4 you use (at least I guess you are using 1.4).

0 Likes

Originally posted by: Gipsel
Originally posted by: Raistmer
EDIT: much more stable now, only a single driver restart so far

Are you testing with Vista or WinXP? I would really like to know if the stability problems with newer drivers under XP are gone with the SDK 1.4 you use (at least I guess you are using 1.4).


Well, I use Vista x86, Catalyst 9.2 (because newer versions don't support 1D stream sizes >8192 and I need such sizes) and SDK 1.4 beta (Brook+ only, still no hand-made IL kernels).
With only MW running (default settings) it works very stably - only one system freeze (actually only the GUI; the filesystem was still accessible remotely without problems) in a few months.
Sure, when I run something like AOE III with MW active a driver restart is guaranteed, but if no 3D stuff is running everything is just OK.
It's a Q9450-based host.

But when I tried to launch MW on another host, AMD Athlon 64 based, WinXP x86, I get driver recoveries or system BSoDs sooner or later. The record is a few MW tasks done and reported. Usually it hangs before completing the first task.
I tried Catalyst 8.12, 9.1, 9.2 - no success...
Also all the slowdown options like n1 f100 w2 - sooner or later the system freezes anyway...
0 Likes