This seems to be related to the way the Windows OS handles processes. We would like to reproduce this on our side before giving a definitive answer.
Please let me know if you need help.
Can you share your code with us?
Please see the expanded code below.
"MyGPUAlgorithm.exe" is very long and also proprietary, so I am not allowed to share it. The code below shows how I'm using it. It is a GTest I wrote to test its stability in a multi-process environment. The code can be summarized as follows:
1) start a new process called "MyGPUAlgorithm.exe" and give it an argument
2) immediately start a second process same as (1) and also give it an argument
3) expect one of them to fail. The unit test actually expects both to succeed, but only because that is what currently happens (what is the intended behavior?). I think one of them should fail.
Processes 1 and 2 each run in their own virtual address space with a fresh copy of OpenCL.dll. I expect no relation between the two other than a common parent, because of the way they are launched with CreateProcessA (the GTest is reported as the common parent).
// MyGPUAlgorithm.exe takes about 30 seconds to execute; it searches for platforms and gets a GPU context
string mypath = ExePath() + "MyGPUAlgorithm.exe";
//***command line args***
string outputFile1 = "MyGPUAlgorithm.exe Argument1";
string outputFile2 = "MyGPUAlgorithm.exe Argument2";
// declarations were not shown in the original post; types assumed from the CreateProcessA signature
STARTUPINFOA sinfo, sinfo2;
PROCESS_INFORMATION pinfo, pinfo2;
ZeroMemory(&sinfo, sizeof(sinfo));
sinfo.cb = sizeof(sinfo);
ZeroMemory(&pinfo, sizeof(pinfo));
// CreateProcessA returns nonzero on success, so err is true when the launch succeeded
bool err = CreateProcessA(mypath.c_str(), const_cast<char*>(outputFile1.c_str()), NULL, NULL, FALSE, 0, NULL, NULL, &sinfo, &pinfo) != 0;
//DWORD lastError = GetLastError();
ZeroMemory(&sinfo2, sizeof(sinfo2));
sinfo2.cb = sizeof(sinfo2);
ZeroMemory(&pinfo2, sizeof(pinfo2));
bool err2 = CreateProcessA(mypath.c_str(), const_cast<char*>(outputFile2.c_str()), NULL, NULL, FALSE, 0, NULL, NULL, &sinfo2, &pinfo2) != 0;
//DWORD lastError2 = GetLastError();
// wait for both child processes to exit
WaitForSingleObject(pinfo.hProcess, INFINITE);
WaitForSingleObject(pinfo2.hProcess, INFINITE);
//time sensitive; so asserts go at end
When I say "I think it should fail", it is because "MyGPUAlgorithm.exe" takes 3.3 GB of memory in total, so once the second process is started the combined total should be around 6.7 GB. I would expect an out-of-memory error from OpenCL.
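The arithmetic behind that expectation can be written down as a small check. This is only a sketch of the naive sum: the helper name fitsOnDevice is hypothetical, and the device capacity is an assumed input, since the thread does not say how much memory the GPU has. It says nothing about what a real driver does when the sum exceeds the capacity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if `processCount` processes that each need
// `bytesPerProcess` of device memory would fit into `deviceBytes` simultaneously.
// This models only the naive sum, not how a driver actually manages memory.
bool fitsOnDevice(std::uint64_t bytesPerProcess,
                  unsigned processCount,
                  std::uint64_t deviceBytes) {
    assert(processCount > 0);
    // Divide instead of multiplying so the check cannot overflow.
    return bytesPerProcess <= deviceBytes / processCount;
}
```

For example, with an assumed 4 GB device, fitsOnDevice(3300000000, 1, 4000000000) is true while the same call with a process count of 2 is false, which is the situation described above.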
I looked at the Windows documentation to see how GPU device memory is managed. Here is what it says:
During driver initialization, the driver must return the list of segment types that describe how memory resources can be managed by the video memory manager. The driver specifies the number of segment types that it supports and describes each segment type by responding to calls to its DxgkDdiQueryAdapterInfo function. The driver describes each segment using a DXGK_SEGMENTDESCRIPTOR structure. For more information, see Initializing Use of Memory Segments. (Source: "Initializing Use of Memory Segments", Windows Drivers documentation.)
Thus it seems that how processes can access GPU memory depends on how the device driver handles it. A proper answer would have to come from someone who understands how device drivers are written.
Thank you sudarshan,
I'll look more into the links you gave me and into the device drivers.
What troubles me is that if MyGPUAlgorithm.exe is run as I described (two instances concurrently), it takes about 30 minutes to complete, versus a few seconds when run individually. A solution would be to manage how clients of my system interact with the GPU, but I think this would be a daunting task.
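One possible shape for such client management is an admission gate that lets a GPU job start only while the combined memory request stays within a budget. The sketch below is an assumption-laden illustration, not a drop-in fix: the class name GpuMemoryGate and the budget value are hypothetical, and it only coordinates threads inside one process. Since the two MyGPUAlgorithm.exe instances are separate processes, a real version would need either a cross-process primitive (for example a named Windows mutex or semaphore) or a single broker process that owns the gate.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical in-process gate: admits GPU jobs only while the sum of their
// memory requests stays within a fixed budget (in bytes or any consistent unit).
class GpuMemoryGate {
public:
    explicit GpuMemoryGate(std::uint64_t budget) : budget_(budget) {}

    // Non-blocking: reserves `amount` and returns true if it fits right now.
    bool tryAcquire(std::uint64_t amount) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (amount > budget_ - used_) return false;  // used_ <= budget_ always holds
        used_ += amount;
        return true;
    }

    // Blocking: waits until `amount` fits, then reserves it.
    // Returns false immediately if the request can never fit.
    bool acquire(std::uint64_t amount) {
        if (amount > budget_) return false;
        std::unique_lock<std::mutex> lock(mutex_);
        freed_.wait(lock, [&] { return amount <= budget_ - used_; });
        used_ += amount;
        return true;
    }

    // Returns a previous reservation and wakes any waiting jobs.
    void release(std::uint64_t amount) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(amount <= used_);
            used_ -= amount;
        }
        freed_.notify_all();
    }

    std::uint64_t used() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return used_;
    }

private:
    const std::uint64_t budget_;
    std::uint64_t used_ = 0;
    mutable std::mutex mutex_;
    std::condition_variable freed_;
};
```

With a budget of 4000 (say, MB) and jobs that each request 3300, the second job would wait in acquire until the first calls release, so the two runs are serialized instead of competing for device memory.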