roboto
Adept I

Different behaviors when the device has reached its maximum global memory limit

Hello,

I have an HD 7970 with 6 GB, and an OpenCL program (.exe) that uses about 3.3 GB of global memory.

Two odd behaviors:

First, I'll refer to the OpenCL program I want to start as myOpenCLApp.exe.

Behavior 1:

I am able to start two instances of myOpenCLApp.exe, each spawned as a process via the Windows API function CreateProcessA, and both produce output successfully. This is odd, since two instances of myOpenCLApp.exe exceed my global memory limit by ~600 MB. I observe the same behavior on other GPUs with less global memory, e.g. NVIDIA devices. Using my monitoring tools, I see throttling of GPU activity, and the execution time slows down considerably.

I create the processes like this:

CreateProcessA("myOpenCLApp.exe", NULL, NULL, NULL,false, 0, NULL, NULL, &sinfo, &pinfo);

CreateProcessA("myOpenCLApp.exe", NULL, NULL, NULL,false, 0, NULL, NULL, &sinfo, &pinfo);

Total memory for both processes: 6.63 GB. This is puzzling, as I expected the second call to CreateProcessA to fail. Instead, both processes run fine, just really slowly.
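To make the arithmetic above explicit, here is a minimal C++ sketch that computes the combined demand and how far it exceeds the card's global memory. The function names are hypothetical, and the figures (3300 MB per instance, 6000 MB device) are this thread's numbers treated as decimal megabytes, which is an assumption; the card's exact usable capacity may differ. The sketch only does the sum; whether the driver actually enforces it across separate processes is exactly what this question is about.

```cpp
#include <cassert>
#include <cstdint>

// Combined global-memory demand of `processes` copies of the same program.
constexpr std::uint64_t totalDemandMB(std::uint64_t perProcessMB, std::uint64_t processes) {
    return perProcessMB * processes;
}

// How far the combined demand exceeds the device's global memory (0 if it fits).
constexpr std::uint64_t overcommitMB(std::uint64_t perProcessMB, std::uint64_t processes,
                                     std::uint64_t deviceMB) {
    return totalDemandMB(perProcessMB, processes) > deviceMB
               ? totalDemandMB(perProcessMB, processes) - deviceMB
               : 0;
}

// Figures from this thread, treated as decimal MB (an assumption):
// one instance needs ~3300 MB on a 6000 MB HD 7970.
static_assert(overcommitMB(3300, 1, 6000) == 0, "one instance fits");
static_assert(overcommitMB(3300, 2, 6000) == 600, "two instances overshoot by ~600 MB");
```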

What is this behavior I'm seeing, and where can I find more information on it? I have not seen much material online about this.

Behavior 2:

For this one, I used the command line:

start /b myOpenCLApp.exe

and again

start /b myOpenCLApp.exe

The second process fails with CL_MEM_OBJECT_ALLOCATION_FAILURE.
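When comparing the two behaviors, it can help to log OpenCL status codes by name rather than as raw numbers. A minimal C++ sketch follows; the numeric values are copied from the Khronos cl.h header so the sketch compiles without an OpenCL SDK (in real code, include the header and use its macros instead), and clErrorName is a hypothetical helper name. Note also that the OpenCL specification allows CL_MEM_OBJECT_ALLOCATION_FAILURE to be returned by enqueue calls such as clEnqueueNDRangeKernel, not only by clCreateBuffer, so checking the status of every call is worthwhile.

```cpp
#include <cassert>
#include <string>

// Values as defined in the Khronos cl.h header, duplicated here (an
// assumption of this sketch) so it builds without the OpenCL SDK.
constexpr int kClSuccess = 0;
constexpr int kClMemObjectAllocationFailure = -4;
constexpr int kClOutOfResources = -5;
constexpr int kClOutOfHostMemory = -6;
constexpr int kClInvalidBufferSize = -61;

// Hypothetical helper: translate memory-related OpenCL status codes into
// readable names for logging; anything else is reported numerically.
std::string clErrorName(int status) {
    switch (status) {
        case kClSuccess: return "CL_SUCCESS";
        case kClMemObjectAllocationFailure: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case kClOutOfResources: return "CL_OUT_OF_RESOURCES";
        case kClOutOfHostMemory: return "CL_OUT_OF_HOST_MEMORY";
        case kClInvalidBufferSize: return "CL_INVALID_BUFFER_SIZE";
        default: return "unknown OpenCL status " + std::to_string(status);
    }
}
```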

This is the error I expected to see in Behavior 1 as well.

What's going on here?

In Behavior 1 both processes share a common parent; is Behavior 2 somehow different?

I also observe Behavior 1's pattern when the work runs on multiple threads within one common parent process.

Please let me know if I missed something obvious.

Regards.


Hi Roboto,

This seems to be related to the way Windows handles processes. We would like to reproduce this on our side before making a definitive statement.

Thanks,

AMD Support


Please let me know if you need help.

Thanks!


Hi Roboto,

Can you share your code with us?

Thanks,

AMD Support


Please see the expanded code below.

Thanks!

roboto
Adept I

"MyGPUAlgorithm.exe" is very long and proprietary, so I am not allowed to share it. The code below shows how I'm using it: it is a GTest I wrote to test its stability in a multi-process environment. In summary, the code does the following:


1) start a new process called "MyGPUAlgorithm.exe" and give it an argument

2) immediately start a second process, the same as (1), with its own argument

3) expect one of them to fail. (Actually, the unit test expects both to succeed, but only because they do; I think one should fail, and I want to understand why it doesn't.)


Processes 1 and 2 each have their own virtual address space with a fresh copy of OpenCL.dll. I expect no relation between the two, other than the common parent that results from launching them with CreateProcessA (the GTest is reported as the common parent).


    string mypath = ExePath() + "MyGPUAlgorithm.exe"; // this takes about 30 seconds to execute; it searches for platforms and gets a GPU context

    // ***command-line args*** (lpCommandLine should point to a writable buffer)
    string cmdLine1 = "MyGPUAlgorithm.exe Argument1";
    string cmdLine2 = "MyGPUAlgorithm.exe Argument2";
    // ***end args***

    STARTUPINFOA sinfo;
    PROCESS_INFORMATION pinfo;
    ZeroMemory(&sinfo, sizeof(sinfo));
    sinfo.cb = sizeof(sinfo);
    ZeroMemory(&pinfo, sizeof(pinfo));

    // Launch process 1; CreateProcessA returns nonzero if the process was created
    BOOL ok1 = CreateProcessA(mypath.c_str(), &cmdLine1[0], NULL, NULL, FALSE, 0, NULL, NULL, &sinfo, &pinfo);
    //DWORD lastError = GetLastError();

    STARTUPINFOA sinfo2;
    PROCESS_INFORMATION pinfo2;
    ZeroMemory(&sinfo2, sizeof(sinfo2));
    sinfo2.cb = sizeof(sinfo2);
    ZeroMemory(&pinfo2, sizeof(pinfo2));

    // Immediately launch process 2 with a different argument
    BOOL ok2 = CreateProcessA(mypath.c_str(), &cmdLine2[0], NULL, NULL, FALSE, 0, NULL, NULL, &sinfo2, &pinfo2);
    //DWORD lastError2 = GetLastError();

    // Wait for both child processes to exit
    WaitForSingleObject(pinfo.hProcess, INFINITE);
    WaitForSingleObject(pinfo2.hProcess, INFINITE);

    // Release the process and thread handles returned by CreateProcessA
    CloseHandle(pinfo.hProcess);
    CloseHandle(pinfo.hThread);
    CloseHandle(pinfo2.hProcess);
    CloseHandle(pinfo2.hThread);

    // Time sensitive, so the asserts go at the end. These only check that
    // both processes were created, not what happened inside them.
    ASSERT_TRUE(ok1);
    ASSERT_TRUE(ok2);


When I say "I think it should fail", it is because "MyGPUAlgorithm.exe" takes 3.3 GB in total, so once the second process is started the combined total should be ~6.7 GB. I should get an out-of-memory error from OpenCL.

Thanks!


Hi,

I looked at the Windows documentation on how GPU device memory is managed. It says:

During driver initialization, the driver must return the list of segment types that describe how memory resources can be managed by the video memory manager. The driver specifies the number of segment types that it supports and describes each segment type by responding to calls to its DxgkDdiQueryAdapterInfo function. The driver describes each segment using a DXGK_SEGMENTDESCRIPTOR structure. For more information, see Initializing Use of Memory Segments. (Source: Initializing Use of Memory Segments (Windows Drivers).)


So it seems that how processes can access GPU memory depends on how the device driver handles it. Someone who understands how the device drivers are written could give a proper answer.


Thank you, sudarshan,

I'll look more into the links you gave me and into the device drivers.

What troubles me is that when MyGPUAlgorithm.exe is run as I described (concurrently), it takes about 30 minutes to complete, versus a few seconds when run individually. One solution would be to manage how clients of my system access the GPU, but I think this would be a daunting task.
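One way to approach that coordination, sketched here under assumptions, is a memory budget: each job declares how much GPU global memory it needs and only starts once that much of a fixed budget is free, so jobs queue up instead of oversubscribing the card. The class below (GpuMemoryBudget is a hypothetical name) coordinates threads of one process only; separate processes launched with CreateProcessA would need an OS-level primitive, such as a named semaphore, in place of std::mutex. It also assumes each job's memory need is known up front.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical in-process admission controller for GPU global memory.
// Jobs reserve their declared size before running and release it afterwards.
class GpuMemoryBudget {
public:
    explicit GpuMemoryBudget(std::uint64_t capacityMB) : freeMB_(capacityMB) {}

    // Non-blocking: reserve `mb` only if it fits right now.
    bool tryAcquire(std::uint64_t mb) {
        std::lock_guard<std::mutex> lock(m_);
        if (mb > freeMB_) return false;
        freeMB_ -= mb;
        return true;
    }

    // Blocking: wait until `mb` is free, then reserve it.
    // Never returns if `mb` exceeds the total capacity.
    void acquire(std::uint64_t mb) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return mb <= freeMB_; });
        freeMB_ -= mb;
    }

    // Return a reservation and wake any jobs waiting in acquire().
    void release(std::uint64_t mb) {
        {
            std::lock_guard<std::mutex> lock(m_);
            freeMB_ += mb;
        }
        cv_.notify_all();
    }

    std::uint64_t freeMB() const {
        std::lock_guard<std::mutex> lock(m_);
        return freeMB_;
    }

private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t freeMB_;
};
```

With this thread's figures, `GpuMemoryBudget budget(6000);` lets a first job reserve 3300 MB, while a second `tryAcquire(3300)` returns false (or `acquire(3300)` waits) until the first job calls `release(3300)`. Whether serializing the runs this way beats the observed concurrent slowdown would have to be measured.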
