Ailer
Journeyman III

Long time running kernel = X freezes

Hi!

I wrote an OpenCL program that does a lot of calculations, so the kernel run time is long (7 days on CPU). When I try to run it on a GPU or APU, the computer freezes after 1.5 to 2 seconds. To confirm that the freeze is caused by the long run time, I wrote this simple kernel:

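__kernel void square(__global float* input, __global float* output, const unsigned int count)
{
    int i = get_global_id(0);   /* index of this work-item */
    unsigned int u;

    /* Busy loop: each work-item increments its own element count times. */
    for (u = 0; u < count; u++)
    {
        output[i] = output[i] + 1;
    }
}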

If I set count=10000, it runs fine on both GPU and CPU. If I set count=1000000, it runs on the CPU, but running it on the GPU makes the X server freeze completely, and the process takes 100% of the CPU (fully loading one core). Trying kill -s 9 has no effect: the process is marked as <defunct> and keeps loading the CPU (memory consumption is 0 after the kill attempt). The only way to stop it is to reboot the computer. I tried running this code on another (non-AMD) GPU and it works fine.

Is it possible to run it on AMD? And is it possible to run it without X (the computer has no monitor and is controlled over SSH, so I just don't need X)?

SDK ver: 2.3, ati-drivers ver: 11.6, kernel ver: 2.6.39.

Thanks!



0 Likes
9 Replies
genaganna
Journeyman III

An X server is required to run OpenCL applications. You should be able to run a long-running kernel like this if no monitor is connected to that particular GPU. That is what is happening on your non-AMD GPU.

If a monitor is connected to the GPU, the freeze happens with every GPU vendor.

0 Likes

"X server is required to run OpenCL applications."

Right now I am running my program without X on a non-AMD video card. It has been running for about 2 hours and is still going. X is not started, but the kernel runs normally. What am I doing wrong?

"You should be able to run such kernel taking lot of time if no monitor is connected to that particular GPU. This is happening on non-AMD gpu. If you connect monitor to GPU, this is true for all GPU vendors."

OK, what about AMD? Can disconnecting the monitor help? My computer is an all-in-one, so I can't just unplug the cable... I will try this on another machine.

0 Likes

The AMD driver needs the X server to be loaded and in use.

Without the X server you only get the CPU device from the AMD platform.

It is also normal for the screen to freeze during kernel execution: the GPU can run only one program at a time, so it can't update the display. But you should still be able to work normally through SSH, even during this long kernel execution.

0 Likes

"and it is normal that screen freeze during kernel execution as GPU can run only one program at a time so it can't update a display."

OK, I understand. But the process on the host machine cannot be killed and runs much longer than expected. Thank you, I will try waiting a few days; maybe it will finish normally.

0 Likes

You should also try updating to the latest SDK and driver. And if 10000 iterations take 5 seconds, then 100000 should take only ten times longer, about 50 seconds.

0 Likes
Ailer
Journeyman III

"and if 10000 iterations take 5 seconds then 100000 should take only ten times longer, about 50 seconds."

Thanks for the answers!

I ran some tests and found the following:

200000 iterations take 9 seconds,

300000 take 18 seconds,

900000 take 2 minutes and 30 seconds on my 5470,

but 1000000 causes a system freeze. I waited about 1 hour and 30 minutes with no effect, then rebooted. I think something in the driver crashes after about 3 minutes: if 800000 iterations take about 2 minutes and 900000 take 2 minutes and 30 seconds, then 1000000 should take about 3 to 4 minutes, but X is still not responding after more than 1 hour (SSH still works).

OK, I will try to update the drivers and SDK... if I can find ebuilds.
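
For reference, the estimate in the post above can be reproduced as a straight-line extrapolation. This is only a minimal sketch in C, assuming run time grows linearly between the last two measurements quoted (800000 iterations in about 120 s, 900000 in 150 s). `extrapolate_seconds` is a hypothetical helper name, not something from the thread.

```c
/* Linear extrapolation through two measured points (n1, t1) and (n2, t2):
 * estimated run time in seconds for n iterations.
 * Assumes n1 != n2 and that the cost per extra iteration stays constant
 * beyond the second point (an assumption, not a measured fact). */
static double extrapolate_seconds(double n1, double t1,
                                  double n2, double t2, double n)
{
    double slope = (t2 - t1) / (n2 - n1);   /* seconds per extra iteration */
    return t2 + slope * (n - n2);
}
```

With those two points, `extrapolate_seconds(800000, 120, 900000, 150, 1000000)` gives about 180 seconds, in line with the 3 to 4 minutes expected above. The earlier measurements (9 s at 200000, 18 s at 300000) show that the cost per iteration is not constant, so this is a rough estimate at best.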

0 Likes

This is a random guess, but could system power saving options be causing a driver crash? Maybe display power off?

I know this doesn't answer your question directly, but wouldn't it be good practice to split up the kernel routine into blocks of runtime? I imagine having an unresponsive PC for several hours isn't desirable. Implementing a 'for(u=lastendpoint;u<nextendpoint;u++)' structure with successive kernel calls (<3 sec execution time each) could be a workaround. It would only add a few milliseconds to your overall execution time, but it would keep your PC responsive, prevent potential timeouts on other hardware, and make debugging much more manageable.

Just a thought.
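
The block-splitting idea above could look roughly like this. It is a minimal sketch in plain C so that it runs without an OpenCL runtime: `square_chunk` is a hypothetical stand-in for the kernel with added `start` and `end` arguments, and `run_in_chunks` stands in for a host loop that would launch one kernel per block and wait for it (for example with `clFinish`) before launching the next. The names and the chunk size are assumptions, not part of the original program.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel body, for one work-item i.
 * In OpenCL C this would be a __kernel with i = get_global_id(0);
 * start and end are the extra arguments the chunking idea adds. */
static void square_chunk(float *output, size_t i,
                         unsigned int start, unsigned int end)
{
    unsigned int u;
    for (u = start; u < end; u++) {
        output[i] = output[i] + 1.0f;
    }
}

/* Host-side loop: covers [0, count) in blocks of at most chunk iterations.
 * With a real OpenCL host, each pass would be one kernel launch followed
 * by a wait, so no single launch runs longer than one block.
 * Returns the number of launches, or 0 if chunk is 0. */
static unsigned int run_in_chunks(float *output, size_t n_items,
                                  unsigned int count, unsigned int chunk)
{
    unsigned int launches = 0;
    unsigned int start = 0;

    if (chunk == 0) {
        return 0;
    }
    while (start < count) {
        unsigned int remaining = count - start;
        unsigned int step = remaining < chunk ? remaining : chunk;
        unsigned int end = start + step;
        size_t i;

        for (i = 0; i < n_items; i++) {   /* stands in for the NDRange */
            square_chunk(output, i, start, end);
        }
        start = end;
        launches++;
    }
    return launches;
}
```

For example, `run_in_chunks(out, 4, 1000, 300)` covers the 1000 iterations in four launches (300, 300, 300 and 100), and each element ends up with the same value a single 1000-iteration pass would give. The chunk size would need tuning on the real device so that each launch stays well under the point where the display stalls.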

0 Likes

"I know this doesn't answer your question directly, but wouldn't it be good practice to split up the kernel routine into blocks of runtime? I imagine having an unresponsive PC for several hours isn't desirable."

You just read my mind! I did that today, but so far it has had no effect: it still freezes, just not as quickly. I need to test it with a smaller block size; I think that will help.

0 Likes

I have a similar observation - I'm still trying to get more data. But what I know so far:

I have an OpenCL program with quite some CPU interaction (copying memory buffers, and some preparation work on CPU). Running one instance of the program is quite stable, but the GPU is not fully utilized. Starting the program a second time in parallel brings GPU utilization to 99%. However, after a few hours of runtime, one of the instances locks up. This one then uses no more GPU resources, but one CPU core is fully loaded with it. ps -elf reports the process in some futex_ call. Now the strange thing: When killing the process, it immediately turns defunct, but continues to consume one CPU core. The process is immune to kill -9, and even before trying to kill it, gcore or gdb cannot attach to the process. The only way out: reboot.

I have SuSE 11.4, Catalyst 11.8, AMD APP 2.5, Intel Xeon 2x6-way CPU, HD 5770 GPU, typical kernel runtime ~10ms.

I'm trying to find out what the kernel is doing at that time. Can anyone point me to information on how to get the kernel-mode part of a process stack? Or any other hint as to what this could be and how to avoid it?
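
One place to start on Linux is the per-process files under /proc. `/proc/<pid>/stat` carries a one-letter process state (for example `D` for uninterruptible sleep or `Z` for zombie), and `/proc/<pid>/stack` shows the kernel-mode stack when the kernel was built with stack tracing support and the reader has sufficient privileges (typically root). Below is a minimal sketch in C that reads the state letter; `parse_proc_state` and `read_proc_state` are hypothetical helper names, and whether this reveals anything useful for the hang described above is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the state letter from the contents of /proc/<pid>/stat.
 * The format is "pid (comm) state ..."; comm may itself contain spaces
 * or parentheses, so the search starts from the last ')'.
 * Returns '?' if the text does not look like a stat line. */
static char parse_proc_state(const char *stat_line)
{
    const char *close = strrchr(stat_line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0') {
        return '?';
    }
    return close[2];
}

/* Reads /proc/<pid>/stat and returns the state letter, or '?' on error
 * (for example when the process does not exist or /proc is not mounted). */
static char read_proc_state(long pid)
{
    char path[64];
    char buf[1024];
    FILE *f;
    size_t n;

    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    f = fopen(path, "r");
    if (f == NULL) {
        return '?';
    }
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_proc_state(buf);
}
```

If `read_proc_state(pid)` returns `D` or `Z` for the stuck process, that matches the symptoms described here, and reading `/proc/<pid>/stack` as root is then the next thing to try for the kernel-side call chain.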

0 Likes