

jpsollie
Adept II

OpenCL kernel recycling makes Xorg crash

System:

CPU: turion ultra ZM-86 / 4GB

GPU: radeon HD 4570 / 512MB

Software:

Xorg 1.12.4

Catalyst 13.1 for legacy GPU

GCC 4.6.3

linux kernel 3.2.43

I have a very weird problem while performing an 'iteration' over an OpenCL kernel:

When the program iterates over the kernel more than 2 times, it makes the Xorg server crash.

I can run the program more than 2 times sequentially when it does not perform iteration.

I can run the program in iteration mode when launching it from the console: the system is perfectly stable when Xorg is not running.

A minor remark: during kernel execution, Xorg also 'locks up'. If you put a clock on the desktop background before launching the program in non-iterative mode, it keeps showing the launch time until the program finishes; then the screen is redrawn.

Is there any way I can instruct the program to be less 'aggressive' with its resources? I already tried a clFinish at the end of each iteration to make sure I did not forget any reads / writes in the command queue, but that didn't help either.

0 Likes
15 Replies
himanshu_gautam
Grandmaster

1. How long does your kernel run?

2. IMHO, this seems more like a bug in your code. Please post your code.

0 Likes

himanshu.gautam wrote:

1. How long does your kernel run?

2. IMHO, this seems more like a bug in your code. Please post your code.

1. about a minute

2. I really hoped I wouldn't have to do so :s the code is still in 'under investigation' stage and as such not documented at all

anyway, the C source file is up here: http://83.101.10.32:/debugCL.c

the 'iteration' is in the function 'continue_work', line 473

the device setup is in the function 'progress_task_proposal_advertisement', line 556

0 Likes

If you hog the GPU for 1 minute -- it can potentially kickstart some watchdog timer which can attempt a forceful recovery.

I know this used to be the case with Windows a long time back... Not too sure what happens in Linux..

But it is just natural to expect a 1-minute long kernel on a display card to invite some watchdogs/monitors.

could certainly be some kind of problem ... i'll verify it

*edit* does not seem to be the problem, the drivers here have no watchdog engine:

http://devgurus.amd.com/thread/158796

note that by "crash", I also do not mean that the screen blanks, just that any I/O becomes impossible and the screen is no longer redrawn, as described in the program thread.  However, between 2 kernel executions, it still should be.

0 Likes
jpsollie
Adept II

even more interesting:

the problem may not be related to Xorg itself:

the 'iteration' is currently running on my pc at tty0, and if I use Xfce4 instead of kde 4.8, the system is completely operational.

do I have to verify resources that may be requested by an openGL engine?

0 Likes

Currently, on Linux, OpenCL on the GPU is tied to X.

AMD is working to de-couple the two so that you can run your programs even without the X-server running.

Hogging the GPU for 1 minute will certainly invite trouble.

I will ask around to see what can be done...

but your best bet would be to break down your kernel into multiple kernel launches each probably running for a second or two.
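The splitting suggested here can be sketched on the host side as follows. This is only a minimal sketch, not the code from debugCL.c: the names `launch_slice`, `slice_count` and `slice_for` are made up for illustration, and it assumes the work is a 1-D range of independent work-items. The chunk size would have to be tuned by hand until one launch takes about a second on the card in question.

```c
#include <assert.h>
#include <stddef.h>

/* One slice of the full 1-D index range (hypothetical helper type).
 * The host would launch the kernel once per slice. */
typedef struct {
    size_t offset; /* first global work-item index of this slice */
    size_t count;  /* number of work-items in this slice */
} launch_slice;

/* Number of launches needed to cover `total` work-items
 * with at most `chunk` work-items per launch. */
size_t slice_count(size_t total, size_t chunk)
{
    assert(chunk > 0);
    return total / chunk + (total % chunk != 0);
}

/* Slice number `i` (0-based). Returns count == 0 when `i` is past the end. */
launch_slice slice_for(size_t total, size_t chunk, size_t i)
{
    launch_slice s = {0, 0};
    if (i >= slice_count(total, chunk))
        return s;
    s.offset = i * chunk;
    /* The last slice may be shorter than `chunk`. */
    s.count = (total - s.offset < chunk) ? (total - s.offset) : chunk;
    return s;
}

/* Assumed usage in the host loop (not compiled here):
 *   for (i = 0; i < slice_count(total, chunk); i++) {
 *       launch_slice s = slice_for(total, chunk, i);
 *       pass s.offset to the kernel, enqueue s.count work-items
 *       with clEnqueueNDRangeKernel, then call clFinish
 *   }
 */
```

On OpenCL 1.0 the `global_work_offset` argument of clEnqueueNDRangeKernel must be NULL, so the offset would have to be passed as an ordinary kernel argument and added to get_global_id(0) inside the kernel. The chunk should also be a multiple of the work-group size, and the kernel needs a bounds check if the last slice is padded up to that multiple.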

0 Likes

then I believe I found an unexpected feature (or a very serious program malfunction):

The program runs without X running.

Even if X has not been loaded (so at startup, before I'm loading X) the program runs

the only dependency is that the ati kernel module must be loaded - which the kernel loads by itself.

BTW: I don't know if I should change the thread subject, as running the program inside a minimalistic environment (no opengl, directfb, ...) works fine - no screen freezes at all! so the problem is not with X - it might be Qt or another rendering engine which messes up the situation

0 Likes

🙂

Well, mmm.... I am not sure what is going on....

Are you saying that without the X-server being loaded at startup, you are able to run OpenCL?

0 Likes

You can currently run an OpenCL app without the X server if it runs as root. Normal users see only the CPU device. The fglrx module must be loaded, which can be checked with lspci -v

Nou's right ... the code without X running (as a normal user) indeed only ran on the CPU. As root (yes, I somehow have the habit of always logging in as root on a tty terminal), the calculations were offloaded to the GPU.  So what happened: the program just selected another device (the CPU) to do the job ... as root, it indeed stays on the GPU

0 Likes
jpsollie
Adept II

Is there any way to detect if a device is currently connected to an active screen?  That would be the last resort, except moving away from this aging HD 4570 and optimizing the kernel

0 Likes
jpsollie
Adept II

I optimized my kernel a bit (replaced % with & where possible, decreased memory usage, etc ...), and gained a 15% performance increase, so that's nice.  but the problem does not seem to solve itself, so I guess I'll just have to find myself a GPU which is more suited for this stuff (the bottleneck is memory bandwidth).  case can be closed
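For readers wondering about the % to & replacement: it is only valid when the divisor is a power of two and the value is unsigned (or known to be non-negative). A small sketch in plain C, with helper names invented for illustration; the same expression works unchanged inside an OpenCL kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when n is a power of two (1, 2, 4, 8, ...). */
int is_pow2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* x % n computed with a bit mask; only correct for power-of-two n. */
uint32_t mod_pow2(uint32_t x, uint32_t n)
{
    assert(is_pow2(n));
    return x & (n - 1);
}
```

For any other divisor the mask gives a different result than %, so the "where possible" part matters.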

0 Likes
jpsollie
Adept II

I have a question concerning this topic:

I have a pc which is capable of running 3 VGA cards, and used this one to experiment with this program.

Currently, this pc is populated with nvidia 8800/9500 cards (bought way before the actual phenom X6), and some signs of a solution seem to appear.  Of course, the lack of GPU capabilities (8800 cores were actually never designed to be openCL 1.0 compatible) will not fix the problem.

A card replacement may let me continue my research, but I got one question that annoys me a bit:

in the openCL benchmarks, AMD cards seem to perform way better with integer operations (the ones I am performing) for the same price than nvidias, so that would be obvious.  However, the workgroup size limit is only 256 compared to 1024 with nvidia cards.  Is this a driver software limit which AMD might change in one of its next releases? cause it would be useful if I could put a few more items in the same workgroup

0 Likes

jpsollie wrote:

However, the workgroup size limit is only 256 compared to 1024 with nvidia cards.  Is this a driver software limit which AMD might change in one of its next releases? cause it would be useful if I could put a few more items in the same workgroup

I do not think that limit will change any time soon. And I do not see a big reason to support 1024 work-items in a compute unit anyhow. The intent of GPU computing is to use the available GPU resources to their maximum. You can always break down your work within 256 threads, as compared to 1024 threads, by assigning 4 times more work to each thread.

0 Likes

true, but that could imply I have to enqueue a few extra kernel executions.  The process is mainly coordinated by get_local_id(i) and get_group_id(i), and a wg size of 1024 would let me use a 2nd workgroup dimension from time to time (32*32). Isn't this an overhead worth thinking about? (I have absolutely no clue, so if it's a stupid question, just tell me)
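One way to keep the 32*32 layout inside a 256-item work-group, without extra kernel launches, is to let each work-item loop over 4 cells of the tile. The index arithmetic is sketched below in plain C so it can be checked on the host; `cell_for` and the constants are hypothetical names, and in a real kernel `lid` would come from get_local_id(0). Whether this is faster than extra launches is not something this sketch can answer.

```c
#include <assert.h>

enum {
    TILE = 32,                                  /* tile is TILE x TILE cells */
    WG_SIZE = 256,                              /* AMD work-group size limit */
    ITEMS_PER_THREAD = (TILE * TILE) / WG_SIZE  /* 4 cells per work-item */
};

typedef struct {
    int row;
    int col;
} cell;

/* Cell handled by work-item `lid` on its k-th pass,
 * with 0 <= lid < WG_SIZE and 0 <= k < ITEMS_PER_THREAD. */
cell cell_for(int lid, int k)
{
    cell c;
    int flat;
    assert(lid >= 0 && lid < WG_SIZE);
    assert(k >= 0 && k < ITEMS_PER_THREAD);
    flat = k * WG_SIZE + lid; /* position in the flattened 32x32 tile */
    c.row = flat / TILE;
    c.col = flat % TILE;
    return c;
}
```

Each pass covers 8 full rows of the tile, and neighbouring work-items handle neighbouring columns, so the 4 passes together touch every cell exactly once.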

0 Likes