15 Replies Latest reply on Apr 17, 2013 5:53 AM by jpsollie

    OpenCL kernel recycling makes Xorg crash

    jpsollie

      System:

      CPU: turion ultra ZM-86 / 4GB

      GPU: radeon HD 4570 / 512MB

       

      Software:

      Xorg 1.12.4

      Catalyst 13.1 for legacy GPU

      GCC 4.6.3

      linux kernel 3.2.43

       

      I have a very weird problem when iterating over an OpenCL kernel:

      When the program iterates over the kernel more than twice, the Xorg server crashes.

      I can run the program more than twice in a row when it does not iterate.

      I can also run the program in iteration mode when launching it from the console: the system is perfectly stable when Xorg is not running.

       

      A minor remark: during kernel execution, Xorg also 'locks up'. If you put a clock on the desktop before launching the program in non-iterative mode, it keeps showing the launch time until the program finishes, and only then is the screen redrawn.

       

      Is there any way I can instruct the program to be less 'aggressive' with its resources? I already tried a clFinish at the end of each iteration to make sure I did not forget any reads / writes in the command queue, but that didn't help either.

        • Re: OpenCL kernel recycling makes Xorg crash
          himanshu.gautam

          1. How long does your kernel run?

          2. IMHO, this seems more like a bug in your code. Please post your code.

          • Re: OpenCL kernel recycling makes Xorg crash
            jpsollie

            even more interesting:

             

            the problem may not be related to Xorg itself:

            the 'iteration' is currently running on my pc at tty0, and if I use Xfce4 instead of KDE 4.8, the system is completely operational.

             

            Do I have to check for resources that may be requested by an OpenGL engine?

            • Re: OpenCL kernel recycling makes Xorg crash
              jpsollie

              Is there any way to detect if a device is currently connected to an active screen?  That would be the last resort, apart from moving away from this aging HD 4570 and optimizing the kernel.

              • Re: OpenCL kernel recycling makes Xorg crash
                jpsollie

                I optimized my kernel a bit (replaced % with & where possible, decreased memory usage, etc.) and gained a 15% performance increase, so that's nice.  But the problem has not gone away, so I guess I'll just have to find myself a GPU which is better suited to this stuff (the bottleneck is memory bandwidth).  The case can be closed.
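For anyone trying the same % to & replacement: it only gives the same result when the divisor is a power of two and the value is non-negative. Here is a minimal sketch of that rule, written in plain Python as a stand-in for OpenCL C so it can be checked without a GPU. The helper names `is_power_of_two` and `fast_mod` are made up for illustration and do not come from the original program.

```python
def is_power_of_two(n):
    """Return True when n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def fast_mod(x, n):
    """Compute x % n with a bit mask.

    Only valid for x >= 0 and n a power of two; anything else is rejected
    rather than silently returning a wrong remainder.
    """
    if x < 0 or not is_power_of_two(n):
        raise ValueError("mask trick needs x >= 0 and n a power of two")
    return x & (n - 1)
```

In a kernel the check would normally be done once on the host (or by choosing sizes that are powers of two), so that only the `x & (n - 1)` part remains in the hot path.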

                • Re: OpenCL kernel recycling makes Xorg crash
                  jpsollie

                  I have a question concerning this topic:

                  I have a pc which is capable of running 3 VGA cards, and used this one to experiment with this program.

                  Currently, this pc is populated with nvidia 8800/9500 cards (bought long before the current Phenom X6), and some signs of a solution seem to appear.  Of course, the lack of GPU capabilities (8800 cores were actually never designed to be OpenCL 1.0 compatible) will not fix the problem.

                  A card replacement may let me continue my research, but I have one question that bothers me a bit:

                  In the OpenCL benchmarks, AMD cards seem to perform way better with integer operations (the ones I am performing) than nvidia cards at the same price, so an AMD card would be the obvious choice.  However, the workgroup size limit is only 256 compared to 1024 with nvidia cards.  Is this a driver software limit which AMD might change in one of its next releases? cause it would be useful if I could put a few more items in the same workgroup

                    • Re: OpenCL kernel recycling makes Xorg crash
                      himanshu.gautam

                      jpsollie wrote:

                       

                      However, the workgroup size limit is only 256 compared to 1024 with nvidia cards.  Is this a driver software limit which AMD might change in one of its next releases? cause it would be useful if I could put a few more items in the same workgroup

                      I do not think that limit will change any time soon, and I do not see a strong reason to support 1024 work-items in a compute unit anyhow. The intent of GPU computing is to use the available GPU resources to their maximum. You can always break your work down across 256 threads instead of 1024 by assigning 4 times more work to each thread.
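One way to picture "4 times more work per thread" is a strided loop inside the kernel. The sketch below is plain Python standing in for OpenCL C, and it assumes a group of 256 work-items covering 1024 elements. The function name `indices_for_work_item` and its default sizes are illustrative assumptions, not part of any OpenCL API.

```python
def indices_for_work_item(local_id, local_size=256, items_per_group=1024):
    """Return the element indices one work-item handles within its group.

    local_id stands in for get_local_id(0). Striding by local_size means
    neighbouring work-items touch neighbouring elements on every pass.
    """
    return list(range(local_id, items_per_group, local_size))
```

With these assumed sizes, work-item 0 handles elements 0, 256, 512 and 768, and the 256 work-items together cover all 1024 elements exactly once.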

                        • Re: OpenCL kernel recycling makes Xorg crash
                          jpsollie

                          True, but that could imply I have to enqueue a few extra kernel executions.  The process is mainly coordinated by get_local_id(i) and get_group_id(i), and a work-group size of 1024 would let me use a 2nd work-group dimension from time to time (32*32). Isn't this an overhead worth thinking about? (I have absolutely no clue, so if it's a stupid question, just tell me.)
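As a rough illustration of how a 32*32 layout might still fit inside the 256 work-item limit, the sketch below assumes a local size of 32 x 8 where each work-item loops over 4 rows of the tile. It is plain Python standing in for OpenCL C; `TILE`, `LOCAL_X`, `LOCAL_Y` and `cells_for_work_item` are hypothetical names chosen for this example, and the real kernel's indexing may well differ.

```python
TILE = 32      # logical tile is TILE x TILE = 1024 cells
LOCAL_X = 32   # assumed local size in dimension 0
LOCAL_Y = 8    # assumed local size in dimension 1 (32 * 8 = 256 work-items)


def cells_for_work_item(lx, ly):
    """Return the (x, y) tile cells one work-item visits.

    lx and ly stand in for get_local_id(0) and get_local_id(1).
    Each work-item keeps its column and steps through the rows
    ly, ly + LOCAL_Y, ly + 2 * LOCAL_Y, ... inside a loop in the kernel.
    """
    return [(lx, y) for y in range(ly, TILE, LOCAL_Y)]
```

Because the extra rows are handled by a loop inside one kernel launch, this layout would not by itself require additional enqueue calls. Whether it costs or gains performance on a given card is something only a measurement can tell.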