8 Replies Latest reply on Feb 5, 2011 9:28 PM by laobrasuca

    Aborting long-time kernel, how?

    bubu

      Imagine I execute a very time-consuming kernel using clEnqueueNDRangeKernel().

      How can I stop it if the user presses a button?

       

      I'm currently trying to abort it like this:

       

      cl_event l_evt;

      clEnqueueNDRangeKernel( .... &l_evt );

      void OnCancelButtonClick()
      {
          if ( l_evt != NULL )
          {
             clReleaseEvent( l_evt );
             clFinish( l_queue );  // l_queue: the command queue the kernel was enqueued on
          }
      }

       

      The question is... will that clReleaseEvent(l_evt)+clFinish() really abort the kernel execution in a reasonable time?

      PS: Before you suggest it... I CANNOT make the kernel simpler or split it into multiple kernel calls. Assume a BIG time-consuming kernel, please. That's the whole point of the thread.

      Thanks.

        • Aborting long-time kernel, how?
          MicahVillmow
          Only the OS has the ability to interrupt the GPU. Because the GPU is not a pre-emptible device, a user app cannot interrupt an execution. The OS can do this by resetting the device.
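Since the application cannot interrupt the kernel, about the most a cancel button can do on the host side is record that the result is no longer wanted and return immediately, so the UI stays responsive while the device runs to completion. Below is a minimal sketch of that idea in plain C. It makes no OpenCL calls, and the names `job_state`, `job_init`, `on_cancel_button_click` and `on_kernel_complete` are hypothetical. In a real program, `on_kernel_complete` would presumably run once the runtime reports the kernel as finished, for example after `clWaitForEvents` returns.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical application state; none of these names come from OpenCL. */
typedef struct {
    atomic_bool cancel_requested;  /* set by the UI thread, read on completion */
    bool        result_published;  /* true once a finished result was accepted */
} job_state;

void job_init(job_state *job) {
    atomic_init(&job->cancel_requested, false);
    job->result_published = false;
}

/* Cancel button handler: records intent only and returns at once.
   It does not (and cannot) stop the kernel that is already running. */
void on_cancel_button_click(job_state *job) {
    atomic_store(&job->cancel_requested, true);
}

/* Assumed to be called after the kernel has run to completion.
   Returns true if the result is kept, false if it is discarded. */
bool on_kernel_complete(job_state *job) {
    if (atomic_load(&job->cancel_requested)) {
        return false;              /* user cancelled: drop the result */
    }
    job->result_published = true;  /* normal path: hand the result on */
    return true;
}
```

This does not shorten the time the GPU stays busy. It only decouples the user-visible "cancel" from the device, which is the part the application actually controls.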
            • Aborting long-time kernel, how?
              bubu

               

              Originally posted by: MicahVillmow Only the OS has the ability to interrupt the GPU. Because the GPU is not a pre-emptible device, a user app cannot interrupt an execution. The OS can do this by resetting the device.


              Wouldn't it be possible to add some kind of GPU task manager (like the Windows one) to your Catalyst CCC, then? That would be fantastic for killing unresponsive GPGPU programs or changing priorities.

              And, for this specific case... what would happen if I release the event but the kernel has not finished? A crash? Just out of curiosity.

              And one thing that worries me... What if I manually disable the watchdog via the registry's TdrLevel? An attacker could then completely hang your computer with an infinite loop inside the kernel...

               

                • Aborting long-time kernel, how?
                  laobrasuca

                  This reminds me of another point: what about getting rid of the system freeze when an application crashes on the GPU? Man, a system reboot is a pain! Sometimes the system manages to restart the driver, saving me from rebooting, but a lot of the time it doesn't. However, when using the CPU as the device, the system never freezes (the application crashes, but there's no freeze whatsoever). Maybe Catalyst could restart automatically when things go wrong?

              • Aborting long-time kernel, how?
                MicahVillmow
                laobrasuca,
                Most likely your whole system didn't actually hang. What happens is that you hang the GPU and it no longer responds to the reset command. Because your GPU runs the GUI, your system GUI hangs and it seems like your system is hung. You should still be able to SSH into your machine at this point.

                bubu,
                Your kernel would still finish and then the event would get released.

                The GPU is not a CPU, so you cannot treat it like one. There is no pre-emption, interruption or graceful error recovery for a lot of badly behaved programs. If you run an infinite loop on the GPU, you will most likely have to reboot your system.
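The "kernel still finishes, then the event gets released" behaviour can be pictured with a toy model of the event's lifetime. The sketch below is plain C and not the OpenCL API: `toy_event` and the `toy_*` functions are made up for illustration. It assumes, as the description of `clReleaseEvent` in the OpenCL specification suggests, that the event object is only deleted once its reference count is zero and the command behind it has completed. The model leaves out the further condition that no queued command is still waiting on the event.

```c
#include <stdbool.h>

/* Toy model only; these names are illustrative, not OpenCL. */
typedef struct {
    int  refcount;          /* references held by the host application */
    bool command_complete;  /* has the kernel behind the event finished? */
    bool deleted;           /* has the runtime freed the event object? */
} toy_event;

/* Deletion needs both conditions: no references left AND command done. */
static void toy_maybe_delete(toy_event *e) {
    if (e->refcount == 0 && e->command_complete) {
        e->deleted = true;
    }
}

/* Enqueueing hands the host one reference to a not-yet-complete command. */
toy_event toy_enqueue_kernel(void) {
    toy_event e = { 1, false, false };
    return e;
}

/* Models the effect of releasing the event: only the count changes.
   Nothing here touches command_complete, so the kernel keeps running. */
void toy_release_event(toy_event *e) {
    e->refcount--;
    toy_maybe_delete(e);
}

/* The device runs the kernel to completion on its own schedule. */
void toy_kernel_finishes(toy_event *e) {
    e->command_complete = true;
    toy_maybe_delete(e);
}
```

In this model, calling `toy_release_event` right after `toy_enqueue_kernel` leaves the event alive and the command unfinished. Only `toy_kernel_finishes` lets the object go away, which matches the answer above: no crash, and no abort either.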
                  • Aborting long-time kernel, how?
                    laobrasuca

                    Micah, you're certainly right that the system itself doesn't freeze, although you're quite stuck if you can't restart the GUI, the driver or whatever.

                    I can understand that the GPU does not support features like pre-emption, interruption or graceful error recovery, but with the advent of GPGPU the paradigm changes. When you make it possible to program the GPU, you've got to offer a "ctrl+c" too for when things go wrong. Unless it's technically impossible or it somehow kills GPU performance (?), it would be great if it were supported, just like printf (cl_amd_printf) or advanced debugging features (in the 2.4 SDK, maybe).

                  • Aborting long-time kernel, how?
                    MicahVillmow
                    laobrasuca,
                    It is a hardware issue; nothing in software will fix it.