1 Reply Latest reply on Jul 25, 2008 8:53 PM by MicahVillmow

    choppiness when running double_precision_optimized_matmult_d demo

      Symptom of blocking I/O in Xorg or the fglrx kernel module?

      I tried three things:

      ./double_precision_optimized_matmult_d -q -t -x 512 -i 10000
      Time: 62.741000s
      Gflops: 39.84635

      ./double_precision_optimized_matmult_d -q -t -x 1024 -i 1000
      Time: 49.943000s
      Gflops: 40.04565

      ./double_precision_optimized_matmult_d -q -t -x 2048 -i 100
      Time: 44.449000s
      Gflops: 35.99631

      All reasonable results, though a tad short of what I imagine should be possible on this hardware.

      The first is not a problem; the second results in the system freezing for ~1 second every ~7 seconds (choppy); the third results in the system freezing for ~10 seconds every ~11 seconds (unusable).

      Looking at processor usage, I see that one (of four) cores gets ~100% utilization from the matmult_d program (reasonable, because it's not a multithreaded app), and that the freezing glitches — especially noticeable in the second case — coincide with a drop in utilization of the CPU running matmult_d and a jump (to 100%) in the utilization of the CPU running Xorg. I also see that Xorg and matmult_d often "swap cores".

      This pattern suggests to me that matmult_d is doing something that causes it to block waiting for Xorg, and then Xorg does something that is (seemingly) not interruptible. My first hunches are either something bus/DMA related or something kernel space related (perhaps spending too long in the fglrx kernel module with interrupts turned off?).

      Would this be a correct assessment of the situation, and does this mean that it is up to the application programmer alone to maintain the responsiveness of the system? If that is the case, I am a bit worried about the potential consequences for system stability of using the proprietary fglrx driver. (Of course it could be Xorg's problem, but I'm not used to this kind of behaviour; then again, I'm no X guru.)



        • choppiness when running double_precision_optimized_matmult_d demo
          There are a few reasons for what you are seeing.
          1) The currently released implementation of the samples mainly demonstrates kernel performance, but the kernels themselves are not optimal. If you look at double_matmult in the CAL SDK, you will see that performance is a lot higher.
          2) The reason one of the cores is at 100% utilization is that you are queueing up multiple iterations of the kernel, and while the GPU is running, the CPU is waiting in a busy loop for it to finish. The loop is similar to this: while (calCtxIsEventDone(*ctx, event) == CAL_RESULT_PENDING); where it waits for the final event to finish.
          3) Since you are queueing up multiple iterations — in this case hundreds to thousands — a certain number of them are run as a batch. While a batch is running, the GPU cannot be used by another process, which means that Xorg is effectively locked out. This issue will happen with any long-running kernel when programming the GPU: there is no context switching as in the CPU world, where one process can be interrupted so another can run on the device. The exact reason for the choppiness is probably that a number of things get backed up behind the kernel hogging the GPU.

          As for the last question: yes, it is up to the application programmer to make sure that the kernel in use does not take down the system. That is one of the drawbacks of having lower-level access to a device.