theo_borm
Journeyman III

choppiness when running double_precision_optimized_matmult_d demo

Symptom of blocking I/O in Xorg or the fglrx kernel module?

I tried three things:

./double_precision_optimized_matmult_d -q -t -x 512 -i 10000
Time: 62.741000s
Gflops: 39.84635

./double_precision_optimized_matmult_d -q -t -x 1024 -i 1000
Time: 49.943000s
Gflops: 40.04565

./double_precision_optimized_matmult_d -q -t -x 2048 -i 100
Time: 44.449000s
Gflops: 35.99631

All reasonable results, though a tad short of what I imagine should be possible with the hardware.

The first run is not a problem; the second makes the system freeze for ~1 second every ~7 seconds (choppy); the third makes it freeze for ~10 seconds every ~11 seconds (unusable).

Looking at processor usage, I see that one of the four cores is at ~100% utilization with the matmult_d program (reasonable, since it is not a multithreaded app), and that the freezing glitches (especially noticeable in the second case) coincide with a drop in utilization on the CPU running matmult_d and a jump to 100% on the CPU running Xorg. I also see that Xorg and matmult_d often "swap cores".

This pattern suggests to me that matmult_d is doing something that causes it to block waiting for Xorg, and that Xorg then does something that is (seemingly) not interruptible. My first hunches are that it is either bus/DMA related or kernel-space related (perhaps spending too long in the fglrx kernel module with interrupts disabled?).

Would this be a correct assessment of the situation, and does this mean that it is up to the application programmer alone to maintain the responsiveness of the system? If so, I am a bit worried about the potential consequences for system stability of using the proprietary fglrx driver. (Of course it could be Xorg's problem, but I'm not used to this kind of behaviour - then again, I'm no X guru.)

Regards,

Theo


Theo,
There are a few reasons for what you are seeing.
1) The currently released samples mainly demonstrate kernel performance, but the kernels themselves are not optimal. If you look at double_matmult in the CAL SDK you will see that its performance is a lot higher.
2) One of the cores is at 100% utilization because you are queuing up multiple iterations of the kernel, and while the GPU runs them the CPU waits in a busy loop for the last one to finish, something similar to: while (calCtxIsEventDone(*ctx, event) == CAL_RESULT_PENDING); (a sketch of a wait that does not spin follows this list).
3) Since you are queuing up multiple iterations, in this case hundreds to thousands, a certain number of them are run as a batch. While that batch is running, the GPU cannot be used by another process, so Xorg is effectively locked out; this will happen with any long-running kernel when programming the GPU. There is no context switching as in the CPU world, where one process can be interrupted so another can run on the device. The exact reason it gets choppy is probably a backlog of work piling up behind the kernel hogging the GPU (a sketch of issuing the work in smaller batches also follows below).
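On point 2, one way to keep that core from spinning at 100% is to sleep briefly between polls instead of busy-waiting. This is only a sketch, not the sample's actual code: it assumes a valid CALcontext and a CALevent returned by calCtxRunProgram, and the 1 ms interval is an arbitrary trade-off between CPU usage and completion latency.

#include <unistd.h>   /* usleep */
#include "cal.h"

/* Poll for kernel completion, yielding the CPU between checks. */
static void waitForEvent(CALcontext ctx, CALevent event)
{
    while (calCtxIsEventDone(ctx, event) == CAL_RESULT_PENDING)
        usleep(1000); /* give the core back for ~1 ms instead of spinning */
}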
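On point 3, if the application allows it, the long run can be issued in smaller batches, waiting for each batch to drain before queueing the next, so that the GPU periodically becomes free for other clients such as Xorg. Again only a sketch under assumptions: ctx, func and domain are set up as in the matmult sample, runInBatches is a made-up helper name, and the batch size is something to tune rather than a recommended value.

#include <unistd.h>
#include "cal.h"

/* Issue totalIterations kernel launches in batches of batchSize, waiting for
 * each batch to finish before queueing the next. Smaller batches mean more
 * frequent gaps in which other processes can use the GPU. */
static CALresult runInBatches(CALcontext ctx, CALfunc func,
                              const CALdomain *domain,
                              int totalIterations, int batchSize)
{
    CALevent event = 0;
    int done, i;
    for (done = 0; done < totalIterations; done += batchSize) {
        int n = totalIterations - done;
        if (n > batchSize)
            n = batchSize;
        for (i = 0; i < n; ++i)
            if (calCtxRunProgram(&event, ctx, func, domain) != CAL_RESULT_OK)
                return CAL_RESULT_ERROR;
        calCtxFlush(ctx);                  /* push the batch to the GPU */
        while (calCtxIsEventDone(ctx, event) == CAL_RESULT_PENDING)
            usleep(1000);                  /* wait without spinning */
    }
    return CAL_RESULT_OK;
}

Note that this only trades throughput for shorter GPU-busy intervals: in the -x 2048 -i 100 run each iteration already takes roughly 0.44 s on average, so even a batch size of one still occupies the GPU for that long.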

As for the last question: yes, it is up to the application programmer to make sure that the kernel in use does not take down the system. That is one of the downsides of having lower-level access to a device.