There are a few reasons for what you are seeing.
1) The currently released samples mainly measure kernel performance, but the kernels themselves are not optimal. If you look at double_matmult in the CAL SDK you will see that its performance is a lot higher.
2) The reason one of the cores is at 100% utilization is that you are queueing up multiple iterations of the kernel, and while the GPU is running them the CPU is busy-waiting for the final event to finish, in a loop similar to this: while (calCtxIsEventDone(*ctx, event) == CAL_RESULT_PENDING);
3) Since you are queueing up multiple iterations (in this case hundreds to thousands), a certain number of them are run as a batch. While that batch is running, the GPU cannot be used by another process, so Xorg is effectively locked out. This will happen with any long-running kernel when programming the GPU: unlike in the CPU world, there is no context switching that lets one process be interrupted so another can run on the device. The choppiness you are seeing is most likely everything else backing up behind the kernel that is hogging the GPU.
As for the last question: it is up to the application programmer to make sure the kernels they use do not take down the system. That is one of the downsides of having lower-level access to a device.