Although I'm working on an nVidia GPU, I guess my problem is vendor-independent, so I wanted to ask if anyone has an idea.
Here is a description of my problem:
Recently, I started a project in which I'm planning to run an audio loop on the GPU to filter audio samples from my sound card.
My first step was to see how to write the samples to device memory and then write them back to my sound card. For comparison, I did the same in host memory (store the samples in a buffer and write them back).
In OpenCL (for GPU processing) I'm using "clEnqueueWriteBuffer" to write the data to device memory and "clEnqueueReadBuffer" to read it back.
When I just write and read the sound card samples, every 300 to 500 loops the write and read commands take about 10 times longer than in all other loops, so I get scratching artifacts in the sound card output every few seconds. I printed out the execution time of each loop, and in these loops it's bigger than the time available for one loop (256 samples / 44100 Hz ≈ 5.8 ms).
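To make the timing check concrete, here is a minimal sketch in C of how the per-loop budget and the overrunning loops could be computed. The function names loop_budget_ms and count_overruns are made up for illustration and are not part of OpenCL or any audio API. The sketch assumes the duration of each loop has already been measured in milliseconds with whatever high-resolution timer the platform offers.

```c
#include <stddef.h>

/* Hypothetical helper: time available for one loop iteration in
   milliseconds, i.e. frames per buffer divided by the sample rate. */
double loop_budget_ms(unsigned frames, unsigned sample_rate_hz)
{
    return 1000.0 * (double)frames / (double)sample_rate_hz;
}

/* Hypothetical helper: count the loops whose measured duration exceeded
   the budget. durations_ms holds one measured duration per loop.
   If worst_ms is not NULL, it receives the longest duration seen. */
size_t count_overruns(const double *durations_ms, size_t n,
                      double budget_ms, double *worst_ms)
{
    size_t overruns = 0;
    double worst = 0.0;

    for (size_t i = 0; i < n; ++i) {
        if (durations_ms[i] > worst)
            worst = durations_ms[i];
        if (durations_ms[i] > budget_ms)
            ++overruns;
    }
    if (worst_ms != NULL)
        *worst_ms = worst;
    return overruns;
}
```

With 256 frames at 44100 Hz, loop_budget_ms returns roughly 5.8 ms. Any loop whose write-plus-read time is above that value would show up in count_overruns and be audible as a dropout.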
I suspect that this is a synchronization problem, but I don't have a clue how to solve it. I'm using blocking reads/writes, so I thought it should be synchronized.
Does anybody have an idea why I get these execution time differences every 300 to 500 loops?
Are you using Vista or Windows 7? Both those operating systems give priority to the Aero UI when scheduling GPU tasks.
Otherwise just try fiddling with the thread priorities on the CPU.
Those are my best guesses anyway. Also please note that an AMD GPU will likely behave completely differently from an nVidia one in this respect (although there are no guarantees that it will behave better).