Although I'm working on an NVIDIA GPU, I guess my problem is vendor-independent, so I wanted to ask if anyone has an idea.
Here is a description of my problem:
Recently I started a project in which I want to run an audio loop on the GPU, so that audio samples from my sound card get filtered on the GPU.
My first step was to see how to write the samples to device memory and then write them back to my sound card. For comparison, I did the same thing in host memory (store the samples in a buffer and write them back).
In OpenCL (for GPU processing) I'm using "clEnqueueWriteBuffer" to write the data to device memory and "clEnqueueReadBuffer" to read it back from there.
When I just write and read the sound card samples, the problem is that every 300-500 loops the write and read commands take about 10 times longer than in all other loops, so I get scratching noises in the sound card output every few seconds. I printed out the execution time of each loop, and in these slow loops it's bigger than the time available for one loop (256 samples / 44100 Hz = approx. 5.8 ms).
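To make the timing measurement concrete, here is a minimal sketch of how the per-loop time could be logged and compared against the buffer budget. It is only an illustration in plain Python: the function names (`period_budget_ms`, `time_iterations`, `find_overruns`) are made up for this example, and the `round_trip` callable is a stand-in for the real write/read pair, which is not shown here. 256 samples at 44100 Hz works out to roughly 5.8 ms per buffer.

```python
import time


def period_budget_ms(frames_per_buffer, sample_rate_hz):
    """Time available to process one buffer, in milliseconds."""
    return 1000.0 * frames_per_buffer / sample_rate_hz


def time_iterations(round_trip, iterations):
    """Call round_trip() repeatedly and return each call's duration in ms."""
    durations_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        round_trip()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def find_overruns(durations_ms, budget_ms):
    """Return (loop index, duration in ms) for every loop over the budget."""
    return [(i, d) for i, d in enumerate(durations_ms) if d > budget_ms]


if __name__ == "__main__":
    budget = period_budget_ms(256, 44100)
    print(f"budget per buffer: {budget:.2f} ms")

    # Placeholder workload (assumption): replace this with the actual
    # blocking write + read of one 256-sample buffer.
    durations = time_iterations(lambda: time.sleep(0.001), 50)

    for index, duration in find_overruns(durations, budget):
        print(f"loop {index}: {duration:.2f} ms exceeds the budget")
```

Logging the loop index together with the duration shows whether the slow loops come at a fixed interval or drift around, which may help tell a periodic cause apart from a random one.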
I suspect that this is a synchronization problem, but I don't have a clue how to solve it. I'm using blocking reads/writes, so I thought it should be synchronized.
Does anybody have an idea why I get these execution time spikes every 300-500 loops?