Hi everybody, I'm working on a GPU signal-processing project. It involves multiplications, additions, and FFTs/IFFTs. The data set is quite large, and I've developed nine kernels in total; the most complicated are, unsurprisingly, the FFT/IFFT ones.

I've run into a problem that I really don't understand. I create the context, the queue, and the buffers for all the kernels in the proper way (most of them are device buffers; the first ones in the chain are pinned buffers used to read input data from host variables), and then I set the kernel arguments (except for a few that I have to set just before enqueuing the kernel, since I use the same kernel twice during processing).

Most of the processing goes fine, but at a certain point I get incorrect data (compared with a similar algorithm developed to run on the CPU). The cause is that one of the pinned buffers in the middle of the chain is read incorrectly on the GPU, for just some of its values: debugging with CodeXL, not all of the values match those of the host variable associated with the buffer. However, if I read the buffer back on the host, either before it is pushed into kernel processing or after the kernel output, its values are identical to the host variable, no problems at all.
In your opinion, what could be the source of the problem?
I'm very sorry for giving you such a short (and maybe a bit confusing) description of the problem; unfortunately I can't share pieces of the code for privacy reasons.
Regards to everybody