I read the ATI Stream Computing Guide and searched online, but I couldn't find answers to the following questions:
1. I have an ATI Mobility Radeon 5650, and I tried to use the native_divide function with integer arguments instead of the regular / operator. No error is reported, but I get wrong results. Why is this?
2. In one of my kernels I have an argument step whose value is always a power of two. Replacing i%step with i & (step-1) should improve performance, but after the replacement I again get wrong results. Why?
3. Can two kernels that run one after another share a buffer? I am implementing an out-of-place algorithm, and the idea is to have the output of the first kernel become the input of the second, without transferring the data back to the host and then from the host to the device again. Does anyone know if this is possible?
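To make question 2 concrete, here is a minimal host-side C sketch of the identity involved. It is only an illustration, not the kernel in question: the function names (mod_pow2, count_mismatches, signed_rem, signed_mask) are hypothetical, and it assumes plain C99 semantics on a two's-complement machine. It shows that the mask matches % for unsigned values, and that the two can differ when i is a signed, negative value.

```c
#include <assert.h>

/* Remainder by a power of two, computed with a bit mask.
   Assumes step is a nonzero power of two and i is unsigned. */
static unsigned mod_pow2(unsigned i, unsigned step) {
    assert(step != 0u && (step & (step - 1u)) == 0u);
    return i & (step - 1u);
}

/* Counts how often the mask disagrees with % for every power-of-two
   step up to max_step and every i below max_i. */
static int count_mismatches(unsigned max_step, unsigned max_i) {
    int mismatches = 0;
    for (unsigned step = 1u; step <= max_step; step <<= 1) {
        for (unsigned i = 0u; i < max_i; ++i) {
            if (mod_pow2(i, step) != i % step) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

/* Signed variants: in C99, % truncates toward zero, so a negative i
   gives a negative (or zero) remainder, while the mask never does. */
static int signed_rem(int i, int step) {
    return i % step;
}

static int signed_mask(int i, int step) {
    return i & (step - 1);
}
```

For example, mod_pow2(13u, 8u) and 13u % 8u both give 5, but signed_rem(-3, 4) is -3 while signed_mask(-3, 4) is 1. Another thing worth checking is operator precedence: == binds tighter than &, so a test written as i & (step-1) == 0 is parsed as i & ((step-1) == 0).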
Thank you in advance for your time and answers!