3 Replies Latest reply on Jan 24, 2011 10:58 AM by LeeHowes

    Native functions and kernels sharing buffers in OpenCL


      I read the ATI Stream Computing Guide and searched online, but I couldn't find answers to the following questions:

      1. I have an ATI Mobility Radeon 5650 and tried using the native_divide function with integer arguments instead of the regular / operator. No error is reported, but I get wrong results. Why is this?

      2. In one of my kernels I have an argument step whose value is always a power of two. Replacing i%step with i & (step-1) should improve performance, but after the replacement I again get wrong results.

      3. I was wondering if two kernels that run one after another could share a buffer. I was implementing an out-of-place algorithm and the idea was to have the output of one kernel become the input for the second one, without having to transfer data back to host and then again from host to device. Does anyone know if this is possible?

      Thank you in advance for your time and answers!


        • Native functions and kernels sharing buffers in OpenCL

          3. Yes, it is possible. Just enqueue the two kernels with the same buffer as an argument. You must, of course, ensure synchronization if you use the same memory object from different queues or in an out-of-order queue.

            • Native functions and kernels sharing buffers in OpenCL


              Yes, nou is correct. Global buffers are persistent: a buffer's contents stay in device memory across kernel launches until the memory object is released. You can enqueue any number of kernels, each using the output of the previous kernel as its input.

              Regarding the other problems, I would really appreciate it if you could send a test case so we can reproduce the issue on our end. We can then try to fix them in the next SDK release.


                • Native functions and kernels sharing buffers in OpenCL

                  1) The native functions take floats as arguments. I presume it's casting to a float implicitly in your code, so ensure that you're not losing information in the cast. Also, how do you define "wrong"? native_divide is certainly aiming for a lower precision result than the standard divide. Is it actually wrong, or merely less precise than you would like?

                  2) What type is i? What does the ISA code look like that is generated from your change?
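
                  The type of i matters because i % step and i & (step-1) only agree when i is non-negative and step is a power of two. The plain C sketch below illustrates this on the host; the helper names are hypothetical, and the original kernel has not been posted, so this may or may not be the cause there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: remainder via bit mask.
   Valid only when step is a power of two (checked by the assert). */
static uint32_t mod_pow2_unsigned(uint32_t i, uint32_t step)
{
    assert(step != 0 && (step & (step - 1)) == 0);
    return i & (step - 1);
}

/* The same mask applied to a signed index (two's complement assumed). */
static int32_t mask_signed(int32_t i, int32_t step)
{
    return i & (step - 1);
}

/* Ordinary signed remainder: in C99 the result takes the sign of i. */
static int32_t rem_signed(int32_t i, int32_t step)
{
    return i % step;
}
```

                  For an unsigned i the two forms match, for example 13 % 8 and 13 & 7 both give 5. For a signed i of -5 and a step of 8, the remainder is -5 while the mask gives 3. Operator precedence is also worth checking: == binds tighter than &, so i & (step-1) == 0 is parsed as i & ((step-1) == 0).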

                  As just suggested, a test case would help people give you a clearer answer. As you can see from my questions above, your query is not yet specific enough for one.
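
                  On point 1, one way to check whether the implicit cast loses information is to convert each integer to float and back. This is a minimal host-side C sketch with a hypothetical helper name; it assumes 32-bit IEEE single-precision floats, which represent every integer exactly only up to 2^24 in magnitude.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if v is unchanged after an
   int -> float -> int round trip, 0 if the float cast altered it.
   Keep |v| well below 2^31 so the cast back to int32_t stays in range. */
static int survives_float_round_trip(int32_t v)
{
    float f = (float)v;      /* the conversion native_divide arguments undergo */
    return (int32_t)f == v;  /* exact only if v fits in float's 24-bit significand */
}
```

                  With this, 16777216 (2^24) survives the round trip but 16777217 does not. Even when both operands are exact, converting a float quotient back to an integer truncates toward zero, so a result that lands slightly below the true value would come out one too small.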