5 Replies Latest reply on Jun 7, 2009 12:47 PM by Raistmer

    What is more effective: stream domain or kernel domain?

    Raistmer
      (and other stream processing questions)

      I need to process only part of a data array in a kernel call.
      There seem to be two ways to accomplish that:
      1) creating a substream from the big stream with the Stream::Domain() call;
      2) passing the original big stream to the kernel, but setting the kernel's domain of execution via the domainOffset() and domainSize() calls of the kernel interface.
      Can both methods be used for this purpose (processing only part of a big data array per kernel call), and if so, which one will be faster?
       P.S. What is the webmaster's e-mail address for these boards? This emoticon parsing thing can drive people crazy >-:|
       

      ADDON (some refinement):
      I only need to change the size of the processed region. Processing always starts at offset zero, so only a region at the beginning of the array is ever computed; only the size of that region differs.
        • What is more effective: stream domain or kernel domain?
          gaurav.garg

          Domain of execution should be faster. Domain is implemented by copying data between the original stream and the new domain stream, hence using domain would behave like a multi-pass algorithm. It is recommended to avoid domain and use domain of execution instead.
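
The difference described above can be illustrated with a minimal host-side C++ analogy. This is not Brook+ code: plain `std::vector` stands in for a stream, and `kernel_body`, `process_via_substream` and `process_in_place` are hypothetical names chosen for this sketch. The first variant copies the region out, processes it and copies it back (the extra passes); the second simply restricts the index range over the original buffer, which is roughly what domain of execution amounts to.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel body: doubles one element.
inline float kernel_body(float x) { return 2.0f * x; }

// Analogy for a sub-stream: copy the region out, process it, copy it back.
// Returns the number of elements moved by the extra copies.
std::size_t process_via_substream(std::vector<float>& data, std::size_t n) {
    n = std::min(n, data.size());
    std::vector<float> sub(data.begin(),
                           data.begin() + static_cast<std::ptrdiff_t>(n));  // copy in
    for (float& x : sub) x = kernel_body(x);
    std::copy(sub.begin(), sub.end(), data.begin());                        // copy out
    return 2 * n;
}

// Analogy for domain of execution: same buffer, restricted index range.
// Returns 0 because no extra copies are made.
std::size_t process_in_place(std::vector<float>& data, std::size_t n) {
    n = std::min(n, data.size());
    for (std::size_t i = 0; i < n; ++i) data[i] = kernel_body(data[i]);
    return 0;
}
```

Both functions leave the buffer in the same state; only the amount of data movement differs, which is the cost the reply attributes to stream domains.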

            • What is more effective: stream domain or kernel domain?
              Raistmer
              Thank you!
              So I will go with domain of execution.
                • What is more effective: stream domain or kernel domain?
                  Raistmer
                  Another closely related question.

                  After running a kernel on part of a stream using domain of execution, I need to transfer the data back to host memory.
                  Both host and GPU memory contain long arrays, but only part of the array needs to be updated. Is it possible to transfer only the first N elements of the stream to the host array without additional memory copies inside the GPU?
                  (If I understood correctly, using a stream domain for this would incur additional memory copies inside the GPU.)
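
The behaviour being asked for can be stated as a small plain C++ sketch. It is only a host-side analogy, not Brook+ API: two vectors stand in for GPU and host memory, and `read_back_prefix` is a hypothetical helper. Whether Brook+ offers an equivalent partial read is exactly the open question in this post.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy only the first n elements of `device` into `host`,
// leaving host[n..] untouched. Plain vectors stand in for GPU and host memory.
std::size_t read_back_prefix(const std::vector<float>& device,
                             std::vector<float>& host, std::size_t n) {
    // Clamp n so the copy never runs past either buffer.
    n = std::min({n, device.size(), host.size()});
    std::copy_n(device.begin(), n, host.begin());
    return n;  // number of elements actually transferred
}
```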
                • What is more effective: stream domain or kernel domain?
                  wgbljl

                  Domain of execution must be used with a scatter stream, which is un-cached memory access. The performance seems rather poor, so I am not sure which approach is faster.

                  Originally posted by: gaurav.garg Domain of execution should be faster. Domain is implemented by copying data between the original stream and the new domain stream, hence using domain would behave like a multi-pass algorithm. It is recommended to avoid domain and use domain of execution instead.

                    • What is more effective: stream domain or kernel domain?
                      Raistmer
                      Originally posted by: wgbljl

                      Domain of execution must be used with a scatter stream, which is un-cached memory access. The performance seems rather poor, so I am not sure which approach is faster.

                      Originally posted by: gaurav.garg Domain of execution should be faster. Domain is implemented by copying data between the original stream and the new domain stream, hence using domain would behave like a multi-pass algorithm. It is recommended to avoid domain and use domain of execution instead.

                      Actually, I use it with a <> stream (non-scatter) and it produces correct results.

                      My current question: is it possible to use a reduce kernel whose input stream and reduction variable have different types?

                      I need some way to report back to the CPU which bins have power above the threshold without copying the whole array back to host memory.
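
As a CPU reference for what such a mixed-type reduction would compute, here is a minimal plain C++ sketch. It is not a Brook+ reduce kernel, and `bins_above` is a hypothetical name: the input is an array of float power values, the result is a short list of integer bin indices, and only that short list would need to reach the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reduce float power bins to the integer indices of the
// bins whose power exceeds the threshold. Input and result types differ.
std::vector<std::size_t> bins_above(const std::vector<float>& power,
                                    float threshold) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < power.size(); ++i) {
        if (power[i] > threshold) hits.push_back(i);  // strictly above threshold
    }
    return hits;
}
```

When few bins exceed the threshold, the result is much smaller than the input array, which is the saving the post is after.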