Raistmer
Adept II

Which is more efficient: stream domain or kernel domain of execution?

(and other stream processing questions)

I need to process only part of a data array in a kernel call.
There seem to be two ways to accomplish that:
1) by creating a substream from the big stream with a Stream::Domain() call;
2) by passing the original big stream to the kernel call but setting the kernel's domain of execution via the domainOffset() and domainSize() calls on the kernel interface.
Is it true that both methods can be used for this purpose (processing only part of a big data array per kernel call)? If so, which method is faster?
P.S. What is the e-mail address of the webmaster of these boards? The emoticon parsing here can drive people crazy >-:|


ADDENDUM (some refinement):
I only need to change the size of the processed region. Processing always starts at offset zero, so only a region at the beginning of the array is ever computed. Only the size of that region differs between calls.
0 Likes
5 Replies
gaurav_garg
Adept I

Domain of execution should be faster. A stream domain is implemented by copying data between the original stream and the new domain stream, so using it behaves like a multi-pass algorithm. It is recommended to avoid stream domains and use domain of execution instead.
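To make the difference concrete, here is a minimal CPU-side model in plain C++. It is not Brook+ code and does not use the Brook+ API; `kernel_body`, `process_via_substream` and `process_via_execution_domain` are hypothetical names chosen for illustration. It only assumes the description above is accurate: a stream domain costs extra copies, while a domain of execution restricts a single pass to a range.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel body: doubles one element.
inline float kernel_body(float x) { return 2.0f * x; }

// Models the stream-domain approach as described in the reply:
// copy the prefix out, compute on the copy, copy the results back.
void process_via_substream(std::vector<float>& data, std::size_t n) {
    n = std::min(n, data.size());
    const auto end = data.begin() + static_cast<std::ptrdiff_t>(n);
    std::vector<float> sub(data.begin(), end);          // pass 1: copy out
    for (float& x : sub) x = kernel_body(x);            // pass 2: compute
    std::copy(sub.begin(), sub.end(), data.begin());    // pass 3: copy back
}

// Models the domain-of-execution approach: one pass over the original
// array, limited to [offset, offset + size).
void process_via_execution_domain(std::vector<float>& data,
                                  std::size_t offset, std::size_t size) {
    offset = std::min(offset, data.size());
    size = std::min(size, data.size() - offset);
    for (std::size_t i = offset; i < offset + size; ++i) {
        data[i] = kernel_body(data[i]);
    }
}
```

Both functions leave the elements outside the selected region untouched and produce the same result; the first one simply touches the selected data three times instead of once, which is the overhead the reply refers to.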

0 Likes

Thank you!
So I will go with domain of execution.
0 Likes

Another closely related question.

After running a kernel on part of a stream using domain of execution, I need to transfer the data back to host memory.
Both host and GPU memory hold long arrays, but only part of the array should be updated. Is it possible to transfer only the first N elements of a stream to the host array without additional memory copies inside the GPU?
(If I understood correctly, using a stream domain for this would incur additional memory copies inside the GPU.)
0 Likes

Domain of execution must be used with a scatter stream, which means uncached memory access. Performance seems to be rather poor, so I'm not sure which approach is faster.

0 Likes

Originally posted by: wgbljl

Domain of execution must be used with a scatter stream, which means uncached memory access. Performance seems to be rather poor, so I'm not sure which approach is faster.




Actually, I use it with a <> stream (non-scatter) and it produces correct results.

My current question: is it possible to use a reduce kernel whose input stream and reduction variable have different types?

I need some way to report back to the CPU which bins have power above a threshold without copying the whole array back to host memory.
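For reference, the operation being asked for can be written down as a plain C++ model on the CPU. This is only a sketch of the intended semantics, not Brook+ code, and it says nothing about whether Brook+ reduce kernels accept mixed types; `count_above` and `bins_above` are hypothetical names. The first function is a reduction whose accumulator type (an integer count) differs from the input element type (float); the second compacts the array into the list of bin indices that would need to travel back to the host.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Reduction with different input and accumulator types:
// float power values in, an integer count out.
std::size_t count_above(const std::vector<float>& power, float threshold) {
    return std::accumulate(
        power.begin(), power.end(), std::size_t{0},
        [threshold](std::size_t acc, float p) {
            return acc + (p > threshold ? 1u : 0u);
        });
}

// Compaction: collect only the indices of bins whose power exceeds
// the threshold, so the result is much smaller than the input array.
std::vector<std::size_t> bins_above(const std::vector<float>& power,
                                    float threshold) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < power.size(); ++i) {
        if (power[i] > threshold) hits.push_back(i);
    }
    return hits;
}
```

If only a few bins exceed the threshold, the index list (or even just the count) is far cheaper to send back than the full power array, which is the point of doing this step on the GPU side.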
0 Likes