First of all, sorry for my poor English. I hope you can understand me.
I am working on implementing a large-integer factoring method.
Now I have a problem with my output data: I expect factors of around 300 to 400 digits, so I have to return values of 1024 to 2048 bits. One uint4 stores 128 bits, so using 8 normal output streams gives me only 1024 bits of data, which might not be enough.
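To make the sizes concrete, here is a rough back-of-the-envelope check in Python (not my kernel code). The limit of 8 output streams is the one mentioned above; the helper names are made up for illustration.

```python
import math

UINT4_BITS = 4 * 32        # one uint4 = four 32-bit components = 128 bits
MAX_OUTPUT_STREAMS = 8     # limit mentioned in this post (assumption about the hardware)

def bits_for_digits(digits: int) -> int:
    """Upper bound on the bits needed for a number with `digits` decimal digits."""
    return math.ceil(digits * math.log2(10))

def uint4_needed(bits: int) -> int:
    """Number of uint4 values needed to hold `bits` bits (ceiling division)."""
    return -(-bits // UINT4_BITS)

for digits in (300, 400):
    bits = bits_for_digits(digits)
    words = uint4_needed(bits)
    fits = words <= MAX_OUTPUT_STREAMS
    print(f"{digits} digits: {bits} bits, {words} uint4, fits in 8 streams: {fits}")
```

So a 300-digit factor (997 bits) would just fit into 8 uint4 outputs, but a 400-digit factor (1329 bits) would need 11 of them.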
I'm already using gather for my input data, so I don't want to use scatter for my outputs, for performance reasons. Is there any other method for returning more than 1024 bits per kernel? Another problem is memory use: I don't have to return the results of every kernel, because most of them don't find a real factor of my number. Any ideas how to return only the "interesting" results?
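To show what I mean by "interesting", here is a small CPU-side sketch in Python of the filtering I would like to happen before the data comes back. It assumes each kernel instance produces one candidate value and that a result is interesting when it is a nontrivial divisor of the number; the function name and the toy numbers are only for illustration.

```python
def compact_results(n, candidates):
    """Keep only (kernel_index, factor) pairs where factor is a nontrivial divisor of n.

    Assumption: candidates[i] is the value produced by kernel instance i;
    values of 1 or n (or non-divisors) mean "nothing found".
    """
    return [(i, g) for i, g in enumerate(candidates)
            if 1 < g < n and n % g == 0]

n = 8051                                   # toy example: 83 * 97
candidates = [1, 1, 97, 1, 8051, 83, 1]    # most instances find nothing
print(compact_results(n, candidates))      # [(2, 97), (5, 83)]
```

On the GPU I would like the same effect, so that only the few entries that pass the test take up output memory.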
PS: I have to use local arrays during the calculation. Is their performance similar to gather and scatter, or to normal variables?
Thanks for reading