Having read through the documentation and had a play with Stream, I've got a couple of questions that, as far as I can see, the documentation doesn't really cover. My application is an evolutionary simulation (a GA), so although the questions are general, that's the context.
1) To what extent is Stream intended to be used for process-heavy code rather than pure calculations? For example, my GA has some code that branches, but that could be run against each individual in the population in parallel. Is the branching likely to wipe out the advantages of parallel processing, leaving me better off running that code on the CPU and accelerating only the calculation-heavy functions on the GPU?
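To make the kind of branching I mean concrete, here is a minimal sketch in plain Python (not Brook+). The function `step` and its update rules are made up purely for illustration; the point is only that the path taken depends on each individual's data, while individuals are otherwise independent of one another.

```python
# Hypothetical per-individual step, for illustration only.
# The branch taken depends on the data, so neighbouring individuals
# may follow different code paths.
def step(individual):
    total = sum(individual)
    if total > 0:  # data-dependent branch
        return [v * 0.5 for v in individual]
    return [v + 1.0 for v in individual]

population = [[1.0, 2.0], [-3.0, 1.0]]

# Each individual is processed independently of the others,
# which is the part I would hope to run in parallel.
next_generation = [step(ind) for ind in population]
```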
2) Each individual in my population is composed of a set of values whose fitness can be calculated independently. The number of values per individual is low (approx. 300), but there are a large number of individuals (thousands). Is it likely to be faster to:
a) send a single 3D stream of all values for all individuals to the GPU at once, or
b) loop over the individuals and send a separate 2D stream for each one?
If the answer is 'it depends', what does it depend on and how might I estimate the best performing approach in advance?
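In case it helps, here is a rough sketch of the two options in plain Python (again, not Brook+). `fitness_kernel` is a hypothetical stand-in for a GPU kernel call, the squaring is a placeholder for the real fitness calculation, and each individual is flattened to a single row of 300 values for simplicity. The counter only tracks how many separate dispatches each option makes; it says nothing about actual GPU timings.

```python
calls = 0  # counts simulated host-to-GPU dispatches

def fitness_kernel(batch):
    """Hypothetical stand-in for a GPU kernel: squares every value in every row."""
    global calls
    calls += 1
    return [[v * v for v in row] for row in batch]

# 1000 individuals with 300 values each (sizes are assumptions based on my case)
population = [[float(i + j) for j in range(300)] for i in range(1000)]

# Option (a): one dispatch covering the whole population
calls = 0
result_a = fitness_kernel(population)
calls_a = calls

# Option (b): one dispatch per individual
calls = 0
result_b = [fitness_kernel([ind])[0] for ind in population]
calls_b = calls
```

Both options give the same results here; the difference is one dispatch versus one per individual, and what I can't judge is how much each dispatch and transfer costs on the GPU.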
Thanks in advance for any replies.