Might this be a scenario where branching would help?
You can call kernels from kernels; there's information on it in the SDK docs. There are penalties for this, though. As I understand it, each time a branch is encountered, all other threads stop while the branch is worked through to the end... at least, I think that's what they meant. Thus branches cause a sort of exponential increase in calculations.
I don't think so, or I don't correctly understand what you mean.
I have to use short arrays because local streams are not supported. =(
Otherwise I have to unroll the loop, which results in tricky, hard-to-read and long code. I don't want to go this way, so I am searching for an alternative.
Brook+ does not currently support local arrays. To use local arrays, you need to drop down to AMD HLSL or CAL IL.