A kernel still runs fine if it has more than 8 output streams, just with some performance overhead.
The Brook compiler generates multipass code for a kernel with more than 8 output streams, and the runtime handles it properly.
Example: if a kernel has 10 outputs, then
the compiler generates two kernels: the first has 8 outputs and the second has 2. However, it copies the kernel body unchanged into both kernels, which is an overhead if your kernel does a lot of computation.
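To illustrate the cost, here is a minimal CPU-side model of that multipass split. It is a sketch, not Brook code and not what the compiler literally emits: `MAX_OUTPUTS_PER_PASS`, `split_outputs`, and `run_multipass` are hypothetical names, and the 8-output limit is taken from the description above. Each generated pass re-runs the whole kernel body for every element and keeps only its own slice of the outputs, so a 10-output kernel does the full computation twice.

```python
import math

MAX_OUTPUTS_PER_PASS = 8  # assumed per-pass output-stream limit, as described above


def num_passes(num_outputs):
    """Number of passes a multipass split needs: ceil(outputs / limit)."""
    return math.ceil(num_outputs / MAX_OUTPUTS_PER_PASS)


def split_outputs(num_outputs):
    """Output indices written by each generated pass, e.g. 10 -> [0..7], [8..9]."""
    return [range(start, min(start + MAX_OUTPUTS_PER_PASS, num_outputs))
            for start in range(0, num_outputs, MAX_OUTPUTS_PER_PASS)]


def run_multipass(stream, num_outputs, body):
    """Hypothetical model of the generated code: every pass calls the full
    kernel body for each element but writes only its own outputs."""
    outputs = [[None] * len(stream) for _ in range(num_outputs)]
    for pass_outputs in split_outputs(num_outputs):
        for e, x in enumerate(stream):
            results = body(x)  # the whole body is recomputed in every pass
            for k in pass_outputs:
                outputs[k][e] = results[k]
    return outputs
```

With a body that counts its own calls, a 3-element stream and 10 outputs give 2 passes and 6 body evaluations instead of 3, i.e. the expensive work is done once per pass.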
Just break your kernel up yourself; you'll probably get better performance that way.