What do kernel calls and kernel declarations look like for streams of arrays?

For instance, if I have the parameter A<500>[1000], how do I pass it to a kernel function, and what does the corresponding kernel declaration look like?

I would like to use streams of arrays because I want to be able to access each array element of a stream element from within the kernel. Is it possible (or will it be with v1.0) to do a sum (or a similar operation, say a simple sort) over the 1000 array elements of each stream element? In other words, the 500 stream elements would be processed in parallel, while the 1000 array elements belonging to each stream element would be summed sequentially.
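To make the intent concrete, here is a rough sketch in plain C of the computation I have in mind. This is not Brook+ code and does not use any Brook+ syntax. The function names (`sum_array`, `sum_each_stream_element`) and the flat row-major layout are just assumptions I made up for illustration. The outer loop stands in for what I would expect the hardware to run in parallel, and the inner loop is the sequential part.

```c
#include <stddef.h>

#define NUM_STREAM_ELEMS 500   /* the <500> in A<500>[1000] */
#define ARRAY_LEN        1000  /* the [1000] in A<500>[1000] */

/* Hypothetical per-element work: sum one array sequentially. */
static float sum_array(const float *elem, size_t len)
{
    float total = 0.0f;
    for (size_t i = 0; i < len; ++i)
        total += elem[i];
    return total;
}

/*
 * Hypothetical driver: 'a' holds n arrays of ARRAY_LEN floats stored
 * back to back (row-major). Each iteration of the outer loop is
 * independent, so this is the part I would like to run in parallel.
 */
void sum_each_stream_element(const float *a, float *out, size_t n)
{
    for (size_t s = 0; s < n; ++s)
        out[s] = sum_array(a + s * ARRAY_LEN, ARRAY_LEN);
}
```

Calling `sum_each_stream_element(a, out, NUM_STREAM_ELEMS)` would fill `out` with one sum per stream element. What I am missing is how to express the same thing as a Brook+ kernel.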

If someone could point me to an example or some documentation, that would be great. I really hope that the documentation for the official v1.0 of Brook+ is MUCH more detailed. The CAL documentation is good, but the Brook+ documentation is severely lacking.