If you create an input stream, you have to perform a stream read to fill it. The read copies the data from main (host) memory over PCIe to graphics memory.
However, if you create a scatter/gather array, as in float4 ar, and fill it in main memory, how do you create that array in graphics memory, and how do you copy it?
I think you can just pass that array in the kernel call and Brook+ will do the copy under the hood.
What happens if you first use a gather/scatter array as the output of kernel1 and then use that array as the input of kernel2? Is a copy to main memory performed?
Where and how do you declare that array? I don't need it in main memory.
Any help appreciated.
In the kernel definition, if you specify the number of elements in the gather array, as in float4 ar, then it is a constant buffer, not a gather stream, although its usage in your kernel is the same: read only.
From your C program, the array is passed as constants, the same way a single (scalar) constant would be passed to the kernel call. Thus you don't need to declare a stream and load the array into it.
Now if you don't specify the number of elements in the kernel definition, as in float4 ar, then it's a gather stream, which needs to be allocated and read in your C program.
For scatter (output) arrays, you cannot specify the number of elements, since obviously it cannot be a constant array. So it's a regular stream which needs to be loaded.
In your example, you use float4 ar in both kernels 1 and 2, and declare a stream for it in your C program. If that array is used only as a passing mechanism between the two kernels and you don't need its values in your C program, you don't need to declare a corresponding C array or call the stream read/write on it.
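To make the data flow concrete, here is a rough sketch in plain C. It is only a host-side model of the idea, not Brook+ code: DeviceStream, stream_read, stream_write, kernel1, kernel2 and run_pipeline are all hypothetical names made up for illustration, and the "kernels" are ordinary loops. The only thing it shows is that the intermediate array ar is written by one kernel and read by the next without ever being copied to or from main memory.

```c
#include <assert.h>
#include <string.h>

#define N 4

/* Hypothetical stand-in for a stream that lives in graphics memory. */
typedef struct { float data[N]; } DeviceStream;

/* Counts modelled host<->device transfers (the PCIe copies). */
static int pcie_copies = 0;

/* Models a stream read: host array -> device stream. */
static void stream_read(DeviceStream *s, const float *host) {
    memcpy(s->data, host, sizeof s->data);
    pcie_copies++;
}

/* Models a stream write: device stream -> host array. */
static void stream_write(const DeviceStream *s, float *host) {
    memcpy(host, s->data, sizeof s->data);
    pcie_copies++;
}

/* kernel1 produces the intermediate array ar (its output). */
static void kernel1(const DeviceStream *in, DeviceStream *ar) {
    for (int i = 0; i < N; i++)
        ar->data[i] = in->data[i] * 2.0f;
}

/* kernel2 gathers from ar at an arbitrary index (here: reversed). */
static void kernel2(const DeviceStream *ar, DeviceStream *out) {
    for (int i = 0; i < N; i++)
        out->data[i] = ar->data[N - 1 - i] + 1.0f;
}

/* Runs both kernels and returns the number of modelled transfers. */
static int run_pipeline(const float *host_in, float *host_out) {
    DeviceStream in, ar, out;   /* ar has no host-side counterpart */
    assert(host_in != NULL && host_out != NULL);
    pcie_copies = 0;
    stream_read(&in, host_in);     /* copy 1: input to device */
    kernel1(&in, &ar);             /* ar stays on the device */
    kernel2(&ar, &out);            /* reused directly as input */
    stream_write(&out, host_out);  /* copy 2: result to host */
    return pcie_copies;
}
```

Calling run_pipeline with the input {1, 2, 3, 4} returns 2 (one copy in, one copy out) and leaves {9, 7, 5, 3} in the output array; ar itself is never transferred.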