In the kernel definition, if you specify the number of elements in the gather array, as in float4 ar[1024], then it is a constant buffer, not a gather stream, although its usage inside your kernel is the same: read only.
From your C program, the array is passed as constants, the same way a single (scalar) constant would be passed to the kernel call. Thus you don't need to declare a stream and load the array into it.
If you don't specify the number of elements in the kernel definition, as in float4 ar[], then it's a gather stream, which has to be allocated as a stream and loaded with a stream read in your C program.
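To make the difference concrete, here is a rough sketch in legacy Brook+ syntax. The kernel names, the array size and the host variable names are all made up for illustration, and the way the sized array is passed directly from C follows the description above rather than anything I have tested, so treat it as a sketch and not as verified code.

```
// Sized array: treated as a constant buffer, read only in the kernel.
kernel void scale_const(float4 ar[1024], float4 src<>, out float4 res<>)
{
    res = src * ar[0];
}

// Unsized array: a gather stream, also read only in the kernel.
kernel void scale_gather(float4 ar[], float4 src<>, out float4 res<>)
{
    res = src * ar[0];
}

int main(void)
{
    float4 table[1024];                      // host data for the array
    float4 host_src[1024], host_res[1024];   // host input and output
    float4 src<1024>, res<1024>;             // regular streams
    float4 ar_stream<1024>;                  // only needed for the gather version

    // (fill table and host_src here)
    streamRead(src, host_src);

    // Constant buffer: pass the C array itself, like a scalar constant.
    scale_const(table, src, res);

    // Gather stream: load the data into a stream first, then pass the stream.
    streamRead(ar_stream, table);
    scale_gather(ar_stream, src, res);

    streamWrite(res, host_res);
    return 0;
}
```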
For scatter (output) arrays, you cannot specify the number of elements, since an array the kernel writes to obviously cannot be a constant array. So it is always a regular stream, which has to be allocated in your C program.
In your example, you use float4 ar[] in both kernels 1 and 2, and declare a stream in your C program. If that array is used only as a passing mechanism between the two kernels and you don't need its values in your C program, you still need the stream itself, but you don't need to declare a corresponding C array or use the stream read/write on it.
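A rough sketch of that pattern in legacy Brook+ syntax follows. The kernel bodies, the index stream and all the names are assumptions for illustration, since I don't know what your kernels actually compute. The point is only that the intermediate stream ar is declared but never touched by streamRead or streamWrite.

```
// Kernel 1 scatters into ar; kernel 2 gathers from it.
kernel void kernel1(float idx<>, float4 src<>, out float4 ar[])
{
    ar[idx] = src;
}

kernel void kernel2(float idx<>, float4 ar[], out float4 dst<>)
{
    dst = ar[idx];
}

int main(void)
{
    float  host_idx[1024];
    float4 host_src[1024], host_dst[1024];
    float  idx<1024>;
    float4 src<1024>, dst<1024>;
    float4 ar<1024>;   // intermediate stream: no C array, no streamRead/streamWrite

    // (fill host_idx and host_src here)
    streamRead(idx, host_idx);
    streamRead(src, host_src);

    kernel1(idx, src, ar);   // writes ar on the device
    kernel2(idx, ar, dst);   // reads ar on the device

    streamWrite(dst, host_dst);
    return 0;
}
```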