Is there an efficient implementation of a stream copy, i.e. of a kernel like this:
kernel void streamCopy(float4 a<>, out float4 b<>)
{
    b = a;
}