Wow, what did I really do when I switched to native_sin and native_cos?

Stream KernelAnalyzer (SKA) says that my kernel went from 158 million kernels per second to 222 million kernels per second when I switched one cos call and one sin call to native_cos and native_sin.

The weird part is that the basic structure of the code is this:

1. setup, including a call to cos(float4)

2. loop doing 256 iterations

3. teardown, including a call to sin(float4)
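
For concreteness, here is a minimal OpenCL C sketch of that three-step structure after the switch. The kernel name, the arguments, and the loop body are hypothetical placeholders, not the real code; only the placement of the two trig calls and the 256-iteration loop follow the outline above. Note that each trig call runs once per work-item, outside the loop, and that the OpenCL spec leaves the accuracy of the native_ variants implementation-defined.

```c
// OpenCL C sketch. trig_sketch, in, out, and the loop body are hypothetical.
__kernel void trig_sketch(__global const float4 *in, __global float4 *out)
{
    const size_t gid = get_global_id(0);

    // 1. Setup: one vector cosine (originally cos(float4)).
    const float4 x = in[gid];
    const float4 c = native_cos(x);

    // 2. Main loop: 256 iterations of placeholder arithmetic.
    float4 acc = x;
    for (int i = 0; i < 256; ++i) {
        acc = mad(acc, c, x);   // computes acc * c + x
    }

    // 3. Teardown: one vector sine (originally sin(float4)).
    out[gid] = native_sin(acc);
}
```

Swapping native_cos and native_sin back to cos and sin gives the "before" version, so the two builds differ only in those two lines.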

All I did was change the two trig functions. Why does the SKA tool think I've invented the best thing since sliced bread? The wall-clock time certainly doesn't agree with that.

