I'd settle for a way to do this from IL! Trying to do that now, I've managed to get rid of the conditionals and to insert FRACT manually. But the second MUL is still there and I can't find a way to get rid of it. (And without that, being able to insert FRACT is useless, because FRACT must go after MUL.)
mul r254.x___, r254.xxxx, l9.xxxx
frc r254.x___, r254.xxxx
sin r254.x___, r254.xxxx
2 w: MUL ____, R0.x, (0x3DFCB924, 0.1234000027f).x
3 z: FRACT ____, PV2.w
4 y: MUL ____, PV3.z, (0x3E22F983, 0.1591549367f).x
5 x: SIN R0.x, PV4.y
y: SIN ____, PV4.y
z: SIN ____, PV4.y
I have asked the concerned people about the sine function's range ambiguity.
Thanks for reporting that.
Range ambiguity is the least of my problems! I want to be able to generate a SIN (or COS) instruction without also generating a multiplication by 0.1591549367f, either from CL or from IL. The fundamental problem seems to be that the native SIN instruction in the 6970 instruction set does not agree with the IL SIN instruction as described in the IL specification, and the IL compiler is not smart enough to bridge the gap without extra code. (And it's not even a new problem: I've checked the Evergreen/5xxx instruction set, and things worked exactly the same way there, too.)
AMD has a significant edge over NVIDIA in performance of its single-precision floating-point sine and cosine. The 6970 should be able to peak at 338G sines/second. NVIDIA's 580 can only do 99G. But this compiler silliness goes some way to reverse the advantage.
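For anyone wondering where such peak figures can come from, here is a back-of-the-envelope sketch in C. The unit counts and clocks are my assumptions, not something taken from a vendor document in this thread: 384 VLIW4 units at 880 MHz for the 6970 with one transcendental per unit per clock, and 16 SMs with 4 special function units each at a 1544 MHz shader clock for the 580. The function names are made up for illustration.

```c
/* Peak rate = number of units * operations per unit per clock * clock in Hz. */
static double peak_rate(double units, double ops_per_clock, double clock_hz)
{
    return units * ops_per_clock * clock_hz;
}

/* Assumed 6970 configuration: 1536 ALUs grouped into 384 VLIW4 units,
   one SIN per unit per clock, 880 MHz engine clock. */
static double hd6970_peak_sines(void)
{
    return peak_rate(384.0, 1.0, 880e6);
}

/* Assumed 580 configuration: 16 SMs * 4 SFUs = 64 units,
   one SIN per SFU per clock, 1544 MHz shader clock. */
static double gtx580_peak_sines(void)
{
    return peak_rate(64.0, 1.0, 1544e6);
}
```

Under these assumptions the two functions give roughly 338e9 and 99e9 sines per second, which matches the figures quoted above.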
And I also want to be able to generate an instruction FRACT from CL. But that is lower priority.
Overall, my sentiments are well expressed by this thread:
Ideally, I want to be able to write "a=b*c; d=native_amd_fract(a); e=native_amd_sin(d);" and expect that my code will be compiled into three native instructions, without any silly multiplications, conditionals, or other code that the compiler might consider necessary to inject into my bottleneck.