Native sine/cosine

Discussion created by eugenek on Nov 30, 2011
Latest reply on Nov 30, 2011 by eugenek

I want to squeeze some extra performance out of code that computes many sines and cosines.

Consider the attached code (reproduced at the end of this post).

The compiler from AMD APP SDK 2.5 generates the following ISA for the Radeon HD 6970, both with and without -cl-fast-relaxed-math:

      2  z: MUL_e       R0.z,  R0.x,  (0x3DFCB924, 0.1234000027f).x     
      3  y: MUL         R0.y,  PV2.z,  (0x3E22F983, 0.1591549367f).x     
      4  x: SETGT       ____,  |PV3.y|,  (0x42480000, 50.0f).x     
         w: FRACT       ____,  PV3.y     
      5  z: CNDE_INT    ____,  PV4.x,  R0.y,  PV4.w     
      6  y: CNDE        ____,  R0.z,  R0.z,  PV5.z     
      7  x: SIN         R0.x,  PV6.y     
         y: SIN         ____,  PV6.y     
         z: SIN         ____,  PV6.y   

Evidently, the native SIN instruction on the 6970 actually computes sin(2*pi*x), and its argument must lie in the range -50 to 50. The instruction set architecture document confirms this but gives the valid range as -256 to 256. No matter.
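Written out, the relationship described above is as follows, where SIN denotes the hardware instruction and the last identity uses the fact that sin(2*pi*u) has period 1 in u (u - floor(u) is the fractional part that FRACT produces):

```latex
\mathrm{SIN}(u) = \sin(2\pi u), \qquad
\texttt{native\_sin}(t) = \mathrm{SIN}\!\left(\tfrac{t}{2\pi}\right), \qquad
\mathrm{SIN}(u) = \mathrm{SIN}\bigl(u - \lfloor u \rfloor\bigr)
```

For the attached kernel, t = a*x with a = 0.1234, so the folded constant would be a/(2*pi) ≈ 0.0196397.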

Here is what's going on:

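Line 2: I multiply the argument by 0.1234.

Line 3: the compiler divides it by 2 pi (multiplies by 1/(2 pi) = 0.1591549367).

Lines 4-6: the compiler reduces the scaled argument into the valid range. It computes FRACT unconditionally, then selects the FRACT result when |x| > 50 and the unreduced value otherwise; line 6 additionally passes an argument of exactly zero through unchanged.

Line 7: we finally get to compute the sine.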

What I want is code that compiles to just three instructions: MUL, FRACT and SIN. The constant could be divided by 2 pi at compile time (in fact, I'm surprised that the compiler does not merge lines 2 and 3), and I'd always execute FRACT because... well, it's always executed as is (it shares the bundle on line 4 with SETGT), so what's the point of having a conditional on top of that?

Unfortunately, the OpenCL standard has no native_sinpi or native_fract, and the built-in modf() and fract() functions generate bulky code.

Is there any way to do this without rewriting everything in ISA?

__kernel void test(__global float* p)
{
    const float a = 0.1234f;
    p[0] = native_sin(a * p[0]);
}