19 Replies Latest reply on Mar 21, 2011 10:52 AM by himanshu.gautam

    Wow..what did I really do when I switched to native_sin and native_cos?

    kbrafford

      Stream KernelAnalyzer says that my kernel went from 158 million kernels per second to 222 million kernels per second when I switched one cos and one sine to native_cos and native_sin

      The weird part is that the basic structure of the code is this:

      1. setup, including a call to cos(float4)

      2. loop doing 256 iterations

      3. teardown, including a call to sin(float4)

      All I did was change the two trig functions.  Why does the SKA tool think I've invented a new sliced bread?  The wall clock certainly doesn't agree with that.

      --Keith Brafford

        • Wow..what did I really do when I switched to native_sin and native_cos?
          MicahVillmow
          sin/cos are required to fulfill the LLVM requirements for OpenCL floating point accuracy. Native_* versions have no such restraints and compile down to a single hardware instruction. The trig functions are not single instruction functions in most cases.
          • Wow..what did I really do when I switched to native_sin and native_cos?
            MicahVillmow
            If your kernel is not ALU bound, it won't speed up by decreasing the amount of ALU to compute the results.
            • Wow..what did I really do when I switched to native_sin and native_cos?
              eugenek

              native_cos with float argument is 1 hardware instruction. cos with float argument is around 200 instructions if the argument is less than 3140000, and closer to 300 instructions otherwise.

               

               

              Would I be better off calculating the cos and sin values on the CPU and passing them into the kernels?


              Not really. The CPU takes an awful lot of time to compute sin/cos as well. The GPU can manage about 4 billion cosines per second. The CPU can do 200 million cosines per second.

                • Wow..what did I really do when I switched to native_sin and native_cos?
                  Jawed

                  Stand back in awe of the attached code, which results in 142 ALU instructions, 10 fetches, 3 writes, 62 GPRs and 2 scratch registers on HD5870 

                  kernel void test(global float *A, global float *B) { int pos = get_global_id(0); B[pos] = powr(A[pos], A[pos + 1]); }

                    • Wow..what did I really do when I switched to native_sin and native_cos?
                      himanshu.gautam

                      Jawed,

                      Where did you got those numbers? I tried your code in SKA and get 178 Instruction clauses and it uses 11 GPRs. With native version it is 11 clauses and 3 GPRs. The number of reads are 2 and write is 1 in both cases as expected.

                        • Wow..what did I really do when I switched to native_sin and native_cos?
                          Jawed

                          SKA 1.7 with SDK 2.3 installed.

                          I suspect you have a different SDK installed.

                          If I use native_powr() then I get 10 ALUs and 3 GPRs (I suspect you meant 10 not 11).

                          Why isn't native_pow() defined?

                          • Wow..what did I really do when I switched to native_sin and native_cos?
                            eugenek

                             

                            Originally posted by: himanshu.gautam Jawed,

                             

                            Where did you got those numbers? I tried your code in SKA and get 178 Instruction clauses and it uses 11 GPRs. With native version it is 11 clauses and 3 GPRs. The number of reads are 2 and write is 1 in both cases as expected.

                             

                             

                            Maybe you're compiling for a different GPU? I also see 62 GPRs and 10 fetches on a 5870, 63 GPRs and 8 fetches on a 6970.

                             

                            Your CPU may be doing something similar when you ask it to compute a power, it's just not so blatantly obvious. On an Intel Core Duo, the two instructions that do the bulk of the job inside pow() take 165 clock ticks.

                              • Wow..what did I really do when I switched to native_sin and native_cos?
                                himanshu.gautam

                                Can you post the IL\ISA generated without native function.

                                  • Wow..what did I really do when I switched to native_sin and native_cos?
                                    eugenek

                                    here

                                    kernel void test(global float *A, global float *B) { int pos = get_global_id(0); B[pos] = powr(A[pos], A[pos + 1]); } ; -------- Disassembly -------------------- 00 ALU: ADDR(32) CNT(5) KCACHE0(CB0:0-15) 0 t: MULLO_INT ____, R1.x, KC0[1].x 1 y: ADD_INT ____, R0.x, PS0 2 w: ADD_INT ____, PV1.y, KC0[6].x 3 x: LSHL R0.x, PV2.w, (0x00000002, 2.802596929e-45f).x 01 MEM_SCRATCH_WRITE_ACK: VEC_PTR[0].x___, R0, ARRAY_SIZE(1) ELEM_SIZE(3) MARK 02 ALU: ADDR(37) CNT(5) KCACHE1(CB1:0-15) 4 x: MOV R0.x, R0.x 5 z: ADD_INT ____, KC1[0].x, PV4.x 6 y: LSHR R0.y, PV5.z, (0x00000002, 2.802596929e-45f).x w: ADD_INT R0.w, PV5.z, (0x00000004, 5.605193857e-45f).y 03 WAIT_ACK: Outstanding_acks <= 0 04 TEX: ADDR(512) CNT(1) 7 VFETCH R51.x___, R0.y, fc156 MEGA(4) FETCH_TYPE(NO_INDEX_OFFSET) 05 ALU: ADDR(42) CNT(66) 8 x: AND_INT R50.x, (0x7FFFFFFF, 1.#QNANf).x, R51.x z: LSHR R0.z, R0.w, (0x00000002, 2.802596929e-45f).y 9 y: ADD R3.y, -PV8.x, 1.0f z: OR_INT ____, PV8.x, (0x3F800000, 1.0f).x w: LSHR T0.w, PV8.x, (0x00000017, 3.222986468e-44f).y 10 x: MULADD_e ____, PV9.y, (0x3E124925, 0.1428571492f).y, (0x3E2AAAAB, 0.1666666716f).x y: ADD ____, PV9.z, -1.0f z: ADD_INT T0.z, (0xFFFFFF81, -1.#QNANf).z, PV9.w 11 x: CNDE_INT ____, T0.w, PV10.y, R50.x y: LSHR ____, PV10.y, (0x00000017, 3.222986468e-44f).x w: MULADD_e ____, R3.y, PV10.x, (0x3E4CCCCD, 0.200000003f).y 12 x: AND_INT ____, PV11.x, (0x007FFFFF, 1.175494211e-38f).x y: MULADD_e ____, R3.y, PV11.w, (0x3E800000, 0.25f).y z: LSHL ____, PV11.x, 1 w: AND_INT T1.w, PV11.x, (0x007F0000, 1.166310801e-38f).z t: ADD_INT ____, (0xFFFFFF03, -1.#QNANf).w, PV11.y 13 x: OR_INT T0.x, PV12.x, (0x3F000000, 0.5f).x y: AND_INT ____, PV12.z, (0x00010000, 9.183549616e-41f).y z: MULADD_e T0.z, R3.y, PV12.y, (0x3EAAAAAB, 0.3333333433f).z w: CNDE_INT ____, T0.w, PS12, T0.z 14 x: ADD_INT ____, T1.w, PV13.y t: I_TO_F R8.z, PV13.w 15 x: LSHR R3.x, PV14.x, (0x0000000E, 1.961817850e-44f).x w: OR_INT ____, PV14.x, (0x3F000000, 0.5f).y 16 x: ADD R7.x, PV15.w, -T0.x y: ADD_INT ____, PV15.x, (0x000007E0, 2.825017704e-42f).x z: ADD_INT ____, PV15.x, (0x00000320, 1.121038771e-42f).y w: ADD_INT ____, PV15.x, (0x00000530, 1.860924361e-42f).z t: MUL_e T0.y, R3.y, R3.y 17 x: MUL_e ____, R3.y, PS16 y: ASHR ____, PV16.z, (0x00000002, 2.802596929e-45f).x z: ASHR ____, PV16.w, (0x00000002, 2.802596929e-45f).x w: ASHR T1.w, PV16.y, (0x00000002, 2.802596929e-45f).x t: AND_INT T1.z, (0x7FFFFFFF, 1.#QNANf).y, R3.y 18 x: AND_INT ____, PV17.y, (0x00000003, 4.203895393e-45f).x y: ASHR R0.y, PV17.y, (0x00000002, 2.802596929e-45f).y z: AND_INT T0.z, PV17.z, (0x00000003, 4.203895393e-45f).x w: MUL_e R1.w, T0.z, PV17.x t: ASHR R0.x, PV17.z, (0x00000002, 2.802596929e-45f).y 19 x: ADD_INT R2.x, (0xFFFFFFFD, -1.#QNANf).x, PV18.x y: ADD_INT R1.y, (0xFFFFFFFE, -1.#QNANf).y, PV18.x z: ADD_INT R1.z, -1, PV18.x w: ADD_INT R0.w, -1, PV18.z t: ADD_INT R3.z, (0xFFFFFFFE, -1.#QNANf).y, PV18.z 20 x: ASHR R1.x, T1.w, (0x00000002, 2.802596929e-45f).x y: ADD_INT R2.y, (0xFFFFFFFD, -1.#QNANf).y, T0.z z: AND_INT R2.z, T1.w, (0x00000003, 4.203895393e-45f).z w: SETGT_DX10 R2.w, (0x3D800000, 0.0625f).w, T1.z VEC_021 t: MUL_e R7.z, T0.y, -0.5 06 VTX: ADDR(514) CNT(2) 21 VFETCH R4, R0.y, fc130 MEGA(16) FETCH_TYPE(NO_INDEX_OFFSET) 22 VFETCH R5, R0.x, fc130 MEGA(16) FETCH_TYPE(NO_INDEX_OFFSET) 07 TEX: ADDR(518) CNT(1) 23 VFETCH R49.x___, R0.z, fc156 MEGA(4) FETCH_TYPE(NO_INDEX_OFFSET) 08 VTX: ADDR(520) CNT(1) 24 VFETCH R6, R1.x, fc130 MEGA(16) FETCH_TYPE(NO_INDEX_OFFSET) 09 ALU: ADDR(108) CNT(47) 25 x: CNDE_INT ____, R0.w, R5.y, R5.x y: ADD_INT ____, -1, R2.z z: ADD_INT T0.z, (0xFFFFFFFE, -1.#QNANf).x, R2.z w: CNDE_INT ____, R1.z, R4.y, R4.x VEC_021 t: ADD_INT T1.z, (0xFFFFFFFD, -1.#QNANf).y, R2.z 26 y: CNDE_INT ____, PV25.y, R6.y, R6.x z: CNDE_INT ____, R1.y, R4.z, PV25.w w: CNDE_INT ____, R3.z, R5.z, PV25.x VEC_021 27 x: CNDE_INT ____, T0.z, R6.z, PV26.y y: CNDE_INT ____, R2.x, R4.w, PV26.z z: CNDE_INT ____, R2.y, R5.w, PV26.w VEC_021 w: ADD T0.w, -R1.w, R7.z VEC_021 t: ASHR T0.x, R3.x, (0x00000002, 2.802596929e-45f).x 28 x: MUL_e T1.x, R7.x, PV27.y y: MUL_e T0.y, R7.x, PV27.z z: AND_INT ____, PS27, (0x00000003, 4.203895393e-45f).x w: CNDE_INT T1.w, T1.z, R6.w, PV27.x 29 x: ADD T0.x, PV28.x, PV28.y y: ADD_INT R6.y, -1, PV28.z z: ASHR R6.z, T0.x, (0x00000002, 2.802596929e-45f).x w: ADD_INT R6.w, (0xFFFFFFFE, -1.#QNANf).y, PV28.z t: ADD_INT R4.w, (0xFFFFFFFD, -1.#QNANf).z, PV28.z 30 x: MULADD_e ____, PV29.x, (0x3E800000, 0.25f).y, (0x3EAAAAAB, 0.3333333433f).x y: ADD ____, T0.y, -PV29.x z: MUL_e T1.z, PV29.x, PV29.x w: CNDE_INT T2.w, R2.w, -PV29.x, R7.z t: ADD R2.y, -R3.y, T0.w 31 x: ADD ____, T1.x, PV30.y y: AND_INT R5.y, (0xFFFFF000, -1.#QNANf).x, R49.x w: MULADD_e ____, T0.x, PV30.x, 0.5 VEC_201 32 x: ADD R7.x, R49.x, -PV31.y w: MULADD_e ____, PV31.w, T1.z, PV31.x 33 y: MULADD_e ____, R8.z, (0x3805FDF4, 0.0000319461833f).x, -PV32.w 34 z: ADD ____, PV33.y, T1.w 35 x: CNDE_INT T0.x, R2.w, PV34.z, -R1.w w: ADD R5.w, -T0.x, PV34.z 36 x: CNDE_INT R6.x, R2.w, PV35.w, T0.w 37 z: ADD ____, PV36.x, -T2.w 38 y: ADD R4.y, T0.x, -PV37.z 10 VTX: ADDR(522) CNT(1) 39 VFETCH R1, R6.z, fc130 MEGA(16) FETCH_TYPE(NO_INDEX_OFFSET) 11 ALU: ADDR(155) CNT(94) 40 z: CNDE_INT ____, R6.y, R1.y, R1.x 41 y: CNDE_INT ____, R6.w, R1.z, PV40.z 42 x: CNDE_INT ____, R4.w, R1.w, PV41.y 43 w: MULADD_e ____, R8.z, (0x3F317000, 0.6931152344f).x, PV42.x 44 z: ADD ____, R5.w, PV43.w w: CNDE_INT T2.w, R2.w, PV43.w, -R3.y VEC_120 45 x: CNDE_INT T0.x, R2.w, PV44.z, R2.y y: AND_INT R3.y, (0x7FFFFFFF, 1.#QNANf).x, R49.x 46 x: LSHR T1.x, PV45.y, (0x00000017, 3.222986468e-44f).x z: ADD ____, -PV45.x, T2.w w: AND_INT T2.w, (0xFFFFF000, -1.#QNANf).y, PV45.x 47 x: SUB_INT ____, (0x00000096, 2.101947696e-43f).x, PV46.x y: ADD ____, R6.x, PV46.z z: ADD T1.z, T0.x, -PV46.w VEC_120 48 x: ADD ____, R4.y, PV47.y y: AND_INT ____, PV47.x, (0x0000001F, 4.344025239e-44f).x 49 y: ADD T0.y, PV48.x, T1.z w: LSHL ____, 1, PV48.y 50 x: AND_INT ____, R49.x, PV49.w y: ADD_INT T1.y, (0xFFFFFF82, -1.#QNANf).x, T1.x z: MUL_e ____, PV49.y, R7.x VEC_021 w: ADD_INT ____, -1, PV49.w 51 x: SETNE_INT T1.x, PV50.x, 0.0f y: MULADD_e ____, T2.w, R7.x, PV50.z z: AND_INT ____, R49.x, PV50.w w: SETGT_INT T0.w, PV50.y, 0.0f 52 y: SETE_INT ____, PV51.z, 0.0f w: MULADD_e T1.w, T0.y, R5.y, PV51.y 53 x: SETGT_INT T1.x, (0x00000019, 3.503246161e-44f).x, T1.y z: MULADD_e R6.z, R5.y, T2.w, PV52.w w: AND_INT ____, T1.x, PV52.y 54 x: MUL_e ____, PV53.z, (0x42B8AA3B, 92.33248138f).x y: MULADD_e ____, R5.y, T2.w, -PV53.z z: SETE_DX10 T1.z, PV53.z, (0x42B17218, 88.72283936f).y w: AND_INT ____, PV53.w, T0.w VEC_021 t: SETGT_DX10 ____, PV53.z, (0x42B17218, 88.72283936f).y 55 x: AND_INT R61.x, PV54.w, T1.x y: AND_INT T1.y, PS54, 1 z: ADD T2.z, T1.w, PV54.y w: SETE_INT R6.w, R50.x, (0x7F800000, 1.#INFf).x t: F_TO_I R45.x, PV54.x 56 y: SETGT_DX10 ____, PV55.z, (0xB482E308, -0.0000002437957392f).x z: AND_INT ____, PS55, (0x0000003F, 8.828180325e-44f).y w: AND_INT ____, PV55.x, PV55.w t: I_TO_F T0.z, PS55 57 x: LSHL ____, PV56.z, (0x00000002, 2.802596929e-45f).x y: MULADD_e ____, PS56, (0xBC310000, -0.01080322266f).y, R6.z z: AND_INT ____, PV56.y, T1.z w: OR_INT ____, R51.x, R49.x t: AND_INT R60.x, PV56.w, 1 58 x: ADD_INT ____, PV57.x, (0x000009F0, 3.564903293e-42f).x y: ADD_INT ____, PV57.x, (0x00000210, 7.398855892e-43f).y z: AND_INT ____, PV57.z, 1 w: MULADD_e ____, T0.z, (0xB7E42FEF, -0.00002720203884f).z, PV57.y t: AND_INT ____, (0x7FFFFFFF, 1.#QNANf).w, PV57.w 59 x: OR_INT R44.x, PV58.z, T1.y y: ADD T0.y, PV58.w, T2.z z: ASHR ____, PV58.x, (0x00000002, 2.802596929e-45f).x w: ASHR T1.w, PV58.y, (0x00000002, 2.802596929e-45f).x t: SETE_INT R53.x, PS58, 0.0f 60 x: AND_INT ____, PV59.z, (0x00000003, 4.203895393e-45f).x y: MUL_e T1.y, PV59.y, PV59.y z: ASHR R8.z, PV59.z, (0x00000002, 2.802596929e-45f).y w: MULADD_e ____, PV59.y, (0x3D2AAAAB, 0.04166666791f).w, (0x3E2AAAAB, 0.1666666716f).z t: AND_INT T2.z, PV59.w, (0x00000003, 4.203895393e-45f).x 61 x: ADD_INT R7.x, -1, PV60.x y: ADD_INT R4.y, (0xFFFFFFFE, -1.#QNANf).x, PV60.x z: MULADD_e ____, PV60.w, T0.y, 0.5 w: ADD_INT R2.w, (0xFFFFFFFD, -1.#QNANf).y, PV60.x t: ASHR R5.y, T1.w, (0x00000002, 2.802596929e-45f).z 62 x: ADD_INT R6.x, -1, T2.z y: ADD_INT R2.y, (0xFFFFFFFD, -1.#QNANf).x, T2.z z: ADD_INT R1.z, (0xFFFFFFFE, -1.#QNANf).y, T2.z w: MULADD_e R4.w, PV61.z, T1.y, T0.y t: AND_INT R1.w, R53.x, 1 12 VTX: ADDR(524) CNT(2) 63 VFETCH R8, R8.z, fc130 MEGA(16) FETCH_TYPE(NO_INDEX_OFFSET) 64 VFETCH R5, R5.y, fc130 MEGA(16) FETCH_TYPE(NO_INDEX_OFFSET) 13 ALU: ADDR(249) CNT(94) 65 x: SETE_INT T1.x, R50.x, 0.0f VEC_201 y: SETGT_UINT T3.y, R3.y, (0x7F800000, 1.#INFf).x z: CNDE_INT ____, R7.x, R8.y, R8.x VEC_021 t: SETGT_UINT R46.x, R50.x, (0x7F800000, 1.#INFf).x 66 x: AND_INT ____, PV65.x, PV65.y y: CNDE_INT ____, R4.y, R8.z, PV65.z z: AND_INT T2.z, PS65, 1 w: CNDE_INT ____, R6.x, R5.y, R5.x VEC_021 t: SETE_INT R59.x, R49.x, R3.y 67 x: AND_INT R42.x, PV66.x, 1 y: AND_INT T0.y, T1.x, PS66 z: CNDE_INT ____, R1.z, R5.z, PV66.w w: CNDE_INT ____, R2.w, R8.w, PV66.y t: AND_INT R58.x, T1.x, 1 68 x: CNDE_INT R4.x, R2.y, R5.w, PV67.z y: SETE_INT T1.y, R51.x, (0x3F800000, 1.0f).x z: MULADD_e ____, PV67.w, R4.w, PV67.w VEC_102 w: SETNE_INT T0.w, R51.x, R50.x t: OR_INT R41.x, R1.w, PV67.x 69 x: MULADD_e R3.x, PV68.x, R4.w, PV68.z y: AND_INT T0.y, T3.y, PV68.y z: SETE_INT T1.z, R3.y, 0.0f VEC_120 w: SETE_INT T1.w, R3.y, (0x7F800000, 1.#INFf).x VEC_120 t: AND_INT R54.x, T0.y, 1 70 x: OR_INT R40.x, R41.x, PS69 y: AND_INT T2.y, R6.w, PV69.z z: SETGT_UINT T0.z, R3.y, (0x4F7FFFFF, 4294967040f).x w: SETE_INT T2.w, R49.x, (0x3F800000, 1.0f).y VEC_120 t: AND_INT R31.x, T0.w, 1 71 x: OR_INT R39.x, PV70.x, R58.x y: AND_INT T0.y, R59.x, PV70.z z: SETE_INT T3.z, R50.x, (0x3F800000, 1.0f).x VEC_201 w: SETGT_UINT T3.w, (0x3F800000, 1.0f).x, R50.x VEC_021 t: AND_INT R30.x, T0.y, 1 72 x: OR_INT R13.x, PV71.x, T2.z y: SETE_INT T1.y, R51.x, R50.x z: AND_INT T2.z, PV71.y, PV71.w w: XOR_INT ____, -1, R59.x VEC_021 t: AND_INT R29.x, T1.y, 1 73 x: OR_INT R12.x, PV72.x, R31.x y: SETGT_UINT R3.y, (0x2E800000, 5.820766091e-11f).x, R3.y z: AND_INT T0.z, T0.z, PV72.w w: AND_INT T1.w, R6.w, PV72.y t: AND_INT R28.x, PS72, T1.w 74 x: OR_INT R11.x, PV73.x, R30.x y: SETGT_DX10 T2.y, (0xC2CE8ED0, -103.2789307f).x, R6.z z: AND_INT ____, R61.x, T0.w t: AND_INT R27.x, T2.y, 1 75 x: OR_INT R10.x, PV74.x, R28.x y: AND_INT R2.y, PV74.z, 1 t: AND_INT R38.x, T1.z, 1 76 x: OR_INT R26.x, PV75.x, R27.x t: AND_INT R25.x, T3.y, 1 77 x: OR_INT R24.x, R29.x, PV76.x t: AND_INT R23.x, T2.w, 1 78 x: OR_INT R22.x, PV77.x, R38.x t: AND_INT R21.x, T2.z, 1 79 x: OR_INT R20.x, PV78.x, R25.x t: AND_INT R37.x, T0.y, 1 80 x: OR_INT R19.x, PV79.x, R23.x t: AND_INT R18.x, PS79, T3.z 81 x: OR_INT R17.x, PV80.x, R21.x t: AND_INT R43.x, T0.z, 1 82 x: OR_INT R16.x, PV81.x, R18.x t: AND_INT R36.x, T3.w, PS81 83 x: OR_INT R15.x, R37.x, PV82.x t: AND_INT R35.x, T3.z, R43.x 84 x: OR_INT R14.x, PV83.x, R36.x t: AND_INT R48.x, T1.w, 1 85 x: OR_INT R34.x, PV84.x, R35.x t: AND_INT R47.x, R6.w, 1 86 x: OR_INT R33.x, R43.x, PV85.x t: AND_INT R52.x, R58.x, T1.y 87 x: OR_INT R32.x, PV86.x, R48.x t: AND_INT R57.x, R3.y, 1 88 x: OR_INT R2.x, R60.x, PV87.x t: AND_INT R56.x, T2.y, 1 89 x: OR_INT R7.x, PV88.x, R47.x t: CNDE_INT R55.x, R2.y, 0.0f, (0x80000000, -0.0f).x 90 x: OR_INT R6.x, PV89.x, R52.x 91 x: OR_INT R5.x, R54.x, PV90.x 92 x: OR_INT R9.x, R58.x, PV91.x 93 x: OR_INT R8.x, PV92.x, R57.x 94 x: OR_INT R1.x, R44.x, PV93.x 95 w: OR_INT ____, PV94.x, R56.x 96 x: AND_INT R0.x, PV95.w, 1 14 MEM_SCRATCH_WRITE_ACK: VEC_PTR[1].x___, R0, ARRAY_SIZE(1) ELEM_SIZE(3) MARK 15 WAIT_ACK: Outstanding_acks <= 0 16 TEX: ADDR(528) CNT(1) 97 RD_SCRATCH R0.x___, VEC_PTR[1], ARRAY_SIZE(1) ELEM_SIZE(3) UNCACHED 17 ALU_PUSH_BEFORE: ADDR(343) CNT(1) 98 x: PREDE_INT ____, R0.x, 0.0f UPDATE_EXEC_MASK UPDATE_PRED 18 JUMP ADDR(20) 19 ALU: ADDR(344) CNT(16) 99 x: LSHL ____, R45.x, (0x00000011, 2.382207389e-44f).x z: ASHR ____, R45.x, (0x00000006, 8.407790786e-45f).y w: ADD T0.w, R4.x, R3.x VEC_120 100 x: SETGT_INT T0.x, (0xFFFFFF83, -1.#QNANf).x, PV99.z y: ADD_INT ____, PV99.z, (0x00000015, 2.942726775e-44f).y w: AND_INT ____, PV99.x, (0xFF800000, -1.#INFf).z 101 x: AND_INT ____, PV100.y, (0x0000001F, 4.344025239e-44f).x z: ADD_INT T0.z, T0.w, PV100.w 102 w: LSHL ____, 1, PV101.x 103 z: MUL_e ____, T0.w, PV102.w 104 y: CNDE_INT ____, T0.x, T0.z, PV103.z 105 x: OR_INT R1.x, R55.x, PV104.y 20 ELSE POP_CNT(1) ADDR(23) 21 ALU: ADDR(360) CNT(122) 106 x: AND_INT R1.x, R1.x, 1 y: AND_INT R0.y, R44.x, 1 VEC_120 w: AND_INT T1.w, R2.x, 1 VEC_201 107 x: AND_INT R0.x, R5.x, 1 y: AND_INT T2.y, R6.x, 1 VEC_120 z: AND_INT T2.z, R7.x, 1 VEC_201 108 y: AND_INT ____, R13.x, 1 z: AND_INT R0.z, R8.x, 1 VEC_120 w: AND_INT T3.w, R9.x, 1 VEC_201 109 x: AND_INT ____, R12.x, 1 y: XOR_INT ____, -1, PV108.y z: AND_INT T0.z, R10.x, 1 VEC_120 w: AND_INT ____, R11.x, 1 VEC_201 110 x: AND_INT T0.x, R24.x, 1 y: AND_INT ____, R26.x, 1 VEC_120 z: XOR_INT ____, -1, PV109.x w: AND_INT T2.w, R31.x, PV109.y VEC_201 t: XOR_INT ____, -1, PV109.w 111 x: AND_INT T3.x, R30.x, PV110.z y: AND_INT T3.y, R28.x, PS110 VEC_120 z: XOR_INT ____, -1, T0.z w: AND_INT ____, R22.x, 1 VEC_201 t: XOR_INT ____, -1, PV110.y 112 x: AND_INT ____, R20.x, 1 y: XOR_INT ____, -1, PV111.w z: AND_INT T3.z, R27.x, PV111.z VEC_120 w: AND_INT R0.w, R29.x, PS111 VEC_201 113 x: AND_INT ____, R19.x, 1 y: AND_INT R1.y, R25.x, PV112.y VEC_120 z: AND_INT ____, R17.x, 1 VEC_201 w: XOR_INT ____, -1, PV112.x 114 x: AND_INT ____, R15.x, 1 y: XOR_INT ____, -1, PV113.x z: AND_INT R1.z, R23.x, PV113.w VEC_120 w: AND_INT ____, R16.x, 1 VEC_201 t: XOR_INT ____, -1, PV113.z 115 x: AND_INT R3.x, R18.x, PS114 y: AND_INT ____, R14.x, 1 VEC_120 z: XOR_INT ____, -1, PV114.w w: AND_INT R1.w, R21.x, PV114.y VEC_201 t: XOR_INT ____, -1, PV114.x 116 x: AND_INT T1.x, R34.x, 1 y: AND_INT R2.y, R37.x, PV115.z VEC_120 z: AND_INT R2.z, R36.x, PS115 VEC_201 w: XOR_INT ____, -1, PV115.y 117 x: AND_INT T2.x, R33.x, 1 z: AND_INT T0.z, R32.x, 1 VEC_120 w: AND_INT R2.w, R35.x, PV116.w VEC_201 118 y: AND_INT ____, R42.x, (0x000000FF, 3.573311084e-43f).x z: AND_INT ____, R40.x, (0x000000FF, 3.573311084e-43f).x VEC_120 w: AND_INT ____, R41.x, (0x000000FF, 3.573311084e-43f).x VEC_201 119 x: XOR_INT ____, -1, PV118.y y: CNDE_INT T0.y, PV118.y, 0.0f, R49.x VEC_021 z: XOR_INT ____, -1, PV118.w w: XOR_INT ____, -1, PV118.z t: AND_INT ____, R39.x, (0x000000FF, 3.573311084e-43f).x 120 x: OR_INT T0.x, (0xFFFFFFFE, -1.#QNANf).x, PV119.w y: OR_INT ____, (0xFFFFFFFE, -1.#QNANf).x, PV119.z z: XOR_INT ____, -1, PS119 w: AND_INT ____, R53.x, PV119.x t: XOR_INT ____, -1, T0.x 121 x: AND_INT R2.x, R38.x, PS120 y: AND_INT T1.y, R54.x, PV120.y VEC_120 z: AND_INT ____, PV120.w, 1 w: AND_INT ____, R46.x, PV120.z VEC_201 122 x: CNDE_INT ____, PV121.z, T0.y, (0x7FC00000, 1.#QNANf).x y: XOR_INT ____, -1, T1.x z: AND_INT T1.z, PV121.w, 1 w: AND_INT T0.w, R58.x, T0.x VEC_021 123 x: AND_INT R4.x, R43.x, PV122.y y: CNDE_INT R3.y, R59.x, (0x80000000, -0.0f).x, R51.x VEC_102 z: CNDE_INT R3.z, R59.x, 0.0f, R51.x VEC_102 w: CNDE_INT ____, T1.y, PV122.x, 0.0f t: XOR_INT ____, -1, T0.z 124 x: AND_INT R6.x, R60.x, PS123 y: XOR_INT ____, -1, T2.x z: CNDE_INT ____, T0.w, PV123.w, (0x7F800000, 1.#INFf).x w: XOR_INT ____, -1, T1.w 125 x: AND_INT R5.x, R48.x, PV124.y y: CNDE_INT ____, T1.z, PV124.z, R51.x z: AND_INT R4.z, R47.x, PV124.w VEC_120 w: XOR_INT ____, -1, T2.z 126 x: AND_INT R7.x, R52.x, PV125.w y: CNDE_INT R4.y, R59.x, 0.0f, R50.x VEC_102 z: CNDE_INT ____, T2.w, PV125.y, (0x7FC00000, 1.#QNANf).x w: XOR_INT ____, -1, T2.y 127 x: AND_INT R8.x, R54.x, PV126.w w: CNDE_INT ____, T3.x, PV126.z, R49.x VEC_102 128 x: CNDE_INT ____, T3.y, PV127.w, (0x7FC00000, 1.#QNANf).x y: AND_INT ____, R61.x, 1 z: CNDE_INT R5.z, R59.x, (0x7F800000, 1.#INFf).y, R51.x VEC_102 w: XOR_INT ____, -1, T3.w 129 x: AND_INT R0.x, R57.x, PV128.w y: CNDE_INT ____, T3.z, PV128.x, (0x7FC00000, 1.#QNANf).x z: CNDE_INT R6.z, PV128.y, (0x80000000, -0.0f).y, R51.x w: CNDE_INT R3.w, PV128.y, (0x7F800000, 1.#INFf).w, (0xFF800000, -1.#INFf).z t: XOR_INT ____, -1, R0.x 130 x: OR_INT R1.x, R55.x, (0x7F800000, 1.#INFf).x y: XOR_INT ____, -1, R0.z z: CNDE_INT ____, R0.w, PV129.y, (0x3F800000, 1.0f).y w: AND_INT R0.w, R58.x, PS129 VEC_201 t: XOR_INT ____, -1, R1.x 131 x: AND_INT R2.x, R0.y, PV130.y y: AND_INT R0.y, R56.x, PS130 w: CNDE_INT ____, R2.x, PV130.z, (0x3F800000, 1.0f).x VEC_120 132 x: CNDE_INT ____, R1.y, PV131.w, R49.x 133 y: CNDE_INT ____, R1.z, PV132.x, R51.x 134 z: CNDE_INT ____, R1.w, PV133.y, 0.0f 135 w: CNDE_INT R1.w, R3.x, PV134.z, (0x3F800000, 1.0f).x 22 ALU_POP_AFTER: ADDR(482) CNT(17) 136 x: CNDE_INT ____, R2.y, R1.w, (0x7F800000, 1.#INFf).x 137 y: CNDE_INT ____, R2.z, PV136.x, (0x7F800000, 1.#INFf).x 138 z: CNDE_INT ____, R2.w, PV137.y, (0x3F800000, 1.0f).x 139 w: CNDE_INT ____, R4.x, PV138.z, 0.0f 140 w: CNDE_INT ____, R5.x, PV139.w, R3.z 141 w: CNDE_INT ____, R6.x, PV140.w, R3.y 142 w: CNDE_INT ____, R4.z, PV141.w, R4.y 143 w: CNDE_INT ____, R7.x, PV142.w, R5.z 144 z: CNDE_INT ____, R8.x, PV143.w, R6.z 145 z: CNDE_INT ____, R0.w, PV144.z, R3.w 146 w: CNDE_INT ____, R0.x, PV145.z, (0x3F800000, 1.0f).x 147 w: CNDE_INT ____, R2.x, PV146.w, R1.x 148 x: CNDE_INT R1.x, R0.y, PV147.w, R55.x 23 TEX: ADDR(530) CNT(1) 149 RD_SCRATCH R0.x___, VEC_PTR[0], ARRAY_SIZE(1) ELEM_SIZE(3) UNCACHED 24 ALU: ADDR(499) CNT(3) KCACHE0(CB1:0-15) 150 y: ADD_INT ____, KC0[1].x, R0.x 151 x: LSHR R0.x, PV150.y, (0x00000002, 2.802596929e-45f).x 25 MEM_RAT_CACHELESS_STORE_RAW: RAT(1)[R0].x___, R1, ARRAY_SIZE(4) MARK VPM END_OF_PROGRAM