5 Replies Latest reply on Jan 18, 2010 3:41 AM by bpurnomo

    Strange Results

    ryta1203

      Kernel 1:
        ALU: 68
        TEX: 3
        CF: 7
        Reported ALU:Fetch = 2.92

      Kernel 2:
        ALU: 56
        TEX: 3
        CF: 5
        Reported ALU:Fetch = 4.67

      There are no loops in either kernel.

      Kernel 1 is running faster. Does this make any sense at all? It doesn't to me. Why is this? The only explanation I can think of is that they are FETCH bound!?

      Both kernels run in about the same time, so that's my conclusion.

      MY QUESTION IS THIS: Why is SKA reporting seemingly incorrect ALU:Fetch ratios? I could understand this if the bottleneck were FETCH or global read/write, but SKA reports the bottleneck as ALU (even though it obviously isn't).

      I'm just curious about the strange ALU:Fetch ratio reporting. Any ideas, anyone?
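A quick arithmetic check, added here for illustration, shows where numbers like these might come from. The raw instruction ratios are 68/3 ≈ 22.67 and 56/3 ≈ 18.67, and neither matches the report. The sketch below assumes (a hypothesis, not something taken from SKA's documentation) that SKA divides the raw ratio by a hardware ALU:TEX throughput factor of 4, as on an RV770-class SIMD with 16 thread processors and 4 texture units. The constant `HW_ALU_PER_TEX` and both helper functions are hypothetical names.

```python
# Hypothetical model: SKA normalizes the raw ALU/TEX instruction ratio by the
# hardware's ALU:TEX throughput. ASSUMPTION: 4 ALU units per texture unit,
# as on an RV770-class SIMD (16 thread processors, 4 texture units).
HW_ALU_PER_TEX = 4


def raw_ratio(alu: int, tex: int) -> float:
    """Plain instruction-count ratio, ALU / TEX."""
    return alu / tex


def normalized_ratio(alu: int, tex: int,
                     hw_alu_per_tex: int = HW_ALU_PER_TEX) -> float:
    """Ratio scaled so that 1.0 would mean balanced ALU and fetch work (assumed)."""
    return alu / (tex * hw_alu_per_tex)


# (static ALU count, static TEX count, ratio reported by SKA) from the post
kernels = {
    "Kernel 1": (68, 3, 2.92),
    "Kernel 2": (56, 3, 4.67),
}

for name, (alu, tex, reported) in kernels.items():
    print(f"{name}: raw={raw_ratio(alu, tex):.2f} "
          f"normalized={normalized_ratio(alu, tex):.2f} "
          f"reported={reported:.2f}")
```

Under that assumption kernel 2's 4.67 is reproduced exactly (56 / (3 × 4)), while kernel 1 would come out at 5.67 rather than the reported 2.92, so something else must be lowering kernel 1's figure.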

        • Strange Results
          eduardoschardong

          Ahn... Could you post the kernel?

          Maybe it's an effect of branching, or of memory bank (channel?) conflicts...

            • Strange Results
              ryta1203

              Yes, there is branching in kernel 1...

              ...but, like I said, the kernels are fetch bound. I just want to know why the reported ALU:Fetch ratios are what they are.

                • Strange Results
                  rahulgarg

                  It could be any of dozens of reasons; hard to say. Here are 2 that come to mind.

                  1. How many GPRs in each case? Maybe GPR usage is high in the second kernel, so not enough threads are running to hide fetch latency.

                  2. Or maybe the second kernel isn't cache friendly.
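Point 1 is about occupancy: the more registers each thread needs, the fewer wavefronts can be resident on a SIMD at once, so there is less other work to switch to while a fetch is outstanding. A toy model of that relationship is sketched below. The register budget and the wavefront cap are made-up illustrative parameters, not published figures for any GPU, and `resident_wavefronts` is a hypothetical helper.

```python
# Toy occupancy model. Both constants are HYPOTHETICAL illustrative values,
# not published figures for any particular GPU.
REGS_PER_LANE = 256   # hypothetical registers available per SIMD lane
MAX_WAVEFRONTS = 16   # hypothetical cap on resident wavefronts per SIMD


def resident_wavefronts(gprs_per_thread: int,
                        regs_per_lane: int = REGS_PER_LANE,
                        max_wavefronts: int = MAX_WAVEFRONTS) -> int:
    """Wavefronts that fit in the register file, limited by the hardware cap."""
    if gprs_per_thread <= 0:
        raise ValueError("gprs_per_thread must be positive")
    return min(max_wavefronts, regs_per_lane // gprs_per_thread)


for gprs in (8, 16, 32, 64):
    print(f"{gprs:>2} GPRs -> {resident_wavefronts(gprs)} resident wavefronts")
```

In this model, going from 16 to 64 GPRs per thread cuts the resident wavefronts from 16 to 4, which is the mechanism by which a higher GPR count can slow a fetch-bound kernel even when its instruction counts look better.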

                    • Strange Results
                      ryta1203

                      Originally posted by: rahulgarg
                      Can be for dozens of reasons. Hard to say. Here are 2 that come to mind.

                      1. How many GPRs in each case? Maybe the GPR usage is high in the second case and so not enough threads running to hide fetch latency.

                      2. Or maybe the second kernel isn't cache friendly.

                      I appreciate the feedback, but as I've stated, I already know why kernel 1 runs faster (or at the same speed): the kernels are fetch bound. What I want to know is how SKA came up with those ALU:Fetch ratios, given those ALU and TEX instruction counts.

                    • Strange Results
                      bpurnomo

                      Originally posted by: ryta1203
                      Yes, there is branching in kernel 1...

                      SKA takes branching into account when calculating the ALU:Tex ratio.
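One way to see what "taking branching into account" might mean for these numbers is to back-solve the ALU count each reported ratio implies. The sketch assumes, as a hypothesis rather than documented SKA behaviour, that the ratio is ALU / (TEX × 4), where 4 is an RV770-class ALU:TEX throughput factor, and that the reported value was rounded to two decimals. `HW_ALU_PER_TEX`, `implied_alu_range` and `implied_alu_counts` are hypothetical names.

```python
import math
from typing import List, Tuple

# ASSUMPTION: SKA normalizes by a 4:1 ALU:TEX hardware factor.
# This is a hypothesis fitted to the numbers, not SKA's documented formula.
HW_ALU_PER_TEX = 4


def implied_alu_range(reported: float, tex: int,
                      hw_alu_per_tex: int = HW_ALU_PER_TEX) -> Tuple[float, float]:
    """ALU-count interval consistent with a ratio rounded to two decimals."""
    scale = tex * hw_alu_per_tex
    return (reported - 0.005) * scale, (reported + 0.005) * scale


def implied_alu_counts(reported: float, tex: int) -> List[int]:
    """Whole-number ALU counts whose ratio would round to the reported value."""
    lo, hi = implied_alu_range(reported, tex)
    return list(range(math.ceil(lo), math.floor(hi) + 1))


print("Kernel 1:", implied_alu_counts(2.92, 3))  # static ALU count is 68
print("Kernel 2:", implied_alu_counts(4.67, 3))  # static ALU count is 56
```

Under this assumption kernel 2 maps back to its full static count of 56, while kernel 1 maps to 35 of its 68 ALU instructions. That would fit bpurnomo's reply if the branch-aware calculation counts something like one path through the branch rather than every ALU instruction in the program, but that reading is inferred from the numbers, not confirmed by SKA.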