8 Replies Latest reply on Sep 8, 2010 3:07 PM by himanshu.gautam

    Wildly variable kernel performance

    tractatus
      Seemingly minor kernel change results in 6X kernel slowdown

      I have a kernel that runs 6X slower after a seemingly minor code change. The original code is something like:

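      __kernel void method1(__global int *A, __global int *B, int flag) {
         __global int *locX;
         __global int *locY;

         /* Choose which buffer plays which role, inside the kernel. */
         if (flag) {
            locX = A;
            locY = B;
         }
         else {
            locX = B;
            locY = A;
         }

         /* major loop over locX and locY */
      }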

       

         I tried moving the flag test out of the kernel and onto the host, and the kernel now runs 6X slower:

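      __kernel void method2(__global int *A, __global int *B) {
         __global int *locX;
         __global int *locY;

         /* No conditional: the host decides the argument order. */
         locX = A;
         locY = B;

         /* major loop over locX and locY */
      }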

       

         On the host, I just vary the call to method2:

         if (flag) {
            method2(A, B);
         }
         else {
            method2(B, A);
         }
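
For reference, the two variants should bind locX and locY to the same buffers for either value of flag, so any difference ought to be in performance only. The following is a minimal plain-C sketch of that equivalence, not OpenCL host code: `kernel_args`, `method1_binding`, `method2_binding`, and `method2_dispatch` are hypothetical names introduced here, and the elided major loop is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical record of which buffer ends up as locX and which as locY. */
typedef struct {
    int *loc_x;
    int *loc_y;
} kernel_args;

/* method1: the branch on flag is taken inside the kernel. */
static kernel_args method1_binding(int *A, int *B, int flag)
{
    kernel_args k;
    if (flag) {
        k.loc_x = A;
        k.loc_y = B;
    } else {
        k.loc_x = B;
        k.loc_y = A;
    }
    return k;
}

/* method2: the kernel binds its two arguments in the order given. */
static kernel_args method2_binding(int *first, int *second)
{
    kernel_args k;
    k.loc_x = first;
    k.loc_y = second;
    return k;
}

/* Host-side dispatch for method2: the branch on flag moves to the host. */
static kernel_args method2_dispatch(int *A, int *B, int flag)
{
    return flag ? method2_binding(A, B) : method2_binding(B, A);
}
```

In a real host program the swap would presumably be done by passing the two buffer handles to `clSetKernelArg` at argument indices 0 and 1 in the chosen order before enqueueing the kernel.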

       

         I am not sure why this should hurt performance: the flag test was the last conditional, so method2 now contains no conditionals at all. I am running this on Linux x86_64 with an ATI 5870, and I don't know of any profiling tools that would let me see what is going on. I have tried global counts of 1280, 2560, 10240, 20480, and 40960 with item counts of 64 and 256; 10240 global with an item count of 256 appears to work best.
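
To make the launch configurations above easier to compare, here is a small C sketch of the arithmetic. It assumes "global count" means the 1-D global work size and "item count" means the work-group (local) size; the helper names are hypothetical, and the wavefront size is passed in as a parameter (64 work-items is the usual figure for this GPU generation, treated here as an assumption).

```c
#include <stddef.h>

/* Number of work-groups in a 1-D launch.
   Returns 0 when local_size is zero or does not divide global_size
   evenly, a configuration this sketch treats as invalid. */
static size_t work_groups(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}

/* Wavefronts needed to cover one work-group, rounding up.
   wavefront_size is an assumption supplied by the caller (e.g. 64). */
static size_t wavefronts_per_group(size_t local_size, size_t wavefront_size)
{
    if (wavefront_size == 0)
        return 0;
    return (local_size + wavefront_size - 1) / wavefront_size;
}
```

Under these assumptions, the best-performing configuration (10240 global, 256 per group) corresponds to `work_groups(10240, 256)` = 40 work-groups of `wavefronts_per_group(256, 64)` = 4 wavefronts each, while an item count of 64 gives 160 single-wavefront groups for the same global size.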

         Any insight into what affects performance between method1 and method2, or pointers to profiling tools available for Linux x86_64, would be great.