Latest reply on Oct 13, 2011 8:20 PM by corry

    Switch statement fallthrough behavior


      According to the docs, I should be able to write something like:

      switch r0.x
      case 0
      case 1
      case 2
      case 3
      mcall(0), (r32), (r32)
      break
      case 4
      case 5
      case 6
      case 7
      mcall(0), (r33), (r33)
      break
      case n
      case n+1
      case n+2
      case n+3
      mcall(0), (r32+n/4), (r32+n/4)
      break
      endswitch

      Indeed, this compiles just fine. But following that example, why, oh why, does it generate n case blocks when n/4 would cover it? A jump table indexed by x/4 is just as easy to build as one indexed by x.
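      To make the "jump table/4" point concrete, here is a minimal C sketch (C stands in for AMD IL; `group_work`, the group count of three, and the helper names are hypothetical). Both dispatchers return the same result for every input, but the second needs only one case label per group of four values because it switches on `x / 4`.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-group work each mcall does in the post:
   group g uses register index 32 + g (r32, r33, ...). */
int group_work(int group) {
    return 32 + group;
}

/* One case label per value: labels 0..3 fall through to the same body. */
int dispatch_per_value(int x) {
    switch (x) {
    case 0: case 1: case 2: case 3:
        return group_work(0);
    case 4: case 5: case 6: case 7:
        return group_work(1);
    case 8: case 9: case 10: case 11:
        return group_work(2);
    default:
        return -1;
    }
}

/* Same behaviour with one case per group of four. The guard is needed
   because C integer division truncates toward zero, so -1 / 4 == 0. */
int dispatch_per_group(int x) {
    if (x < 0)
        return -1;
    switch (x / 4) {
    case 0: return group_work(0);
    case 1: return group_work(1);
    case 2: return group_work(2);
    default: return -1;
    }
}

/* Asserts that both dispatchers agree on every x in [lo, hi]. */
void check_equivalent(int lo, int hi) {
    for (int x = lo; x <= hi; ++x)
        assert(dispatch_per_value(x) == dispatch_per_group(x));
}
```

      For example, `dispatch_per_value(5)` and `dispatch_per_group(5)` both return 33, and the second table has three entries instead of twelve.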

      I'm not sure whether this is the reason for my very poor performance, but it's certainly an easy win on code size, which should improve instruction locality. For now, I suppose I can add a restriction that n mod 4 == 0, or at least pad the cases so the padding affects nothing, but this is pretty stupid. I just wonder: if I leave the empty cases out, will it still construct a jump table? Off to find out!
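      The padding workaround amounts to rounding the case count up to the next multiple of 4. A small C sketch of that arithmetic, assuming a non-negative count (`pad_to_group` and `groups_needed` are hypothetical helpers, not part of AMD IL):

```c
#include <assert.h>

/* Round a non-negative case count up to the next multiple of 4 so that
   every group of four case labels is full. Hypothetical helper. */
int pad_to_group(int n) {
    assert(n >= 0);
    return (n + 3) & ~3;
}

/* Jump-table entries needed when the switch is indexed by x / 4. */
int groups_needed(int n) {
    return pad_to_group(n) / 4;
}
```

      With 5 cases, for instance, the count pads to 8 and only 2 table entries are needed.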



          Yup, that was the cause of the problem. With all the intermediate cases, I saw a 33% decrease in performance. Without them, on the surface, I saw a 10% increase over the version that used scratch registers, and since I'm now using so few registers, I think I can cram another wavefront onto the GPU...
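          Why fewer registers can mean another wavefront: under a simplified model, resident wavefronts share a fixed per-lane register budget, so occupancy is roughly budget / registers-per-lane, capped at a hardware maximum. The sketch below uses hypothetical parameters (`budget`, `max_waves`); the real limits depend on the GPU and should be taken from its documentation.

```c
#include <assert.h>

/* Rough occupancy estimate under a simplified model: each wavefront needs
   regs_per_lane registers from a per-lane budget shared by all resident
   wavefronts, up to a hardware cap of max_waves. All parameters are
   hypothetical; this ignores other limits such as local memory. */
int wavefronts_that_fit(int budget, int regs_per_lane, int max_waves) {
    assert(budget > 0 && regs_per_lane > 0 && max_waves > 0);
    int waves = budget / regs_per_lane; /* integer division rounds down */
    return waves < max_waves ? waves : max_waves;
}
```

          With an assumed budget of 256 and a cap of 16, going from 28 to 25 registers per lane raises the estimate from 9 to 10 wavefronts.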

          Honestly, that should NOT have that level of impact. 40% is huge for something that should have zero impact, and it should be an easy one to fix!