According to the docs, I should be able to write a switch like this:
switch r0.x
case 0
case 1
case 2
case 3
mcall(0), (r32), (r32)
break
case 4
case 5
case 6
case 7
mcall(0), (r33), (r33)
break
.
.
.
.
case n
case n+1
case n+2
case n+3
mcall(0), (r32+n), (r32+n)
break
endswitch
Indeed, this compiles just fine. But taking that as the example: why, oh why, does the compiler generate n case blocks when n/4 would cover it? A jump table indexed by value/4 is just as easy to build as one indexed by the value itself.
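To make the "jump table/4" point concrete, here is a rough sketch in plain C (not IL, and not what the compiler actually emits) of what I would expect: one table entry per group of four case values, indexed by x/4. The handler functions and their return values are hypothetical stand-ins for the mcall targets, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the per-group mcall bodies. */
typedef int (*handler_fn)(void);

static int handler_a(void) { return 32; }  /* cases 0..3  */
static int handler_b(void) { return 33; }  /* cases 4..7  */
static int handler_c(void) { return 34; }  /* cases 8..11 */

/* One entry per group of four case values, not one per value. */
static const handler_fn group_table[] = { handler_a, handler_b, handler_c };

/* Returns the selected handler's result, or -1 when x matches no case. */
int dispatch(unsigned x)
{
    size_t group = x >> 2;  /* same as x / 4 for unsigned x */
    size_t count = sizeof group_table / sizeof group_table[0];

    if (group >= count)
        return -1;          /* falls outside the switch */
    return group_table[group]();
}
```

With this layout the table has n/4 entries instead of n, and the only extra cost is a single shift before the lookup.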
I'm not sure whether this is the reason for my very poor performance, but it is certainly an easy code-size win, which should also improve instruction locality. For now, I suppose I can add a restriction that n mod 4 == 0, or at least pad n so the extra cases affect nothing, but that is pretty stupid. I just wonder whether it will still construct a jump table if I leave the empty cases out. Off to find out!