Hm. I should try to dig up the K8.
We're doing some research: we're trying to find performance metrics that are microarchitecture-independent, so that we can profile an app on one computer and have the profile transfer to another computer with heterogeneous cores.
So knowing the pipeline depth may help us remove, or at least reduce, the architecture's influence on flushes and stalls. For example, the stall caused by a branch misprediction depends on the microarchitecture: chances are a chip with a deeper pipeline pays a higher penalty than a chip with a shallower one. This example is kinda contrived, but you never really know what info could be handy. The number of mispredicted branches also depends on your branch predictors: the misprediction could come from the BTB or from the global-history bimodal counters, and different chips have different stats on these things.
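To make the pipeline-depth point concrete, here is a rough Python sketch of the usual first-order model: stall cycles from mispredictions are approximately the misprediction count times a per-flush penalty. The function name and the penalty values (12 and 20 cycles) are made up for illustration; they are not measurements of any real chip.

```python
def mispredict_stall_cycles(mispredictions: int, flush_penalty_cycles: int) -> int:
    """First-order estimate: every misprediction costs one full flush penalty.

    flush_penalty_cycles is an assumed, per-chip constant that grows with
    pipeline depth. Real penalties vary per branch, so this is only a sketch.
    """
    if mispredictions < 0 or flush_penalty_cycles < 0:
        raise ValueError("counts and penalties must be non-negative")
    return mispredictions * flush_penalty_cycles


# Same misprediction count, two hypothetical pipeline depths.
shallow = mispredict_stall_cycles(1_000_000, 12)  # assumed shallower pipeline
deep = mispredict_stall_cycles(1_000_000, 20)     # assumed deeper pipeline
print(shallow, deep)
```

Under this model, dividing measured stall cycles by the chip's penalty would recover a count that no longer depends on pipeline depth, though it would still depend on the predictor.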
So yeah, I've been benchmarking and poking at performance counters, and we need to come up with good ways to abstract away these counters' dependency on the microarchitecture.
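One possible starting point, sketched in Python below, is to separate counters that mostly describe the program from counters that mostly describe the chip. The counter names in the dictionary are hypothetical placeholders, not the event names of any particular tool. Branches per kilo-instruction should be roughly a property of the binary, while the misprediction rate still depends on the predictor, so only the former is a candidate for a transferable metric.

```python
def per_kilo_instructions(count: int, instructions: int) -> float:
    """Normalize a raw event count to events per 1000 retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * count / instructions


def summarize(counters: dict) -> dict:
    """Split raw counters into a program-side and a chip-side metric.

    Expects hypothetical keys: 'instructions', 'branches', 'branch_misses'.
    """
    instructions = counters["instructions"]
    branches = counters["branches"]
    misses = counters["branch_misses"]
    return {
        # Mostly determined by the program and ISA, not the predictor.
        "branches_pki": per_kilo_instructions(branches, instructions),
        # Still predictor-dependent; reported separately on purpose.
        "mispredict_rate": misses / branches if branches else 0.0,
    }


# Made-up sample values, just to show the shape of the output.
print(summarize({"instructions": 2000, "branches": 400, "branch_misses": 20}))
```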