I have been using GPUPerfAPI-2.7 lately and have profiled my application on a Radeon 5870. The numbers reported for WriteInsts, FetchInsts, and ALUinsts are the same.
To check this, I tried it across the samples in the AMD SDK. The values reported there are the same too.
Is there something more to be done beyond what is in the User Guide?