I do a lot of hand-coded assembler optimization for every AMD CPU generation, and I would like to share ideas with other interested people.
What I find especially perplexing is the lack of precise detail about how the pipeline operates. I don't want anything that is an AMD secret, but if any of you guys have figured some things out, or know of any published reports or patent applications, I'd like to hear about them.
In return, I have developed some very accurate code timing methods, based on both IP (instruction pointer) sampling and the RDTSC instruction. I'd be happy to share them with anyone.