I want to know how prefetch is implemented on the Phenom in particular, and on other processor models as well.
Specifically: (1) how many streams can be active, (2) how automatic hardware prefetch works and how to control it, (3) what happens if the address is not in the TLB and/or the page table.
The embedded memory controller is supposed to have a data sheet, but I can't find one. A pointer to that would also be appreciated.
I'm not sure whether the Software Optimization Guide contains what you're specifically looking for, but it might be a good place to start.
Otherwise, perhaps one of the architecture programming manuals can help.