Suppose you have a single-socket system (desktop or server, it doesn't matter) with a 4- or 6-core CPU. I'm trying to understand how different cores on the CPU share/provide cache lines to one another.
Suppose a clean cache line, L, is held only in core0's L1D cache, and not in the L3.
If core1 tries to read data from L, it will miss in its L1D and L2 caches. It will then probe the L3 cache, and should also probe the other cores' L1D and L2 caches (to see whether the line is present there, and whether it is modified).
Who will satisfy core1's request?
1. It could get the data from the memory controller, which would be slow.
2. It could get the data from core0's L1D cache, which should be faster.
It seems like #2 would make the most sense. Can someone from AMD explain which is the case? It's possible that this isn't a simple yes/no question and is more complex; if so, I'd like to understand what determines the answer (e.g., would the probe filter change the behavior at all?).
Added: I just checked the BIOS and Kernel Developer's Guide, and it appears that there is a performance counter, "EventSelect 043h Data Cache Refills from the Northbridge", whose unit mask selects which of the five (MOESI) states the line is in. This seems to imply that a clean line could be filled from another L1D on the same die, but it would be nice to have this verified.