Where can I find a detailed description of Opteron's memory consistency model, including read/write combinations across multiple pages and 'home' memory controllers?

As far as I understand the coherency protocol (MOESI), writes to a given cache line are effectively serialized between cores, but I don't understand what allows shared memory objects larger than 64 aligned bytes (one cache line) to behave properly, unless barriers are thrown around almost everywhere. If I have cores A-D on separate sockets, where A writes to pages hosted in the RAM attached to each socket, and B wants to read a consistent image of the aggregate output, how is this guaranteed?

For example, is it strictly necessary to use locked instructions or memory fences for shared ring buffers? Would anything like a copy-on-write tree be consistently viewable without forced queue flushes across the entire system?

If at all possible, I'd like an answer more substantial than 'it just works if you do X.' Thanks for any help!
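For concreteness, the ring buffer case could look something like the sketch below. It is only an illustration, assuming a single producer and a single consumer and C++11 `std::atomic`; the names `SpscRing`, `try_push`, `try_pop`, and `transfer_sum` are made up for this post. The payload is written with ordinary stores, and only the head/tail indices use release stores and acquire loads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical single-producer/single-consumer ring (illustrative names).
template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
public:
    // Call from the producer thread only.
    bool try_push(const T& v) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;                 // ring is full
        slots_[head & (N - 1)] = v;                         // ordinary payload store
        head_.store(head + 1, std::memory_order_release);   // publish the slot
        return true;
    }
    // Call from the consumer thread only.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                     // ring is empty
        out = slots_[tail & (N - 1)];                       // ordinary payload load
        tail_.store(tail + 1, std::memory_order_release);   // hand the slot back
        return true;
    }
private:
    // Indices and payload sit on separate 64-byte lines to avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) T slots_[N];
};

// Pushes 1..count on one thread and sums the values on another.
std::uint64_t transfer_sum(std::uint64_t count) {
    SpscRing<std::uint64_t, 1024> ring;
    std::uint64_t sum = 0;
    std::thread producer([&] {
        for (std::uint64_t i = 1; i <= count; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        std::uint64_t v = 0;
        for (std::uint64_t n = 0; n < count; ++n) {
            while (!ring.try_pop(v)) std::this_thread::yield();
            assert(v == n + 1);                             // values arrive in FIFO order
            sum += v;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

As far as I can tell, mainstream x86-64 compilers turn these release stores and acquire loads into plain `MOV` instructions with no `LOCK` prefix or `MFENCE`, which is exactly why I would like to understand what the hardware guarantees when the slots and the indices live in memory attached to different sockets.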