I was just thinking: why do I, the programmer, have to care about things like zero-copy memory buffers?
The APU / compiler / driver should be smart enough to optimize memory transfers for me.
My opinion:
Keep in mind that the underlying architecture might not be an APU but a discrete CPU + GPU, so you need to be able to hide memory transfer latency some other way (through "stream"-like processing, for instance, where the copy of one chunk overlaps with the computation on another). I think some things should not be hidden from the programmer.
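To make the "stream"-like idea concrete, here is a rough sketch in plain Python rather than a real GPU API. Everything in it is an assumption for illustration: `transfer` stands in for a host-to-device copy, `kernel` stands in for the GPU computation, and a single worker thread plays the role of the copy engine. The point is only the ordering: the copy of chunk k+1 is started before the computation on chunk k begins.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer(chunk):
    # Hypothetical stand-in for a host-to-device copy.
    # A real program would call the GPU API's asynchronous copy here.
    return list(chunk)


def kernel(device_chunk):
    # Hypothetical stand-in for the GPU kernel: sum of squares of one chunk.
    return sum(x * x for x in device_chunk)


def process_streamed(data, chunk_size):
    # Split the input into chunks so copies and compute can be interleaved.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return 0

    total = 0
    # One worker thread models a single copy engine.
    with ThreadPoolExecutor(max_workers=1) as copy_engine:
        pending = copy_engine.submit(transfer, chunks[0])
        for next_chunk in chunks[1:]:
            current = pending.result()  # wait for chunk k to arrive
            pending = copy_engine.submit(transfer, next_chunk)  # start copying chunk k+1
            total += kernel(current)  # compute on chunk k while the copy runs
        total += kernel(pending.result())  # last chunk has nothing to overlap with
    return total


if __name__ == "__main__":
    print(process_streamed(list(range(10)), chunk_size=3))
```

On a discrete GPU the same pattern is normally written with the API's own asynchronous copies and command queues or streams. The sketch only shows why the programmer has to pick the chunking and the ordering: that is the part a driver cannot easily guess.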