Here's what I got from one of our engineers:
Try using performance counter 0xEB (Sized Commands) to quantify the events you measured with 0xEC. Be aware, though, that Sized Commands come from both CPU requests and I/O device requests (hence the qualifier). Performance counter 0xEB has Unit Masks for counting Sized-Write (dword) commands and for counting Sized-Write (byte) commands. Using both the Byte and Dword masks would give some indication of the amount of DMA data moved, after subtracting the requests that came from the CPU(s).

Another way to gauge the amount of data DMA’ed (outgoing to the device) is to use the HyperTransport (HT) Transmit Bandwidth event counter(s) for the non-coherent HT link(s) in the system (events 0xF6, 0xF7, and 0xF8 for HT0, HT1, and HT2 respectively). Unit Mask 0x2 selects counting the number of dwords sent to the DMA devices.
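To make the HT approach concrete, here is a rough Python sketch of the bookkeeping. The event numbers and Unit Mask 0x2 come from the note above; everything else is an assumption for illustration. The helper names are hypothetical, the packing assumes the usual layout of unit mask in bits 15:8 and event select in bits 7:0, and the `rNNNN` string assumes a tool that accepts raw event codes in that form (Linux perf does; other profilers may want the event and mask entered separately). A dword is 4 bytes.

```python
# Sketch only: helper names are hypothetical, not part of any tool's API.

DWORD_BYTES = 4  # a dword is 32 bits

# HT Transmit Bandwidth events per link, as listed in the note above.
HT_TX_EVENTS = {"HT0": 0xF6, "HT1": 0xF7, "HT2": 0xF8}
DATA_DWORD_UMASK = 0x02  # Unit Mask 0x2: dwords sent to the DMA devices


def raw_event(event: int, umask: int) -> int:
    """Pack an 8-bit event select and 8-bit unit mask into one raw code.

    Assumed layout: unit mask in bits 15:8, event select in bits 7:0.
    """
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    return (umask << 8) | event


def perf_raw_name(event: int, umask: int) -> str:
    """Format the raw code as an rNNNN string (assumed tool syntax)."""
    return f"r{raw_event(event, umask):04x}"


def dwords_to_bytes(dwords: int) -> int:
    """Convert a dword count read from the counter into bytes."""
    return dwords * DWORD_BYTES


def bandwidth_mb_per_s(dwords: int, seconds: float) -> float:
    """Average outgoing bandwidth in MB/s (10^6 bytes) over the interval."""
    return dwords_to_bytes(dwords) / seconds / 1e6


if __name__ == "__main__":
    for link, event in HT_TX_EVENTS.items():
        print(link, perf_raw_name(event, DATA_DWORD_UMASK))
```

For example, if the HT0 counter read 250,000,000 dwords over a 2-second run, `bandwidth_mb_per_s(250_000_000, 2.0)` works out to 500 MB/s outgoing on that link. The same conversion does not apply directly to the 0xEB counts, since those count commands rather than dwords.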
Hope this helps!