Yes, a MAD is counted as two operations. One instruction cycle takes four clocks, because a 64-wide wavefront is executed over four clocks (16 lanes per SIMD, so 64 / 16 = 4).
Part of the ever-present confusion is that the terminology depends on context. For Southern Islands / GCN, the 7970 has 32 Compute Units (CUs), each CU has 4 SIMDs, and each SIMD has 16 Processing Elements (ALUs). Simple enough: that is 2048 ALUs, and with an FMA counting as 2 ops per cycle, that gives you 4096 flops per clock. But some of the Southern Islands documentation (like the table on 5-23) is still structured as if it were describing the Evergreen and Northern Islands chips, where the Compute Unit is described as 16 Stream Processors (SPs) and each SP does 4 ops per cycle.
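
If it helps, here is the same arithmetic as a small Python sketch. The function names are made up for illustration, and the 925 MHz clock is my assumption for a reference 7970 (it is not stated above), so swap in whatever your card actually runs at.

```python
# Peak-throughput arithmetic for a GCN part, using the counts quoted above.
# All function names are hypothetical; this is just the multiplication spelled out.

def flops_per_clock(compute_units, simds_per_cu, lanes_per_simd, ops_per_lane=2):
    """Peak flops per clock; ops_per_lane=2 counts a MAD/FMA as two operations."""
    return compute_units * simds_per_cu * lanes_per_simd * ops_per_lane

def clocks_per_wavefront(wavefront_size, lanes_per_simd):
    """Clocks a SIMD needs to issue one instruction for a whole wavefront."""
    return wavefront_size // lanes_per_simd

# HD 7970: 32 CUs x 4 SIMDs x 16 lanes
per_clock = flops_per_clock(32, 4, 16)
print(per_clock)                        # 4096

# A 64-wide wavefront on a 16-lane SIMD
print(clocks_per_wavefront(64, 16))     # 4

# ASSUMPTION: 925 MHz engine clock (reference 7970), not taken from the post above.
clock_hz = 925_000_000
peak = per_clock * clock_hz
print(f"{peak / 1e12:.2f} TFLOPS")      # 3.79 TFLOPS
```

Note that this is a theoretical peak: it assumes every lane issues an FMA on every clock, which real kernels rarely manage.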