On ATI cards, every SP ALU operation takes 1 cycle. DP runs at 1/4 of SP speed. But when you compare against a CPU, keep in mind that the GPU supports only basic DP operations:
mad, mul, div*, add, sub, SP->DP conversion, DP->SP conversion, ldexp, frexp
*The GPU implements a reciprocal function with reduced accuracy, so for a true 1/x (or division) you must add a few mads.
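To illustrate what "a few mads" can look like, here is a minimal CPU-side sketch of Newton-Raphson refinement of a low-accuracy reciprocal. The names approx_recip, refined_recip and divide are hypothetical, and the single-precision division is only a stand-in for the hardware reciprocal, whose real accuracy is not stated here. It assumes x is nonzero, finite and within float range.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for a reduced-accuracy reciprocal (assumption: roughly
// single-precision accuracy; the real hardware may differ).
inline double approx_recip(double x) {
    return static_cast<double>(1.0f / static_cast<float>(x));
}

// Newton-Raphson refinement: r' = r + r * (1 - x * r).
// Each step is two mads and roughly doubles the number of correct bits.
inline double refined_recip(double x) {
    assert(x != 0.0 && std::isfinite(x));
    double r = approx_recip(x);
    for (int i = 0; i < 2; ++i) {
        double e = std::fma(-x, r, 1.0);  // e = 1 - x*r
        r = std::fma(r, e, r);            // r = r + r*e
    }
    return r;
}

// Division built from the refined reciprocal. Not guaranteed to be
// correctly rounded; a final correction step would be needed for that.
inline double divide(double a, double b) {
    return a * refined_recip(b);
}
```

Starting from about 24 good bits, two steps are enough to get close to full double precision; a less accurate starting value would need more iterations.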
More advanced functions such as sin, exp, cos, log, ... must be computed from those basic ops, and here the CPU has a huge advantage. You can look into the CAL++ sources for implementations of a few of these functions.
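As a rough idea of how such a function can be built from the basic ops, here is a sketch of exp using range reduction, a polynomial evaluated with mads, and ldexp. This is not the CAL++ implementation; exp_sketch is a hypothetical name, the Taylor coefficients are chosen for simplicity rather than speed, and overflow, underflow and NaN inputs are not handled.

```cpp
#include <cmath>

// exp(x) = 2^k * exp(r), with k = round(x / ln2) and r = x - k*ln2,
// so that |r| <= ln2/2 and a short polynomial is accurate enough.
inline double exp_sketch(double x) {
    const double ln2     = 0.6931471805599453;
    const double inv_ln2 = 1.4426950408889634;

    // Taylor coefficients 1/n! for n = 0..13 (folded at compile time).
    static const double c[14] = {
        1.0, 1.0, 1.0 / 2.0, 1.0 / 6.0, 1.0 / 24.0, 1.0 / 120.0,
        1.0 / 720.0, 1.0 / 5040.0, 1.0 / 40320.0, 1.0 / 362880.0,
        1.0 / 3628800.0, 1.0 / 39916800.0, 1.0 / 479001600.0,
        1.0 / 6227020800.0
    };

    double k = std::floor(x * inv_ln2 + 0.5);  // nearest integer to x/ln2
    double r = std::fma(-k, ln2, x);           // reduced argument

    // Horner evaluation: one mad per coefficient.
    double p = c[13];
    for (int n = 12; n >= 0; --n) {
        p = std::fma(p, r, c[n]);
    }

    // Scale by 2^k using ldexp.
    return std::ldexp(p, static_cast<int>(k));
}
```

Even this simple version costs well over a dozen dependent DP operations per call, which gives a feel for why the CPU comes out ahead on such functions.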