Maximum DP floating point throughput without -cl-mad-enable option

Question asked by ekondis on Jan 23, 2016
I'm testing a kernel that makes intensive use of double-precision multiply-add operations on a GCN GPU (specifically an R9-380X). The compiler does not translate these operations into multiply-add instructions; it emits separate multiplication and addition instructions instead. When the kernel is built with the -cl-mad-enable option, the generated code uses multiply-add instructions, as originally intended. Why doesn't the compiler emit multiply-add instructions without that option? Isn't the multiply-add instruction compliant with the IEEE-754 standard?