Hello AMD

I have replaced all a*b+c integer calculations with mad24 in my code (and I have quite a few of those!). Unfortunately I see no runtime difference at all.

Questions:

1) Does your OpenCL compiler automatically replace every a*b+c with mad24?

2) Does your OpenCL implementation actually support mad24, or does the compiler silently replace every mad24 with a*b+c?

Thanks for your reply, Micah.

The code runs on an HD6970. Yes, it is memory bound, but since I have lots of those a*b+c operations I thought I might see at least *some* difference. No?

On the 5800 series, signed mul24(a,b) is turned into (((a<<8)>>8)*((b<<8)>>8)). This makes it noticeably SLOWER than simply using a*b. Unsigned mul24(a,b) uses a native function. mad24 is similar. I made some kernels which just looped the same operation over and over:

signed a * b: 0.9736s

unsigned mul24(a,b): 0.9734s

signed mul24(a,b): 2.2771s

To AMD:

I don't think mul24 should EVER be slower than 32-bit multiplication. The OpenCL spec says that if the inputs don't fit into a 24-bit number, the answer is implementation-defined. Doing all those bit-shifts only changes the inputs if they are in the "implementation-defined" range, where the answer can't be relied on anyway. If the inputs already fit into a 24-bit number, the shifts do not affect the result and therefore simply waste time.
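To make the argument concrete, here is a minimal host-side sketch in plain C (not OpenCL device code, and not what the AMD compiler actually emits). The names sext24, mul_lo32, mul24_emulated and count_changed_inputs are made up for illustration. sext24 models the (x<<8)>>8 step on a 32-bit int; it is written with a mask and subtract because shifting negative values is not portable in host C.

```c
#include <stdint.h>

/* Hypothetical model of (x << 8) >> 8 on a 32-bit signed int:
   keep the low 24 bits and sign-extend from bit 23. */
int32_t sext24(int32_t x)
{
    uint32_t low = (uint32_t)x & 0xFFFFFFu;        /* low 24 bits */
    return (int32_t)(low ^ 0x800000u) - 0x800000;  /* sign-extend bit 23 */
}

/* Low 32 bits of a*b. Unsigned arithmetic is used so that
   wrap-around is well defined in C. */
uint32_t mul_lo32(int32_t a, int32_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Model of the lowering described in this thread:
   sign-extend both inputs, then do a normal multiply. */
uint32_t mul24_emulated(int32_t a, int32_t b)
{
    return mul_lo32(sext24(a), sext24(b));
}

/* Count how many values in the signed 24-bit range
   [-2^23, 2^23 - 1] are altered by the sign extension. */
long count_changed_inputs(void)
{
    long changed = 0;
    for (int32_t x = -8388608; x <= 8388607; ++x) {
        if (sext24(x) != x) {
            ++changed;
        }
    }
    return changed;
}
```

In this model, count_changed_inputs() finds no input in the signed 24-bit range that the extra step alters, so mul24_emulated(a, b) and mul_lo32(a, b) agree for every in-range pair. Only out-of-range inputs are changed (for example, 8388608 becomes -8388608), and those are exactly the inputs for which the result is implementation-defined anyway.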