

FrodoTheGiant
Journeyman III

mad24 - why no runtime difference?

Hello AMD

 

I have replaced all a*b+c integer calculations with mad24 in my code (and I have quite a few of those!). Unfortunately I see no runtime difference at all.

Questions:

1) Does your OpenCL compiler automatically replace all a*b+c with mad24 ?

2) Does your OpenCL implementation support mad24 - or does the compiled code silently replace all mad24 with a*b+c ?

4 Replies

Whether this optimization happens depends on the hardware support for the mad24 operation. Not all hardware supports both the signed and unsigned versions. Also, if your program is not ALU-bound, you might not see any performance gain.

Thanks for your reply Micah.

 

The code runs on a HD6970. Yes, it is memory bound - but since I have lots of those a*b+c I thought I might see at least *some* difference. No?


This is not directly related since you have a 6970, but it should be noted anyway.

On the 5800 series, signed mul24(a,b) is turned into (((a<<8)>>8)*((b<<8)>>8)). This makes it noticeably SLOWER than simply using a*b. Unsigned mul24(a,b) uses a native function. mad24 is similar. I made some kernels which just looped the same operation over and over:

signed a * b: 0.9736s
unsigned mul24(a,b): 0.9734s
signed mul24(a,b): 2.2771s

To AMD:
I don't think mul24 should EVER be slower than 32-bit multiplication. The OpenCL spec says that if the inputs don't fit into a 24-bit number, the answer is implementation-defined. Doing all those bit-shifts will only affect the inputs if they are in the "implementation-defined" range, where the answer can't be relied on anyway. If the inputs already fit into a 24-bit number, the shifts will not affect the result and therefore simply waste time.
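
To make the argument about the shifts concrete, here is a rough host-side sketch in plain C (not OpenCL, and not AMD's actual generated code). The helper names `sext24`, `mul24_expanded`, `mul_plain` and `count_mismatches` are made up for illustration. It assumes two's-complement integers and an arithmetic right shift on negative values, which is implementation-defined in C but is what mainstream compilers do.

```c
#include <stdint.h>

/* Keep the low 24 bits of x and sign-extend them, mirroring (x << 8) >> 8.
   The left shift is done on an unsigned value to avoid undefined behaviour.
   ASSUMPTION: >> on a negative int32_t is an arithmetic shift. */
static int32_t sext24(int32_t x) {
    return (int32_t)((uint32_t)x << 8) >> 8;
}

/* Hypothetical stand-in for the expansion described above:
   (((a<<8)>>8) * ((b<<8)>>8)), keeping the low 32 bits of the product. */
static int32_t mul24_expanded(int32_t a, int32_t b) {
    int64_t wide = (int64_t)sext24(a) * (int64_t)sext24(b);
    return (int32_t)(uint32_t)wide;
}

/* Ordinary 32-bit multiply, done in unsigned arithmetic so that
   overflow wraps instead of being undefined. */
static int32_t mul_plain(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* Walk a grid of (a, b) pairs in [lo, hi] and count the pairs where the
   shifted version and the plain multiply disagree. */
static long count_mismatches(int32_t lo, int32_t hi, int32_t step) {
    long bad = 0;
    for (int64_t a = lo; a <= hi; a += step) {
        for (int64_t b = lo; b <= hi; b += step) {
            if (mul24_expanded((int32_t)a, (int32_t)b) !=
                mul_plain((int32_t)a, (int32_t)b)) {
                bad++;
            }
        }
    }
    return bad;
}
```

Calling `count_mismatches(-8388608, 8388607, 65521)` samples the signed 24-bit range and should find no mismatches, since the shifts leave in-range values untouched. The two versions only diverge for inputs outside that range, for example `a = 1 << 24`, where `sext24` discards the high bits.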

If your code is memory bound, then your ALU is basically free.