Does the ALU of the HD 5870 support native 64-bit calculations? Do the following instructions take the same number of cycles?
long int x, y, z;
z = x + y;
int a, b, c;
c = a + b;
Thank you in advance!
Although add/subtract run at the full rate, multiply isn't so lucky.
The programming guide, section 4.13.1, has a bit about data-type performance. Note that these aren't even really '32-bit' integer devices either: a 32-bit multiply has 1/5 the rate of a 24-bit or float multiply.
Presumably all long ops are implemented using 32-bit ones.
Does the ALU of HD 5870 support native 64 bit calculations?
It is not natively 64-bit, but it can do 64-bit calculations.
Do the following instructions take the same number of cycles?
AFAIK, double-precision operations take 5 times as many cycles as single-precision operations.
I believe a 64-bit integer add/subtract is emulated using 32-bit carry-add/borrow-subtract, so those long operations should take twice as many cycles as those int operations. In case you're wondering, 64-bit integer multiplication should take around four times as many cycles as 32-bit integer multiplication, and division would be even more expensive, all assuming emulation using basic 32-bit operations.
Thank you, settle! Very helpful answer!
I have to use a simple 64-bit calculation. I think an on-chip emulation is more efficient than a home-made emulation in C.
Thank you, smatovic, for your helpful answer!
Just because an instruction takes longer, or runs at a different rate, doesn't mean it's not a native instruction; even instructions on an x86 CPU have variable execution times. 32-bit integer multiply is a native instruction on the GPU. The difference, compared to 24-bit multiply, is that not all instruction slots are able to execute it, hence instruction throughput is reduced.
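As a worked illustration of the slot argument, assume the VLIW5 organisation of the HD 5870, where each processing element can issue up to five operations per cycle, and assume (as an illustration consistent with the 1/5 figure quoted in the thread) that only one of those five slots can execute a 32-bit integer multiply while all five can execute a 24-bit or float multiply. The relative throughput then follows directly:

```latex
\frac{\text{32-bit integer multiplies per cycle}}{\text{24-bit or float multiplies per cycle}}
  = \frac{1\ \text{slot}}{5\ \text{slots}} = \frac{1}{5}
```

The latency of each individual multiply does not enter this ratio; only how many slots can issue it per cycle does.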
Thank you, Jeff. I agree. The boundary between hardware and software has been obscured. In my mind, however, a real 64-bit processor should complete a 64-bit addition in one step.
Thank you, notzed! I didn't expect 24-bit multiplication to be 5 times faster than 32-bit! I even wonder why OpenCL supports the strange 24-bit format.
hmm, so e-mail replying doesn't seem to work at all. blah.
What I'd said was that the 24-bit multiply uses the single-precision floating-point mantissa multipliers, which are only 24 bits wide. So it makes sense for cards that are trying to pack in as much float performance as possible. It's still a useful size for many integer algorithms, particularly address calculations.
Thank you, notzed. You've just brought 24-bit numbers to my attention.