Does the ALU of the HD 5870 support native 64-bit calculations? Do the following instructions take the same number of cycles?

long int x, y, z;

z = x + y;

int a, b, c;

c = a + b;

Thank you in advance!

Does the ALU of HD 5870 support native 64 bit calculations?

The ALU is not natively 64-bit, but it can perform 64-bit calculations.

Do the following instructions take the same number of cycles?

AFAIK, double-precision operations need 5 times as many cycles as single-precision operations.

--

Srdja

I believe a 64-bit integer add/subtract is emulated using 32-bit carry-add/borrow-subtract, so those long operations should take twice as many cycles as those int operations. In case you're wondering, 64-bit integer multiplication should take around four times as many cycles as 32-bit integer multiplication, and division would be even more expensive, all assuming emulation using basic 32-bit operations.
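To make the cycle estimate above concrete, here is a minimal sketch in plain C of how a 64-bit add and multiply could be built from 32-bit operations. It is an illustration only, not what the compiler actually emits for the HD 5870. The names `u64_pair`, `add64`, `mul_hi32` and `mul64` are made up for this example, and `mul_hi32` uses a 64-bit intermediate purely as a stand-in for a hardware "high half of 32x32" multiply.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as a 32-bit ALU would see it.
   (Hypothetical type, for illustration only.) */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* 64-bit add: one plain 32-bit add for the low half,
   then one add-with-carry for the high half. */
static u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;              /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);  /* 1 if the low add overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Stand-in for a "multiply, return high 32 bits" instruction.
   Assumption: written with a 64-bit intermediate only for brevity. */
static uint32_t mul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Low 64 bits of a 64x64 multiply: four 32-bit multiplies
   (lo*lo low half, lo*lo high half, and the two cross terms). */
static u64_pair mul64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo * b.lo;
    r.hi = mul_hi32(a.lo, b.lo) + a.lo * b.hi + a.hi * b.lo;
    return r;
}
```

The add costs two dependent 32-bit operations and the multiply costs four 32-bit multiplies plus a couple of adds, which matches the rough 2x and 4x estimates, assuming the hardware really does emulate them this way.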

Although add/subtract run at the full rate, multiply isn't so lucky.

Section 4.13.1 of the programming guide has a bit about data type performance. Note that these things aren't really '32-bit' integer devices either: a 32-bit multiply has 1/5 the rate of a 24-bit or float multiply.

Presumably all long ops are implemented using 32 bit ones.

Just because an instruction takes longer, or runs at a different rate, doesn't mean it's not a native instruction; even instructions on an x86 CPU have variable execution times. 32-bit integer multiply is a native instruction on the GPU. The difference from 24-bit multiply is that not all instruction slots can execute it, so its throughput is lower.

Thank you, notzed! I did not expect 24-bit multiplication to be 5 times faster than 32-bit! I even wonder why OpenCL supports the strange 24-bit format.

hmm, so e-mail replying doesn't seem to work at all. blah.

I just stated that the 24-bit multiply uses the single-precision floating-point mantissa multipliers, which are only 24 bits wide. So it makes sense when the cards are trying to pack in as much float performance as possible. It's still a useful size for many integer algorithms, particularly address calculations.
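As a rough illustration of the address-calculation case: OpenCL C exposes the 24-bit multiply as the built-in `mul24` (with `mad24` for multiply-add). The plain C sketch below only models it for unsigned operands that fit in 24 bits; `mul24_model` and `index2d` are hypothetical names. Masking out-of-range inputs is an assumption of this model, since OpenCL leaves the result implementation-defined in that case.

```c
#include <stdint.h>

/* Models an unsigned 24-bit multiply: only the low 24 bits of each
   operand take part, and the low 32 bits of the product are returned.
   Assumption: the masking is this model's choice for out-of-range inputs. */
static uint32_t mul24_model(uint32_t x, uint32_t y)
{
    return (x & 0xFFFFFFu) * (y & 0xFFFFFFu);
}

/* Row-major 2D index, the kind of address calculation where rows and
   widths comfortably fit in 24 bits. (Hypothetical helper.) */
static uint32_t index2d(uint32_t row, uint32_t width, uint32_t col)
{
    return mul24_model(row, width) + col;
}
```

In an actual kernel the same index would be written as `mad24(row, width, col)`, provided the values are known to stay within the 24-bit range.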