Archives Discussions

viscocoa · ‎03-17-2012

Does the ALU of the HD 5870 support native 64-bit calculations? Do the following instructions take the same number of cycles?

long int x, y, z;

z = x + y;

int a, b, c;

c = a + b;

Thank you in advance!

notzed · ‎03-18-2012

Although add/subtract run at the full rate, multiply isn't so lucky.

The programming guide, section 4.13.1 has a bit about data type performance - note that these things aren't even really '32 bit' integer devices either, and a 32 bit multiply has 1/5 the rate of a 24 bit or float multiply.

Presumably all long ops are implemented using 32 bit ones.

smatovic · ‎03-18-2012

Does the ALU of HD 5870 support native 64 bit calculations?

It is not native 64 bit, but can do 64 bit calculations.

Do the following instructions take the same number of cycles?

AFAIK Double Precision Operations need 5 times more cycles than Single Precision Operations.

--

Srdja

settle · ‎03-18-2012

I believe a 64-bit integer add/subtract is emulated using 32-bit carry-add/borrow-subtract, so those long operations should take twice as many cycles as those int operations. In case you're wondering, 64-bit integer multiplication should take around four times as many cycles as 32-bit integer multiplication, and division would be even more expensive, all assuming emulation using basic 32-bit operations.
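The emulation described above can be sketched in plain C. This is only an illustration of the carry-add idea and of the "about four 32-bit multiplies" estimate, not what the HD 5870 compiler actually emits. The names `u64_pair`, `add64`, `mul64`, `mul_hi32`, `join` and `split` are hypothetical, and `mul_hi32` borrows the host's 64-bit type as a stand-in for a hardware high-half multiply (OpenCL exposes one as `mul_hi`).

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves (hypothetical type for illustration). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* 64-bit add built from two 32-bit adds plus a carry. */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;                  /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);      /* 1 if the low add overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Stand-in for a hardware "high half of a 32x32 multiply".
   Plain C has no such operator, so this helper uses the host's 64-bit type. */
uint32_t mul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Low 64 bits of a 64x64 product, using four 32-bit multiply operations. */
u64_pair mul64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo * b.lo;                  /* multiply 1: low half of lo*lo  */
    r.hi = mul_hi32(a.lo, b.lo)          /* multiply 2: high half of lo*lo */
         + a.lo * b.hi                   /* multiply 3: cross term         */
         + a.hi * b.lo;                  /* multiply 4: cross term         */
    return r;                            /* a.hi*b.hi only affects bits >= 64 */
}

/* Helpers to compare against the compiler's native 64-bit arithmetic. */
uint64_t join(u64_pair v)  { return ((uint64_t)v.hi << 32) | v.lo; }
u64_pair split(uint64_t v) { u64_pair r = { (uint32_t)v, (uint32_t)(v >> 32) }; return r; }

void check_against_native(void)
{
    uint64_t x = 0x00000001FFFFFFFFull, y = 0x0000000300000001ull;
    assert(join(add64(split(x), split(y))) == x + y);
    assert(join(mul64(split(x), split(y))) == x * y);
}
```

Calling `check_against_native()` compares both routines with the host's own `uint64_t` results. The add costs two 32-bit adds and a compare, and the multiply costs four 32-bit multiplies plus a few adds, which is where the rough 2x and 4x figures come from.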

viscocoa · ‎03-22-2012

Thank you settle! Very helpful answer!

I need a simple 64-bit calculation. I think the built-in emulation will be more efficient than a home-made emulation written in C.

viscocoa · ‎03-22-2012

Thank you smatovic for your helpful answer!

jeff_golds · ‎03-19-2012

Just because an instruction takes longer, or runs at a different rate, doesn't mean it's not a native instruction; even x86 CPU instructions have variable execution times. 32-bit integer multiply is a native instruction on the GPU. The difference, compared to 24-bit multiply, is that not all instruction slots are able to execute it, so instruction throughput is reduced.

viscocoa · ‎03-22-2012

Thank you Jeff. I agree. The boundary between hardware and software has become blurred. In my mind, however, a real 64-bit processor should complete a 64-bit addition in one step.

viscocoa · ‎03-22-2012

Thank you notzed! I did not expect 24-bit multiplication to be 5 times faster than 32-bit! I even wonder why OpenCL supports the strange 24-bit format.

notzed · ‎03-22-2012

hmm, so e-mail replying doesn't seem to work at all. blah.

I had just said that the 24-bit multiply uses the single-precision floating-point mantissa multipliers, which are only 24 bits wide. So it makes sense when the cards are trying to pack in as much float performance as possible. It's still a useful size for many integer algorithms, particularly address calculations.
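To make the 24-bit case concrete, here is a minimal C sketch of what a 24-bit multiply computes and how it might be used for an address calculation. OpenCL provides a built-in `mul24`; the functions `mul24_u` and `index2d` below are hypothetical host-side stand-ins. Masking the inputs to 24 bits is an assumption made for this sketch, since OpenCL leaves out-of-range operands implementation-defined.

```c
#include <stdint.h>

/* Unsigned 24-bit multiply: only the low 24 bits of each operand take part,
   and the low 32 bits of the (up to 48-bit) product are returned.
   The masking is an assumption of this sketch, not guaranteed GPU behavior. */
uint32_t mul24_u(uint32_t x, uint32_t y)
{
    uint32_t x24 = x & 0x00FFFFFFu;
    uint32_t y24 = y & 0x00FFFFFFu;
    return x24 * y24;                    /* wraps modulo 2^32 */
}

/* Typical use: a row-major 2D index, where row and width fit in 24 bits. */
uint32_t index2d(uint32_t row, uint32_t width, uint32_t col)
{
    return mul24_u(row, width) + col;
}
```

For example, `index2d(3, 1920, 5)` gives 5765. As long as both factors stay below 2^24 and the product fits in 32 bits, the result matches an ordinary 32-bit multiply, which is why the cheaper form suits index arithmetic.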

viscocoa · ‎03-23-2012

Thank you notzed. You just brought 24-bit numbers to my attention.

Is HD 5870 genuine 64 bit?