How does the GCN architecture handle integer operations? Do the vector ALUs have an integer path as well as float, or is it all done on that single scalar thing that each compute unit apparently has? The reason I'm asking is I saw worse performance for an integer workload on a 7970 compared to a GeForce GTX 580 - the 7970 takes over twice as long to run the kernel. IIRC the Fermi architecture has integer & float paths in its main ALUs.

Hi,

On GCN, most integer operations (and, or, shl, add, addc, cmp, ...) run as fast as single-precision float operations.

32-bit integer multiply runs at the double-precision rate, which is 1/4 of the single-precision rate on the higher-end models (like the 79xx).

There's a special 24-bit MAD instruction which runs at the full SP rate.
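To get a feel for how much the multiply rate can matter, here is a rough cost model in Python. It only encodes the relative rates mentioned above (SP rate = 1, 32-bit multiply = 1/4 of that on the 79xx); the instruction mix and the names `RELATIVE_COST` and `estimated_cost` are made up for illustration, and it ignores memory, latency and scheduling entirely. It also assumes the operands fit in 24 bits, otherwise the multiply-add cannot be replaced by the 24-bit MAD.

```python
# Relative issue cost per operation, in units of one single-precision op.
# Rates follow the post: most integer ops run at SP rate, 32-bit multiply
# runs at 1/4 of SP rate (79xx), and the 24-bit MAD runs at SP rate.
RELATIVE_COST = {
    "add": 1,
    "and": 1,
    "or": 1,
    "shl": 1,
    "cmp": 1,
    "mul32": 4,  # quarter rate, so four times the cost
    "mad24": 1,  # multiply and add in one full-rate instruction
}


def estimated_cost(op_counts):
    """Sum the relative cost of a (hypothetical) per-thread instruction mix."""
    return sum(RELATIVE_COST[op] * count for op, count in op_counts.items())


# Hypothetical inner loop: 8 multiply-adds plus 4 shifts.
with_mul32 = estimated_cost({"mul32": 8, "add": 8, "shl": 4})
# Same loop if every multiply-add can be expressed as a 24-bit MAD.
with_mad24 = estimated_cost({"mad24": 8, "shl": 4})

print("mul32 + add:", with_mul32)
print("mad24      :", with_mad24)
```

In this made-up mix the 32-bit multiplies dominate the ALU cost, so it is worth checking whether the kernel really needs full 32-bit products.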

IMO the performance difference you mentioned is not due to a lack of integer performance, but to GCN having some extra requirements compared to previous architectures:

In general, it needs about 4x more threads, and the optimal register limit drops from 128 down to 84 or, even better, 64 regs in order to get close to nominal performance.
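The 84 and 64 figures make more sense with a quick occupancy calculation. The sketch below assumes the commonly cited GCN limits of 256 vector registers per SIMD lane and at most 10 wavefronts per SIMD. Those two constants are assumptions that are not stated in this thread, and real allocation granularity, LDS usage and scalar registers can lower the result further.

```python
# Assumed GCN limits (not taken from this thread):
VGPRS_PER_SIMD_LANE = 256  # vector registers available to each lane of a SIMD
MAX_WAVES_PER_SIMD = 10    # hardware cap on resident wavefronts per SIMD


def waves_per_simd(vgprs_per_thread):
    """Wavefronts that fit on one SIMD, limited by vector registers only."""
    if vgprs_per_thread < 1:
        raise ValueError("a kernel uses at least one vector register")
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // vgprs_per_thread)


for regs in (128, 84, 64):
    print(regs, "regs ->", waves_per_simd(regs), "waves per SIMD")
```

Under these assumptions 84 is the largest register count that still fits three wavefronts on a SIMD, and 64 is the largest that fits four, which gives the hardware more wavefronts to switch between while others wait.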

What was the test you tried, btw?