We are developing an integer-heavy kernel that relies heavily on multiplication.
We currently leave headroom for carries and the like by keeping 28-bit values inside the 32-bit registers. Looking ahead to the 58XX series, our question is whether the 24-bit multiplications available in the cores can be used instead. I'm happy to "post-process" the 32-bit multiplication calls into 24-bit equivalents.
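To make the idea concrete, here is a rough host-side sketch in plain C of what such post-processing might look like. It is only an illustration under stated assumptions: `umul24_emul` and `mul28_via_mul24` are hypothetical names, and the masking in `umul24_emul` is an assumed model of a 24-bit multiply, not a statement about what the hardware does with out-of-range operands. Each 28-bit value is split into two 14-bit halves so that every partial product fits a 24-bit multiplier.

```c
#include <stdint.h>

/* Host-side stand-in for an unsigned 24-bit multiply (hypothetical helper).
   Assumption: only the low 24 bits of each operand are used and the low
   32 bits of the product are returned. */
static uint32_t umul24_emul(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)p;
}

/* Full 56-bit product of two 28-bit values built from four 24-bit
   multiplies (hypothetical helper). Inputs are assumed to be < 2^28. */
static uint64_t mul28_via_mul24(uint32_t a, uint32_t b)
{
    /* Split each operand into 14-bit halves: x = x1 * 2^14 + x0. */
    uint32_t a0 = a & 0x3FFFu, a1 = (a >> 14) & 0x3FFFu;
    uint32_t b0 = b & 0x3FFFu, b1 = (b >> 14) & 0x3FFFu;

    uint32_t lo  = umul24_emul(a0, b0);                        /* < 2^28 */
    uint32_t mid = umul24_emul(a1, b0) + umul24_emul(a0, b1);  /* < 2^29 */
    uint32_t hi  = umul24_emul(a1, b1);                        /* < 2^28 */

    /* Recombine: a*b = hi * 2^28 + mid * 2^14 + lo. */
    return ((uint64_t)hi << 28) + ((uint64_t)mid << 14) + (uint64_t)lo;
}
```

The sketch trades one 32-bit multiply for four 24-bit multiplies plus shifts and adds, so whether it pays off depends on the relative throughput of the two multiply types on the target device.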
Is the 5870 assembly produced by Stream KernelAnalyzer the same code that actually runs when a kernel executes?
Any other suggestions?