AMD 'configures' the DP rate when it designs each specific GCN chip; on the HD 7990 the DP/SP ratio is 1/4. Every stream core has a 24-bit multiplier circuit for single-precision float math, so 32-bit integer multiplication is as slow as double-precision math.
You can optimize your program by using 24-bit integer mul/mad instead of 32-bit integer mul/mad wherever 24 bits of precision is enough. Try the mul24() and mad24() built-in functions!
I would add that the performance hit applies only to multiplication, division and similar calculations; all basic integer ops (bitwise ops, comparisons, add, sub, etc.) execute in a single tick. Avoid integer division at all costs: there is no integer divide (idiv) instruction in GCN assembly, so the compiler converts to float, computes the float reciprocal, does a float multiply, then converts back. The ISA file is the best place to check what the compiler actually generated. Ideally the SIMDs are fed with no memory latency (that is, all data is already in registers) and the SIMDs keep the PEs busy (where possible, the code comes in blocks of 16 identical instructions operating on separate registers).