Aggregated mul + mulhi Throughput on the 6900 Series

A few questions on 6900 GPU stream core capabilities

Good Afternoon,


In the AMD Accelerated Parallel Processing OpenCL Programming Guide,

page 119, section 4.13.1, Table 4.14 lists the integer instruction rates and shows

a total throughput of one mul for every five PEs on Cypress.

Now I'm interested in the same table for Cayman, and most specifically in the aggregated throughput of mulhi + mul per cycle per stream core.


I ask because I have received contradictory information on this. My understanding was that Cayman can schedule two muls per cycle per stream core. Is that correct?


If not, is the aggregated number of mul + mulhi perhaps 2 per cycle for the 6900 series?


This matters a great deal for multiplication code: it is the difference between 32 bits of output per cycle per stream core and 64 bits of output per cycle per stream core.