A number of questions on 6900 GPU stream core capabilities
Good Afternoon,
In the AMD Accelerated Parallel Processing OpenCL Programming Guide,
page 119, section 4.13.1, table 4.14 shows under integer instruction rates
a total throughput of 1 mul per 5 PEs on Cypress.
Now I'm interested in the same table for Cayman, and most specifically in the aggregate throughput of mulhi + mul per cycle on a stream core.
I ask because I received contradictory information on this. It was my understanding that it is possible to schedule 2 muls per cycle per stream core on Cayman. Is that correct?
If not, is the aggregate number of mul + mulhi per cycle perhaps 2 for the 6900 series?
This makes a huge difference for multiplication code, namely 32 bits of output per cycle per stream core versus 64 bits of output per cycle per stream core.
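To illustrate why the mul + mulhi pair matters: a full 32x32-bit product needs both the low half (mul) and the high half (mulhi), so the rate at which the two can be issued together determines how many product bits come out per cycle. Below is a minimal sketch in plain C, not OpenCL kernel code. The helper names mul_lo_u32, mul_hi_u32 and mul_full_u32 are made up for illustration, and the sketch says nothing about how Cayman actually schedules these instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low 32 bits of a 32x32-bit unsigned product,
   i.e. what a plain 32-bit `mul` delivers. */
static uint32_t mul_lo_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Hypothetical helper: high 32 bits of the same product,
   i.e. what a `mulhi` style instruction delivers. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Full 64-bit product assembled from the two halves: one mul plus
   one mulhi yields 64 bits of output, a mul alone only 32 bits. */
static uint64_t mul_full_u32(uint32_t a, uint32_t b)
{
    return ((uint64_t)mul_hi_u32(a, b) << 32) | mul_lo_u32(a, b);
}

/* Small sanity check: (2^32 - 1)^2 = 0xFFFFFFFE00000001. */
static void self_check(void)
{
    assert(mul_lo_u32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0x00000001u);
    assert(mul_hi_u32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu);
    assert(mul_full_u32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFE00000001ull);
}
```

If only one of the two halves can be issued per cycle per stream core, a full product takes two cycles; if both can be issued in the same cycle, it takes one. That is the factor of 2 I am asking about.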
Regards,
Vincent