According to the Evergreen ISA document and various AMD presentations, the Evergreen family of GPUs can issue dependent MULs in the same ALU instruction group, as in the following example taken from the ISA document:
    w      z      y          x
    mul    mul    mul_prev   mul_prev
However, whenever I write kernels that contain dependent MULs, the generated ISA always has only one MUL per instruction group (20% utilization of the five ALU slots). I even tried doing the same thing directly in CAL/IL and got the same result. This looks like a compiler issue, and fixing it could bring a significant performance improvement. Are there any plans to address it soon?
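For concreteness, the kind of dependent-MUL chain in question can be sketched as a plain C function standing in for a kernel body. The names `dependent_mul_chain` and `slot_utilization` are hypothetical, and this only illustrates the data dependency, not the actual test kernel: each product feeds the next, so in a kernel with no other independent work, a compiler that does not use the chained `mul_prev` form has to put each MUL in its own instruction group, keeping 1 of the 5 slots (x, y, z, w, t) busy.

```c
/* Hypothetical stand-in for the body of a test kernel: four MULs,
 * each consuming the result of the one before it. */
float dependent_mul_chain(float a, float b, float c, float d, float e)
{
    float p = a * b;  /* MUL 1 */
    p = p * c;        /* MUL 2 depends on MUL 1 */
    p = p * d;        /* MUL 3 depends on MUL 2 */
    p = p * e;        /* MUL 4 depends on MUL 3 */
    return p;
}

/* Fraction of the five VLIW slots (x, y, z, w, t) that are busy when
 * each instruction group carries `ops_per_group` operations. */
double slot_utilization(int ops_per_group)
{
    return ops_per_group / 5.0;
}
```

With one MUL per group, `slot_utilization(1)` gives 0.2, the 20% figure above; packing the chain as in the ISA example would raise the number of MULs per group.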
Also (and this repeats a question from previous posts that was never answered): there are no integer mul24 or muladd24 instructions. Even IL no longer has them; it used to, but they are gone in ATI Stream SDK 2.0. These instructions are very helpful. Do you have any plans to expose them in IL?
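To make the request concrete, here is a minimal C sketch of the unsigned semantics such instructions would presumably have, modelled on the OpenCL C built-ins `mul24()` and `mad24()`: only 24-bit operands are valid (OpenCL leaves out-of-range inputs implementation-defined), and the result is the low 32 bits. The names `umul24_ref` and `umad24_ref` are hypothetical, and this says nothing about how the hardware or IL actually encodes these operations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of an unsigned 24-bit multiply, following
 * the OpenCL C mul24() definition: both operands must fit in 24 bits and
 * the result is the low 32 bits of the product. */
uint32_t umul24_ref(uint32_t x, uint32_t y)
{
    assert(x <= 0xFFFFFFu && y <= 0xFFFFFFu);  /* 24-bit precondition */
    return x * y;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* Multiply-add built on the 24-bit multiply, as in OpenCL's mad24():
 * low 32 bits of x * y + z. */
uint32_t umad24_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return umul24_ref(x, y) + z;
}
```

For example, a 2D index `y * width + x` with `y` and `width` both below 2^24 could be written as `umad24_ref(y, width, x)`.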