Hi,
the GPUs as well as AMD IL have an instruction for mul24_hi (the unsigned variants are listed here as an example, but signed ones exist too):
Evergreen/Cayman: MULHI_UINT24
GCN: V_MUL_HI_U32_U24
IL: UMUL24_high ( IL_OP_U_MUL24_HIGH )
However, OpenCL still has no equivalent: mul24 and mad24 only return the low 32 bits of the product. If it is so difficult to get this into the OpenCL standard, could you add an AMD-specific extension that exposes this important performance feature?
I found this old thread about it, but the issue still seems to be unresolved:
http://devgurus.amd.com/thread/149862
Is there some feasible way to patch the generated IL or ISA code so that it uses this instruction? Has anyone done that?
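
For reference, this is the behaviour I'm after, written as a plain C sketch. It assumes the instruction uses only the low 24 bits of each operand and returns bits 32..47 of the 48-bit product, which is how I read the GCN description; I have not verified that on hardware. The names umul24_hi_ref and umul24_lo_ref are made up for this post.

```c
#include <stdint.h>

/* Reference model of an unsigned mul24_hi (assumed semantics, see above):
   mask both operands to 24 bits, form the 48-bit product,
   and return everything above bit 31. */
static uint32_t umul24_hi_ref(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)(p >> 32);
}

/* Low half of the same product, for comparison. */
static uint32_t umul24_lo_ref(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)p;
}
```

In OpenCL C, mul_hi(a & 0xFFFFFF, b & 0xFFFFFF) should produce the same value, but I don't know whether the compiler ever maps that to the 24-bit instruction instead of the full 32-bit multiply.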