The GPUs as well as AMD IL have an instruction for mul24_hi (unsigned shown as an example, but a signed variant exists too):
IL: UMUL24_high ( IL_OP_U_MUL24_HIGH )
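To make clear which operation I mean, here is a minimal sketch in plain C of the semantics I am assuming: both operands are truncated to their low 24 bits, multiplied into a 48-bit product, and the bits above the low 32 are returned. The names `umul24_hi`, `umul24_lo` and `umul24_examples` are made up for illustration, and the exact bit range returned by the hardware instruction should be checked against the IL/ISA documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference for the unsigned "mul24 high" operation.
   Assumption: the result is bits 47..32 of the 48-bit product of the
   low 24 bits of each operand. */
static uint32_t umul24_hi(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)(product >> 32);
}

/* Low 32 bits of the same 48-bit product, for comparison. */
static uint32_t umul24_lo(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)product;
}

/* A few hand-checked values. */
static void umul24_examples(void)
{
    /* (2^24 - 1)^2 = 0xFFFFFE000001 */
    assert(umul24_hi(0xFFFFFFu, 0xFFFFFFu) == 0xFFFFu);
    assert(umul24_lo(0xFFFFFFu, 0xFFFFFFu) == 0xFE000001u);
    /* 2^16 * 2^16 = 2^32, so only bit 32 is set */
    assert(umul24_hi(0x10000u, 0x10000u) == 1u);
}
```

In OpenCL C the closest portable spelling I know of is `mul_hi(a & 0xFFFFFF, b & 0xFFFFFF)` on `uint` operands, which gives the same value under the assumption above. Whether the compiler actually turns that into the 24-bit instruction rather than a full 32-bit high multiply is exactly what I cannot tell.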
However, OpenCL still has no equivalent. If it is so difficult to get this into the OpenCL standard, could you add an AMD-specific extension that exposes this important performance feature?
I found this old thread about it, but the issue still seems to be unresolved.
Is there some feasible way to tweak the IL or ISA code so that it uses this instruction? Has anyone done that?