I've been writing a number of compute shaders in DirectX/HLSL assembly (assembled into DirectX bytecode). Many of these shaders perform 32-bit rotates. While studying the corresponding .isa files, I've noticed that I can get 32-bit rotates compiled in one of two ways: as two 32-bit shifts plus an OR/XOR (generated from one ISHL, one USHR, and one OR/XOR), or as one 64-bit shift (generated from one USHR and one BFI). According to RGA/Instruction.cpp at master · GPUOpen-Tools/RGA · GitHub, v_alignbit_b32 appears to be superior to two shifts and an XOR (4 cycles vs. 12). I'm not sure how v_lshlrev_b64 compares, as it is inexplicably missing from that file, but in the tests I've been running it doesn't seem to be as much of an improvement as I'd hoped.
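To make the comparison concrete, here is a minimal C sketch of the rotate formulations discussed above. It is not the shader code itself, and all function names are hypothetical. The `alignbit_b32` helper is an emulation that assumes the documented GCN behaviour of v_alignbit_b32, namely `D = ({S0,S1} >> S2[4:0]) & 0xFFFFFFFF`; treat that as an assumption to check against the ISA manual for your hardware.

```c
#include <stdint.h>

/* Rotate left built from two 32-bit shifts and an OR
   (the ISHL + USHR + OR pattern). */
static uint32_t rotl_two_shifts(uint32_t x, uint32_t n) {
    n &= 31u;
    if (n == 0u) return x;            /* a shift by 32 is undefined in C */
    return (x << n) | (x >> (32u - n));
}

/* Rotate left via a single 64-bit shift: place x in both halves of a
   64-bit value, shift left, and keep the high 32 bits. */
static uint32_t rotl_shift64(uint32_t x, uint32_t n) {
    uint64_t pair = ((uint64_t)x << 32) | x;
    return (uint32_t)((pair << (n & 31u)) >> 32);
}

/* Emulation of v_alignbit_b32 (assumed semantics): concatenate s0 (high)
   and s1 (low), shift right by the low 5 bits of s2, keep the low 32 bits. */
static uint32_t alignbit_b32(uint32_t s0, uint32_t s1, uint32_t s2) {
    uint64_t pair = ((uint64_t)s0 << 32) | s1;
    return (uint32_t)(pair >> (s2 & 31u));
}

/* With both sources equal, alignbit is a rotate right, so a rotate left
   by n is a rotate right by (32 - n) mod 32. */
static uint32_t rotl_alignbit(uint32_t x, uint32_t n) {
    return alignbit_b32(x, x, (32u - n) & 31u);
}
```

All three rotate functions should return the same value for any `x` and any `n` in 0..31; the only difference is which instruction sequence each shape tends to map to.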
With that in mind, is there any way to structure my DirectX/HLSL assembly so that it compiles to v_alignbit_b32 for 32-bit rotates? If not, is that likely to change in future driver updates?
I'm running a Radeon RX 580 with up-to-date drivers.