OpenCL is an open standard. It does not support this swizzling concept yet. It does not even expose the wavefront/warp yet.
So you cannot use this feature in OpenCL.
There are others who try to code in IL. They may be able to help you out here.
Thank you, Himanshu.
It's a pity that AMD didn't introduce any extensions for that. It's a waste, for sure.
And now, I don't want to learn AMD IL, which is expected to be deprecated soon. I'm waiting for the new HSA IL. Maybe I could use that for shuffling.
You have answered my question completely.
That's a nice find!
Although I don't know of any IL instruction that explicitly uses DS_Swizzle.
I checked whether there are other instructions and found some new undocumented gems (introduced sometime after cat11.12):
96-bit and 128-bit (contiguous) DS_ instructions with one offset.
v_floor/ceil/trunc for f64
s_cbranch_debug_system, s_cbranch_debug_user : Maybe this is the equivalent of the one-byte "int 3" debug instruction on Windows.
ds_wrap_rtn_b32 : another complex DS operation
v_mad_i64_i32 -> 64-bit = (32-bit * 32-bit) + 64-bit. Now that's great for 64-bit address arithmetic. I guess it takes only 4 cycles and reuses some parts of the f64 unit. With mul_lo, mul_hi, add, addc it would take 10 cycles.
flat_* : Memory I/O operations. I think they only need a flat 64-bit address, but IDK...
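To make the v_mad_i64_i32 point concrete, here is a rough sketch in plain C of what the mul_lo / mul_hi / add / addc sequence has to compute, next to the single 64-bit multiply-add it would replace. This is only an illustration of the arithmetic, not GCN code. The helper names (mul_lo, mul_hi, mad_i64_i32_emulated, mad_i64_i32_reference) are made up for this example, and I'm assuming the instruction does a signed 32x32 multiply, going by the "i32" in its name.

```c
#include <stdint.h>

/* Low 32 bits of the product (identical for signed and unsigned inputs). */
static uint32_t mul_lo(int32_t a, int32_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* High 32 bits of the signed 32x32 product, returned as raw bits. */
static uint32_t mul_hi(int32_t a, int32_t b)
{
    return (uint32_t)((uint64_t)((int64_t)a * (int64_t)b) >> 32);
}

/* Four-step emulation: mul_lo, mul_hi, add, add-with-carry. */
static int64_t mad_i64_i32_emulated(int32_t a, int32_t b, int64_t c)
{
    uint32_t c_lo = (uint32_t)(uint64_t)c;
    uint32_t c_hi = (uint32_t)((uint64_t)c >> 32);

    uint32_t p_lo = mul_lo(a, b);
    uint32_t p_hi = mul_hi(a, b);

    uint32_t r_lo  = p_lo + c_lo;          /* add  */
    uint32_t carry = (r_lo < p_lo);        /* carry out of the low half */
    uint32_t r_hi  = p_hi + c_hi + carry;  /* addc */

    return (int64_t)(((uint64_t)r_hi << 32) | r_lo);
}

/* What the single instruction is assumed to compute: (i64)a * b + c. */
static int64_t mad_i64_i32_reference(int32_t a, int32_t b, int64_t c)
{
    /* The addition is done in unsigned to get well-defined wraparound. */
    return (int64_t)((uint64_t)((int64_t)a * (int64_t)b) + (uint64_t)c);
}
```

For address arithmetic, a would be something like an element index, b the element size, and c the 64-bit base address. Both functions should return the same value for any inputs.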