Hi boys and girls,
I read the GCN ISA manual and found a DS_SWIZZLE instruction, which can do inter-thread data exchange without touching LDS memory.
But the instruction is not exposed in the OpenCL language of the AMD APP SDK. So, how can I use it?
It's a great feature, which is exactly the AMD version of the "warp shuffle" feature of NVIDIA's Kepler cards.
So it's better to use it.
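To make clear what I mean by inter-thread exchange, here is a small conceptual model in plain Python. It is only a sketch: `lane_exchange` and `butterfly_sum` are names I made up, a Python list stands in for the per-lane registers of a wavefront, and nothing here reflects the real DS_SWIZZLE encoding or its pattern restrictions.

```python
def lane_exchange(values, source_lane):
    """Hypothetical model: lane i receives the value held by lane source_lane(i)."""
    return [values[source_lane(i)] for i in range(len(values))]


def butterfly_sum(values):
    """All-lanes sum via XOR-partner exchanges; len(values) must be a power of two."""
    step = 1
    while step < len(values):
        # Each lane reads its partner's register directly, no shared-memory round trip.
        partner = lane_exchange(values, lambda i, s=step: i ^ s)
        values = [v + p for v, p in zip(values, partner)]
        step *= 2
    return values


if __name__ == "__main__":
    print(lane_exchange([10, 20, 30, 40], lambda i: i ^ 1))
    print(butterfly_sum(list(range(8))))
```

With log2(N) exchange steps every lane ends up holding the full sum, which is the kind of reduction that otherwise needs LDS reads and writes.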
OpenCL is an open standard. It still does not support this swizzling concept. It does not even support wavefronts/warps yet.
So you cannot use this feature in OpenCL.
There are others who try to code in IL. They may be able to help you out here.
Thank you, Himanshu.
It's a pity that AMD didn't introduce any extension for that. It's a waste, for sure.
And for now, I don't want to learn AMD IL, which is expected to be deprecated soon. I'm waiting for the new HSA IL. Maybe I could use that for shuffling.
You have completely answered my question.
That's a nice find!
Although I don't know of any IL instruction that explicitly uses DS_SWIZZLE.
I checked whether other instructions are there, and found some new undocumented gems (introduced sometime after Catalyst 11.12):
96bit, 128bit (continuous) DS_ instructions with one offset.
v_floor/ceil/trunc for f64
s_cbranch_debug_system, s_cbranch_debug_user : Maybe this is the equivalent of the one-byte "int 3" debug breakpoint on Windows.
ds_wrap_rtn_b32 : another complex DS operation
v_mad_i64_i32 -> 64-bit = (32-bit * 32-bit) + 64-bit. Now that's great for 64-bit address arithmetic. I guess it takes only 4 cycles and reuses some parts of the f64 unit. With mul_lo, mul_hi, add, addc it would take 10 cycles.
flat_* : Memory IO operations: I think it only needs a flat 64bit address, but IDK...
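To illustrate the v_mad_i64_i32 item above, here is a rough Python sketch of the arithmetic as I understand it: a signed 32-bit by 32-bit multiply plus a signed 64-bit addend, next to the same result built from the mul_lo, mul_hi, add, addc sequence. The function names are mine and the semantics are my assumption, since the instruction is undocumented. It models only the math, not the hardware or the cycle counts.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mad_i64_i32_reference(a, b, c):
    """Assumed semantics: (i32 a * i32 b) + i64 c, wrapped to signed 64 bits."""
    return to_signed(a * b + c, 64)


def mad_i64_i32_four_ops(a, b, c):
    """Same result assembled from 32-bit halves, as the 4-instruction sequence would."""
    product = a * b                          # exact signed product
    lo = product & MASK32                    # mul_lo: low 32 bits
    hi = (product >> 32) & MASK32            # mul_hi (signed): high 32 bits
    sum_lo = lo + (c & MASK32)               # add: low halves, may carry out
    carry = sum_lo >> 32
    sum_hi = (hi + ((c >> 32) & MASK32) + carry) & MASK32  # addc: high halves + carry
    return to_signed((sum_hi << 32) | (sum_lo & MASK32), 64)


if __name__ == "__main__":
    # Example: index * stride + base, the address-arithmetic case.
    print(mad_i64_i32_four_ops(1000, 16, 0x1_0000_0000))
```

For address arithmetic the typical use would be index * stride + base, with a 32-bit index and stride and a 64-bit base pointer.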