catsunny
Journeyman III

How to do wave shuffle?

Hi boys and girls,

I was reading the GCN ISA manual and found the DS_SWIZZLE instruction, which can exchange data between threads without touching LDS memory.

But the instruction is not exposed in the OpenCL language of the AMD APP SDK. So how can I use it?

It's a great feature: effectively AMD's counterpart to the "warp shuffle" feature of NVIDIA's Kepler cards, so it would be good to be able to use it.
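
To make clear what kind of exchange I mean, here is a minimal host-side sketch in plain C that only emulates the idea. It assumes a 64-lane wavefront and a generic "read from lane XOR mask" butterfly pattern; the helper names `wave_shuffle_xor` and `wave_reduce_sum` are made up for illustration and do not model the actual DS_SWIZZLE encoding.

```c
#include <assert.h>

#define WAVE_SIZE 64u  /* assumed GCN wavefront width */

/* Hypothetical helper: every lane reads the value held by lane (lane ^ mask).
   This models a generic butterfly exchange, not the real DS_SWIZZLE encoding. */
static void wave_shuffle_xor(const int *in, int *out, unsigned mask)
{
    assert(mask < WAVE_SIZE);  /* keeps (lane ^ mask) inside the wavefront */
    for (unsigned lane = 0; lane < WAVE_SIZE; ++lane)
        out[lane] = in[lane ^ mask];
}

/* Sum reduction in log2(WAVE_SIZE) butterfly steps.
   Afterwards every lane holds the total; lane 0's copy is returned. */
static int wave_reduce_sum(int *vals)
{
    int tmp[WAVE_SIZE];
    for (unsigned mask = WAVE_SIZE / 2; mask > 0; mask >>= 1) {
        wave_shuffle_xor(vals, tmp, mask);
        for (unsigned lane = 0; lane < WAVE_SIZE; ++lane)
            vals[lane] += tmp[lane];
    }
    return vals[0];
}
```

On real hardware the inner exchange would be a single cross-lane instruction instead of a loop over an array, which is why avoiding the LDS round trip is attractive.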

Thank you.

3 Replies
himanshu_gautam
Grandmaster

OpenCL is an open standard. It does not yet support this swizzling concept; it does not even expose wavefronts/warps.

So you cannot use this feature in OpenCL.

There are others who code in IL. They may be able to help you out here.

0 Likes

Thank you, Himanshu.

It's a pity that AMD didn't introduce any extension for that. It's a waste, for sure.

For now, I don't want to learn AMD IL, which is expected to be deprecated soon. I'm waiting for the new HSA IL; maybe I can use that for shuffling.

My question has now been completely answered by you.

realhet
Miniboss

That's a nice find!

Although I don't know of any IL instruction that explicitly uses DS_Swizzle.

I checked whether other instructions are there too, and found some new undocumented gems (introduced sometime after cat11.12):

96bit, 128bit (continuous) DS_ instructions with one offset.

v_exp_legacy/log_legacy

v_floor/ceil/trunc for f64

s_cbranch_debug_system, s_cbranch_debug_user: maybe these are the equivalent of the one-byte "int 3" debug instruction used on Windows.

ds_wrap_rtn_b32: another complex DS operation

v_mad_i64_i32 -> 64-bit = (32-bit * 32-bit) + 64-bit. Now that's great for 64-bit address arithmetic. I guess it takes only 4 cycles and reuses some parts of the f64 unit. With mul_lo, mul_hi, add, addc it would take 10 cycles.
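
To spell out the arithmetic, here is a rough C sketch of what I understand the operation to compute, next to the four-step mul_lo / mul_hi / add / addc sequence. The function names are hypothetical, signed operands are my assumption from the "i32" in the mnemonic, and nothing here says anything about real cycle counts.

```c
#include <stdint.h>

/* One-step form: full 64-bit product of two signed 32-bit values plus a
   64-bit addend. The add is done in unsigned so it wraps instead of overflowing. */
static int64_t mad_i64_i32(int32_t a, int32_t b, int64_t c)
{
    uint64_t prod = (uint64_t)((int64_t)a * (int64_t)b);
    return (int64_t)(prod + (uint64_t)c);
}

/* Four-step form built from 32-bit pieces, mirroring mul_lo, mul_hi, add, addc. */
static int64_t mad_i64_i32_split(int32_t a, int32_t b, int64_t c)
{
    uint64_t full = (uint64_t)((int64_t)a * (int64_t)b);
    uint32_t lo = (uint32_t)full;          /* mul_lo: low 32 bits of the product */
    uint32_t hi = (uint32_t)(full >> 32);  /* mul_hi: high 32 bits (signed product) */

    uint32_t c_lo = (uint32_t)(uint64_t)c;
    uint32_t c_hi = (uint32_t)((uint64_t)c >> 32);

    uint32_t sum_lo = lo + c_lo;               /* add: low halves */
    uint32_t carry  = (sum_lo < lo) ? 1u : 0u; /* carry out of the low add */
    uint32_t sum_hi = hi + c_hi + carry;       /* addc: high halves plus carry */

    /* Reassemble; assumes the usual two's complement conversion to int64_t. */
    return (int64_t)(((uint64_t)sum_hi << 32) | sum_lo);
}
```

For address arithmetic this would be used as something like base + index * stride, with a 64-bit base and 32-bit index and stride, e.g. `mad_i64_i32(index, stride, base)`.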

flat_*: memory I/O operations. I think they only need a flat 64-bit address, but I don't know for sure.
