Assume a local work size of 64. I want to broadcast a uint32 from the thread with local id 0 to all threads within the workgroup. How should I do it?
Instructions such as DS_BPERMUTE_B32 on Vega and others are certainly nice, but I don't see them mentioned in the Hawaii ISA doc. I tried a simple test using work_group_broadcast() with OpenCL 2.0, and it compiled to about 150 lines of assembly full of conditionals and branches, so I suspect there is a better way.
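For concreteness, a test of this kind might look like the sketch below. It is an illustrative reconstruction, not the exact kernel that was compiled. The names `broadcast_test`, `in`, `out`, `shim_lid`, `shim_bcast` and `run_broadcast_test` are hypothetical. The kernel itself is ordinary OpenCL C 2.0; the `#ifndef __OPENCL_VERSION__` parts are only a host-side shim, assuming a single work-group of 64, so that the same source can be sanity-checked as plain C on a CPU.

```c
/* Builds as OpenCL C 2.0 (-cl-std=CL2.0) or, through the shim, as plain C. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define WG_SIZE 64                 /* assumed local work size */
static size_t shim_lid;            /* work-item currently being emulated */
static unsigned int shim_bcast;    /* value captured from the source work-item */

/* One work-group only, so the global id equals the local id. */
static size_t get_global_id(unsigned int dim) { (void)dim; return shim_lid; }

/* Sequential stand-in for the built-in. It is only valid because the
   host loop below runs work-item 0 first and the source id is 0. */
static unsigned int work_group_broadcast(unsigned int v, size_t src)
{
    if (shim_lid == src)
        shim_bcast = v;
    return shim_bcast;
}
#endif

/* Every work-item loads its own value, then receives work-item 0's value. */
__kernel void broadcast_test(__global const unsigned int *in,
                             __global unsigned int *out)
{
    size_t gid = get_global_id(0);
    unsigned int mine = in[gid];
    out[gid] = work_group_broadcast(mine, 0);
}

#ifndef __OPENCL_VERSION__
/* Host-only driver: emulate the 64 work-items one after another. */
static void run_broadcast_test(const unsigned int *in, unsigned int *out)
{
    for (shim_lid = 0; shim_lid < WG_SIZE; ++shim_lid)
        broadcast_test(in, out);
}
#endif
```

On a real device the kernel would be enqueued with a local size of 64, and the shim is never compiled because `__OPENCL_VERSION__` is defined by the OpenCL compiler.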
What would you recommend that I do?
Thank you!
References:
http://developer.amd.com/wordpress/media/2013/12/Vega_Shader_ISA_28July2017.pdf
https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/work_group_broadcast.html