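The cross-lane instructions are not directly exposed in OpenCL. Instead, the shuffle and shuffle2 built-in functions can be used to reorder the elements of a vector. Note that these functions permute elements within a single work-item's vector; they do not exchange data between work-items.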
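To make the semantics of shuffle and shuffle2 concrete, here is a small host-side model in Python. It is not OpenCL code. It mimics what the built-ins return for one work-item, under the assumption that vectors have power-of-two lengths (2, 4, 8, 16). In that case, as I read the OpenCL C specification, shuffle looks only at the low log2(n) bits of each mask element and shuffle2 looks at log2(n) + 1 bits. Vectors are modelled as plain Python lists.

```python
# Host-side model (not OpenCL code) of the OpenCL C built-ins
# shuffle(x, mask) and shuffle2(x, y, mask) for a single work-item.
# Assumption: vector lengths are powers of two, so "only the low bits of
# each mask element are used" becomes a simple bitwise AND.


def shuffle(x, mask):
    """result[i] = x[mask[i]], keeping only the low bits needed to index x."""
    n = len(x)
    return [x[m & (n - 1)] for m in mask]


def shuffle2(x, y, mask):
    """Mask values 0..n-1 select from x; n..2n-1 select from y."""
    n = len(x)
    assert len(y) == n, "x and y must have the same length"
    combined = x + y
    return [combined[m & (2 * n - 1)] for m in mask]


if __name__ == "__main__":
    a = [10, 20, 30, 40]
    b = [50, 60, 70, 80]
    print(shuffle(a, [3, 2, 1, 0]))      # reverse: [40, 30, 20, 10]
    print(shuffle2(a, b, [0, 4, 1, 5]))  # interleave: [10, 50, 20, 60]
```

Everything happens inside one work-item's registers. That is why these functions cannot stand in for instructions that move data between lanes of a wavefront.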
As far as I know, AMD’s HCC compiler provides intrinsic functions for ds_permute, ds_bpermute, and ds_swizzle, which can be called from HC or HIP kernels. This article describes them in detail: https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
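For readers who haven't met these instructions yet, the following Python sketch models the lane-exchange semantics of ds_bpermute_b32 ("pull") and ds_permute_b32 ("push") on a 64-lane GCN wavefront. It is based on my reading of the ISA pseudocode and the article above. The simplifying assumptions are that all lanes are active (EXEC is all ones) and that the instruction offset is 0. The functions are plain Python stand-ins. They are not the HCC intrinsics, whose exact names and signatures I am not reproducing here.

```python
# Host-side model of ds_bpermute_b32 / ds_permute_b32 lane exchange on a
# 64-lane GCN wavefront. Assumptions: every lane is active (EXEC all ones)
# and the instruction's offset field is 0.

WAVE_SIZE = 64


def lane_from_addr(addr):
    """Addresses are byte addresses: lane = addr / 4, wrapped to the wavefront."""
    return (addr // 4) % WAVE_SIZE


def ds_bpermute(addr, src):
    """Backward permute ("pull"): lane i reads src from lane addr[i] / 4."""
    return [src[lane_from_addr(a)] for a in addr]


def ds_permute(addr, src):
    """Forward permute ("push"): lane i writes src[i] to lane addr[i] / 4.

    Lanes that receive nothing get 0. If several lanes target the same
    destination, the highest-numbered source lane wins, because later
    writes in this loop overwrite earlier ones.
    """
    out = [0] * WAVE_SIZE
    for i in range(WAVE_SIZE):
        out[lane_from_addr(addr[i])] = src[i]
    return out


if __name__ == "__main__":
    src = list(range(100, 100 + WAVE_SIZE))
    # Address of the right-hand neighbour lane, in bytes.
    addr = [((i + 1) % WAVE_SIZE) * 4 for i in range(WAVE_SIZE)]
    print(ds_bpermute(addr, src)[:4])  # [101, 102, 103, 104]
    print(ds_permute(addr, src)[:4])   # [163, 100, 101, 102]
```

With the same address vector, the two instructions rotate the data in opposite directions. A pull can never conflict, whereas a push has to resolve collisions, which is why the conflict rule only appears in ds_permute.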
P.S. You've been whitelisted now.
Thank you for the quick answer and for whitelisting me. To be honest, I had already searched Google from top to bottom and found more or less everything that could be found. I was hoping that someone had some insider information.
I'm a Windows user and at this point HCC is not an option for me. I've been thinking about trying Linux, but even there the documentation is incomplete, links on the AMD site are broken, and everything looks amateurish. I'm a bit disappointed that a company as big and respected as AMD offers so little documentation and software support for its otherwise great GPUs (especially on Windows).
I'm a Windows user and at this point HCC is not an option for me. I've been thinking about trying with Linux
Right, HCC is not an option if you're using Windows: the ROCm/HCC stack is currently supported only on Linux. The ROCm ecosystem does, however, offer richer programming support for AMD GPUs.
Now, regarding cross-lane instruction support in OpenCL: these instructions are architecture-specific and have been available since GCN3. If you're using OpenCL for portability, it's better to use the corresponding OpenCL functionality and rely on the compiler to map it to these instructions where appropriate. Otherwise, platform-specific options, such as inline assembly in the kernel, can be used if the underlying compiler and/or platform supports them.
AFAIK, the ROCm OpenCL compiler supports inline assembly; I'm not sure whether that also holds on Windows. I'll check with the compiler team and let you know.
even that documentation is incomplete, links (on AMD site) are broken, everything looks amateurish.
I found that a couple of links to the GCN3_Instruction_Set_Architecture document are indeed broken (they seem to point to a version of the document that doesn't exist). Here is the link to the ISA document: http://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf
It would be really helpful if you could share your suggestions/feedback on the page itself.
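I've since learned that the OpenCL subgroup functions from the cl_khr_subgroups extension, such as sub_group_broadcast, sub_group_reduce_add, and sub_group_scan_inclusive_add, can be used for this purpose. The compiler generates the appropriate cross-lane instructions for these subgroup functions whenever applicable.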
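The following host-side Python model illustrates what the subgroup functions compute per work-item, and one common way a reduction can be built from lane exchanges: an XOR "butterfly" in which each round pulls a value from lane (i XOR offset) and adds it. This is a sketch of the technique only, assuming a power-of-two subgroup size (64 here, matching a GCN wavefront) with all lanes active. It is not a claim about the exact instruction sequence the AMD compiler emits.

```python
# Host-side model of what sub_group_reduce_add and sub_group_scan_inclusive_add
# return in each lane of one subgroup, plus an XOR-butterfly reduction built
# from lane exchanges. Assumptions: power-of-two subgroup size (64, as on a
# GCN wavefront) and all lanes active. Illustrative only, not compiler output.

SUBGROUP_SIZE = 64


def sub_group_reduce_add(values):
    """Every lane receives the sum over all lanes."""
    total = sum(values)
    return [total] * len(values)


def sub_group_scan_inclusive_add(values):
    """Lane i receives values[0] + ... + values[i]."""
    out, running = [], 0
    for v in values:
        running += v
        out.append(running)
    return out


def butterfly_reduce_add(values):
    """All-lanes reduction in log2(n) rounds of 'pull from lane ^ offset, add'.

    Each round corresponds to one cross-lane permute (for example a backward
    permute with byte address (lane ^ offset) * 4) followed by a per-lane add.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "subgroup size must be a power of two"
    vals = list(values)
    offset = n // 2
    while offset:
        partner = [vals[lane ^ offset] for lane in range(n)]
        vals = [a + b for a, b in zip(vals, partner)]
        offset //= 2
    return vals


if __name__ == "__main__":
    lanes = list(range(SUBGROUP_SIZE))                 # lane i holds value i
    print(butterfly_reduce_add(lanes)[0])              # 2016 (same in every lane)
    print(sub_group_scan_inclusive_add(lanes)[:4])     # [0, 1, 3, 6]
```

After the last round every lane holds the full sum. That matches sub_group_reduce_add, which returns the same result to all work-items in the subgroup, and it needs only 6 exchange-and-add rounds for 64 lanes instead of a round trip through local memory.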