Feature request: expose newer AMD GCN / RDNA features as CL extensions

Back in the early days of OpenCL, AMD added the famous cl_amd_media_ops (2) extensions to expose hardware features to programmers. Sadly, this is not the case for some of their more recent or more hidden hardware features, such as GDS or the cross-lane operations. In fact, with the amdgpu-pro drivers or Windows Adrenalin it is almost impossible to use these features without an external disassembler / assembler, which makes them very painful to use, especially in quickly changing products or long programs.

Thus I would like to propose two new extensions: one that could be named cl_amd_gds and one named cl_amd_cross_lane_ops.

For the GDS, I am aware that virtualization is an issue, especially since its contents remain valid across kernels. So I would propose creating a dedicated address space qualifier __gds (similar to __global) whose memory also needs to be set up like global memory: special host functions would do the virtualization in software, and the memory would only be available as a kernel argument and could not be initialized within the kernel. Access and barriers would also work as they do for __global.

For the cross-lane operations, it would be nice to have at least

gentypen amd_ds_bpermute(gentypen sourceRegister, uint lane) where lane is taken modulo the wavefront size (32 or 64), which can be queried via CL_DEVICE_WAVEFRONT_WIDTH_AMD

gentypen amd_ds_permute(gentypen sourceRegister, uint lane)

and maybe some broadcast operation based on swizzle.
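
To make the intended semantics of the two functions concrete, here is a small CPU-side sketch in plain C that emulates them for one wavefront. The names emu_ds_bpermute and emu_ds_permute are hypothetical and only for illustration. The behavior is my reading of the ds_bpermute_b32 / ds_permute_b32 descriptions in the GCN ISA documentation, with two simplifications: all lanes are assumed active, and lanes are addressed by index (as in the proposal above) rather than by byte address as the hardware instructions do.

```c
#include <assert.h>
#include <stdint.h>

/* Pull ("backward permute"): lane i reads the value held by lane idx[i] % wave.
 * dst must not alias src. */
static void emu_ds_bpermute(uint32_t *dst, const uint32_t *src,
                            const uint32_t *idx, uint32_t wave)
{
    assert(wave == 32 || wave == 64);
    for (uint32_t i = 0; i < wave; ++i)
        dst[i] = src[idx[i] % wave];
}

/* Push ("forward permute"): lane i sends its value to lane idx[i] % wave.
 * dst must not alias src. */
static void emu_ds_permute(uint32_t *dst, const uint32_t *src,
                           const uint32_t *idx, uint32_t wave)
{
    assert(wave == 32 || wave == 64);
    /* Assumption: lanes that nobody writes to end up holding 0. */
    for (uint32_t i = 0; i < wave; ++i)
        dst[i] = 0;
    /* Assumption: on a collision the highest-numbered source lane wins,
     * which falls out of iterating in ascending lane order. */
    for (uint32_t i = 0; i < wave; ++i)
        dst[idx[i] % wave] = src[i];
}
```

With idx[i] = i + 1, the bpermute variant rotates the values down by one lane (lane 0 receives lane 1's value), while the permute variant rotates them up by one lane (lane 1 receives lane 0's value).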

I think I would not be the only one who would love to have easy access to these great hardware features.

1 Reply

Thank you for the feature request and sharing your suggestions. I'll pass this on to the appropriate team.

Regarding the cross-lane instructions, I would like to mention one point here. Though the cross-lane instructions are not directly exposed in OpenCL, the OpenCL compiler can generate cross-lane instructions for the subgroup functions. So it depends on what you want to do with those instructions. For example, if you want to do a scan, reduction, broadcast, or ballot, you already have the subgroup functions for that purpose.
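
As a rough illustration of why a subgroup scan maps naturally onto cross-lane reads, the plain C sketch below emulates an inclusive add scan over one wavefront, where each step is a single "pull from lane i - step" operation. This is only a sketch under my own assumptions: emu_wave_inclusive_add is a hypothetical name, all lanes are assumed active, and the instruction sequence the compiler really emits for a function such as sub_group_scan_inclusive_add may well be different.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WAVE 64

/* In-place inclusive add scan over one wavefront of `wave` lanes.
 * Each outer iteration models one cross-lane read followed by a per-lane add. */
static void emu_wave_inclusive_add(uint32_t *val, uint32_t wave)
{
    assert(wave == 32 || wave == 64);
    uint32_t pulled[MAX_WAVE];
    for (uint32_t step = 1; step < wave; step <<= 1) {
        /* Cross-lane read: lane i pulls the value of lane i - step,
         * or 0 if that lane does not exist. */
        for (uint32_t i = 0; i < wave; ++i)
            pulled[i] = (i >= step) ? val[i - step] : 0;
        /* Per-lane add, done after all reads so every lane sees old values. */
        for (uint32_t i = 0; i < wave; ++i)
            val[i] += pulled[i];
    }
}
```

For a 64-wide wavefront this takes six such steps (step = 1, 2, 4, 8, 16, 32), after which lane i holds the sum of lanes 0 through i.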