Well, the title already describes it.
I have code using 64 KB of LDS on a Radeon VII and an RX 5700; the work group size is 1024.
It works fine on Ubuntu 16.04 and 18.04 using amdgpu-pro 18.50, 19.30, and ROCm 2.10 (all on the VII), and on another system with 19.30 on the RX 5700.
Unfortunately it does not work on the first test system (the VII) booted into Windows 10 with Adrenalin 19.10.1 WHQL. The code compiles fine, but once enqueued it fails with a CL_OUT_OF_RESOURCES error. I doubt it is the compiler, since I had the impression that Linux 19.30 and Adrenalin 19.10.1 are more or less binary kernel compatible.
A variant using only 32 KB of shared memory, with the remainder of the shared operations shifted to global memory, works on all systems but is very slow. Unfortunately some of my clients run Windows, so I wonder how to get this working with the Adrenalin runtime, especially since the Vega ISA states the full 64 KB are available.
Additionally, I wonder if there is any documentation of the runtime environment variables the AMD drivers understand. Specifically, I am looking for options to switch between WAVE32/WAVE64 and WGP/CU mode on Navi.
Thanks in advance
No one? It's obviously not a limitation of the hardware or the compiler; it rather seems to be a simple check in the queueing system that claims resources are exceeded. There should be a simple way to disable that check or increase the bound for the mentioned kinds of cards. If not, I would consider that a bug in the runtime.
Thank you for the above query. I have forwarded it to the OpenCL team for their feedback. As soon as I get their reply, I'll get back to you.
The OpenCL team has replied: currently, Vega has 64 KB of local/shared memory enabled on Linux, but only 32 KB on Windows. This could be the reason for the CL_OUT_OF_RESOURCES error.
Navi has 64 KB of local/shared memory enabled on both Windows and Linux, so the code is expected to work fine on Navi.
Thanks for the reply. So I can ship the faster version for Navi, but not for Vega on Windows. Hmm, sad to hear that. Are there any plans to make the full size on Vega available in upcoming runtime releases?
Also, I wonder about 128 KB of shared memory on Navi in WGP mode. Any chance to activate that yet?
"Are there any plans to make the full size on Vega available in upcoming runtime releases?"
Sorry, I cannot provide a time frame at this moment.
Regarding your other query about the shared memory on Navi in WGP mode, I'll check with the OpenCL team and confirm.
Regarding LDS usage on Navi, here are some important insights shared by the OpenCL team:
"Controls the default wavefront execution mode used when generating code for kernels. When disabled, the native WGP wavefront execution mode is used; when enabled, the CU wavefront execution mode is used."