With 21.q2.1 and earlier drivers, Vega GPUs were reported as 'gfx900' (HBCC off) or 'gfx901' (HBCC on). In both cases the binary for gfx900 could be used and worked correctly. The binary for gfx901 could also be used, but behaved somewhat differently. (We obtain binaries by compiling OpenCL offline using the CL_CONTEXT_OFFLINE_DEVICES_AMD approach.)
In 21.q3.1 Pro, the OpenCL runtime reports the GPU as 'gfx900' regardless of the HBCC setting. However, with HBCC on, attempts to load the binary for gfx900 result in CL_BUILD_PROGRAM_FAILURE, with the error log showing "Error: AMD HSA Code Object loading failed.\nError: Cannot set kernel \n".
The gfx901 binary works when HBCC is on but, again, produces slightly different output compared to gfx900.
When HBCC is off, only the gfx900 binary can be used. Attempts to use the gfx901 binary produce the same error as described above.
How can we check programmatically whether HBCC is enabled, so that the correct binary is submitted to clCreateProgramWithBinary / clBuildProgram?
Thank you for the above query. I have forwarded it to the OpenCL team. Once I get their feedback on this, I will share it with you.
Hi @timchist,
As I understand it, HBCC is related to XNACK. When HBCC is on, it enables xnack, so gfx901 is just gfx900:xnack+. With HBCC on, the runtime is expected to report the "xnack+" suffix. Is clinfo reporting this suffix? Can you please provide the clinfo output?
I replied to you a couple of weeks ago with the full clinfo output, but for some reason my response is now missing here.
There is no difference in clinfo output regardless of whether HBCC is on or off. There is no 'xnack+' or 'xnack-' suffix in the device name reported by clinfo: it is 'gfx900' in both cases with the recent driver.
Please advise how one can get the xnack status programmatically via OpenCL.
Thank you for the information. As per my understanding from the OpenCL team's feedback, the runtime is expected to report appropriate xnack suffix. However, as you said, the suffix is missing in the clinfo output. I will report it to the OpenCL team. Please attach the clinfo output.
By the way, it looks like the latest driver is available here: Radeon PRO 21.q4. Did you try this driver? If not, please check it and share your findings.
I just tested Radeon PRO 21.q4 and there is no difference from 21.q3. The GPU name is reported as gfx900 with or without HBCC enabled. When HBCC is enabled, attempts to use the binary for gfx900 result in the error I reported earlier.
Thanks for sharing the above findings. I have reported the issue to the OpenCL team.
It appears there is a bug in the driver runtime which is misreporting the XNACK setting. The concerned team is investigating the issue. Once I have any update on this, I will get back to you.
Thanks dipak. Please keep me in the loop.
The issue with the "xnack" suffix has been fixed. The fix is expected to be released soon.
Thanks dipak. How soon will the fix be released? What driver version should I be waiting for?
Sorry, I can't give you an ETA at this moment. I'll let you know if I get any information on this.