OpenCL

timchist
Elite

gfx900 binaries do not work on Vega 64 with HBCC segment enabled on 21.q3.1 pro driver

With 21.q2.1 and earlier drivers, Vega GPUs were reported as 'gfx900' (HBCC off) or 'gfx901' (HBCC on). In both cases the binary for gfx900 could be used and worked correctly. The binary for gfx901 could also be used, but worked somewhat differently (we obtain binaries by compiling OpenCL offline using the CL_CONTEXT_OFFLINE_DEVICES_AMD approach).

In 21.q3.1 pro the GPU is reported by the OpenCL runtime as 'gfx900' regardless of the HBCC setting. However, with HBCC on, attempts to load the binary for gfx900 result in a CL_BUILD_PROGRAM_FAILURE, with the error log showing "Error: AMD HSA Code Object loading failed.\nError: Cannot set kernel \n".

The gfx901 binary works when HBCC is on but, again, produces slightly different output compared to gfx900.

When HBCC is off, only the gfx900 binary can be used. Attempts to use a binary for gfx901 produce the same error as described above.

How can we check programmatically whether HBCC is enabled, so that the correct binary is submitted to clCreateProgramWithBinary / clBuildProgram?
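Until the runtime reports the HBCC state reliably, one possible workaround is to try each offline-compiled binary in turn and keep the first one that builds. The sketch below is not from the driver team and is only an illustration: `candidate_binary`, `try_build_fn`, `pick_working_binary` and `simulated_build` are hypothetical names. In a real application the callback would wrap clCreateProgramWithBinary and clBuildProgram and return nonzero on CL_BUILD_PROGRAM_FAILURE; here a stub stands in for the runtime so the selection logic can be exercised without a GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One offline-compiled binary and the target it was built for. */
typedef struct {
    const char *target;          /* e.g. "gfx900" or "gfx901" */
    const unsigned char *binary; /* code object bytes (may be NULL in the stub) */
    size_t size;
} candidate_binary;

/* Hypothetical callback: returns 0 if the binary loads and builds.
 * A real implementation would call clCreateProgramWithBinary and
 * clBuildProgram and return nonzero on failure. */
typedef int (*try_build_fn)(const candidate_binary *c, const void *user);

/* Tries the candidates in order and returns the index of the first one
 * that builds, or -1 if none does. */
int pick_working_binary(const candidate_binary *cands, size_t count,
                        try_build_fn try_build, const void *user)
{
    assert(cands != NULL && try_build != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (try_build(&cands[i], user) == 0)
            return (int)i;
    }
    return -1;
}

/* Stand-in for the runtime, for illustration only: rejects the target
 * named by `user` (a C string) and accepts every other target. */
int simulated_build(const candidate_binary *c, const void *user)
{
    const char *rejected = (const char *)user;
    return strcmp(c->target, rejected) == 0 ? 1 : 0;
}
```

Listing the gfx900 binary first keeps the behavior unchanged when HBCC is off. Since the gfx901 binary is reported above to produce slightly different output, the caller may want to record which index was selected.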

0 Likes
11 Replies
dipak
Big Boss

Hi @timchist 

Thank you for the above query. I have forwarded it to the OpenCL team. Once I get their feedback on this, I will share it with you.

Thanks.

0 Likes

Hi @timchist ,

As I've come to know, HBCC is related to XNACK. When HBCC is on, it enables XNACK, so gfx901 is just gfx900:xnack+. With HBCC on, the runtime is expected to report the "xnack+" suffix. Is clinfo reporting this suffix? Can you please provide the clinfo output?
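If the runtime does report the suffix as described, the check could be a simple string test on the device name. This is a minimal sketch under that assumption: `xnack_state` and `xnack_from_device_name` are hypothetical names, and the name string is assumed to come from clGetDeviceInfo with CL_DEVICE_NAME. The legacy 'gfx901' name is treated as XNACK on, following the equivalence stated above.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    XNACK_UNKNOWN = 0, /* no suffix present; state cannot be inferred */
    XNACK_OFF,
    XNACK_ON
} xnack_state;

/* Infers the XNACK state from a device name such as "gfx900:xnack+". */
xnack_state xnack_from_device_name(const char *name)
{
    assert(name != NULL);
    if (strstr(name, "xnack+") != NULL)
        return XNACK_ON;
    if (strstr(name, "xnack-") != NULL)
        return XNACK_OFF;
    /* Older drivers reported HBCC-on Vega devices as gfx901. */
    if (strcmp(name, "gfx901") == 0)
        return XNACK_ON;
    return XNACK_UNKNOWN;
}
```

A plain "gfx900" with no suffix maps to XNACK_UNKNOWN rather than XNACK_OFF, because a missing suffix does not by itself say which state the device is in.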

Thanks.

 

0 Likes

Hi dipak,

I replied to you a couple of weeks ago with the full clinfo output, but for some reason my response is now missing here.

There is no difference in clinfo output regardless of whether HBCC is on or off. There is no 'xnack+' or 'xnack-' suffix in the device name reported by clinfo: it is 'gfx900' in both cases with the recent driver.

Please advise how one can get the XNACK status programmatically via OpenCL.

Thanks

0 Likes

Thank you for the information. As per my understanding from the OpenCL team's feedback, the runtime is expected to report the appropriate xnack suffix. However, as you said, the suffix is missing in the clinfo output. I will report it to the OpenCL team. Please attach the clinfo output.

By the way, it looks like the latest driver is available here: Radeon PRO 21.q4. Did you try this driver? If not, please check and share your findings.

Thanks. 

0 Likes

Hi dipak,

I just tested Radeon PRO 21.q4 and there is no difference from 21.q3. The GPU name is reported as gfx900 with or without HBCC enabled. When HBCC is enabled, attempts to use the binary for gfx900 result in the error I reported earlier.

0 Likes

Thanks for sharing the above findings. I have reported the issue to the OpenCL team.

 

0 Likes

It appears there is a bug in the driver runtime that misreports the XNACK setting. The concerned team is investigating the issue. Once I have any update on this, I will get back to you.

Thanks.

 

0 Likes

Thanks dipak. Please keep me in the loop.

0 Likes

Update:

The issue with the "xnack" suffix has been fixed. The fix is expected to be released soon.

Thanks.

 

0 Likes

Thanks dipak. How soon will the fix be released? What driver version should I be waiting for?

0 Likes

Sorry, I can't give you an ETA at this moment. I'll let you know if I get any information on this.

Thanks.

0 Likes