We are shipping our Windows applications with pre-compiled GPU binaries for different device architectures (Tahiti, Hawaii, Fiji, Tonga, Ellesmere, Baffin, etc.).
The compilation is done on a virtual machine without any GPUs by calling clCreateContextFromType with CL_CONTEXT_OFFLINE_DEVICES_AMD=1 and then running clCreateProgramWithSource, clBuildProgram, and clGetProgramInfo(CL_PROGRAM_BINARIES).
As no AMD driver is installed on the machine, the driver files are extracted to a folder listed in the system PATH.
Source is compiled with these bif options: -fno-bin-source -fno-bin-llvmir -fno-bin-amdil -fbin-exe
This works fine for all devices, including those mentioned above and gfx900. However, when I try to generate a binary for Vega Frontier Edition (gfx901), I'm getting the following error after calling clBuildProgram:
AMD HSA Code Object loading failed.
When I change bif options to -fno-bin-source -fno-bin-llvmir -fbin-amdil -fno-bin-exe, I'm receiving this:
Extracting AMD HSA Code Object from binary failed.
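For reference, the two option strings tried above differ only in which BIF sections are kept. A minimal sketch of composing such a string follows; the helper name bif_build_options and the enum are hypothetical (they are not part of any AMD API), and only the four -f[no-]bin-* switches quoted in this post are assumed.

```c
#include <stdio.h>
#include <string.h>

/* BIF sections named in the post; bit flags so they can be OR-ed together.
   These names are illustrative, not part of the OpenCL or AMD headers. */
enum bif_section {
    BIF_SOURCE = 1 << 0,
    BIF_LLVMIR = 1 << 1,
    BIF_AMDIL  = 1 << 2,
    BIF_EXE    = 1 << 3
};

/* Hypothetical helper: writes "-f[no-]bin-<section>" for each of the four
   sections into out, separated by single spaces.
   Returns 0 on success, -1 if out is too small to hold the full string. */
int bif_build_options(unsigned keep, char *out, size_t out_size)
{
    static const struct { unsigned bit; const char *name; } sections[] = {
        { BIF_SOURCE, "source" },
        { BIF_LLVMIR, "llvmir" },
        { BIF_AMDIL,  "amdil"  },
        { BIF_EXE,    "exe"    }
    };
    size_t used = 0;
    size_t i;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (i = 0; i < sizeof sections / sizeof sections[0]; ++i) {
        /* Emit "-fbin-x" when the section is kept, "-fno-bin-x" otherwise. */
        int n = snprintf(out + used, out_size - used, "%s-f%sbin-%s",
                         i ? " " : "",
                         (keep & sections[i].bit) ? "" : "no-",
                         sections[i].name);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1; /* output would have been truncated */
        used += (size_t)n;
    }
    return 0;
}
```

Calling bif_build_options(BIF_EXE, buf, sizeof buf) reproduces the first option string, and bif_build_options(BIF_AMDIL, buf, sizeof buf) the second; the result would then be passed as the options argument of clBuildProgram.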
I also tried to perform the compilation on a computer with a Fury card installed, but received the same errors.
What is the correct procedure to generate a binary for Vega Frontier Edition (gfx901) on a computer running Windows 10 without this specific GPU?
The issue is still under discussion, so I can't share much at this point. I will update you as soon as I have something conclusive.
One point to note, though: currently two different driver packages are available for Vega devices under Windows:
1) The mainline Crimson driver, which supports Vega but doesn’t include the Frontier Edition in its supported device list (Radeon Software Crimson ReLive Edition 17.8.2 Release Notes)
2) A special Vega Frontier Edition driver (AMD Drivers)
Did you try the 2nd one?
OK, so two months later I'm facing exactly the same issues with the Crimson 17.10.3 drivers dated 27-Oct-2017 for Win7/64. Compilation for gfx901 results in "Error: AMD HSA Code Object loading failed.", and there are two versions of the gfx900 & gfx804 devices when requesting the device list with CL_CONTEXT_OFFLINE_DEVICES_AMD=1, i.e. everything timchist reported back then. So there is still no solution? Should we just tell our users "Avoid buying AMD Vega GPUs at all"?..
AFAIK, the issue was already fixed (sometime in September) in the internal OCL stack and was expected to move into the mainline build. Depending on the target build version, fixes may sometimes take a little longer than usual to be released publicly.
Actually, most of our discussion happened in private, and I had already reported the fix there. Sorry, I forgot to share the update on this thread itself.
Honestly, your reply is not really an update. Right now it's impossible to compile kernels for Vega FE GPUs (I guess it's only possible with a Vega FE physically present because of software limitations implemented to separate the Pro series from the "budget" series... but it's only a guess). This has lasted for 2 months, there is no ETA on a fix, and there is nothing we can do about it (there is no program within AMD that gives 3rd party developers access to the internal OCL stack, right?)
Pretty depressing but no surprises.
As just confirmed, the fix has already been promoted to the next version of the mainline driver; neither the previous public release, 17.10.3, nor any release based on 17.40 has the fix yet. I understand your disappointment. Please be patient.