gstoner
Staff

Re: clinfo crashes with Vega Frontier Edition

We are walking you through the steps needed to understand why you're getting the crash. These forums are really for community support, with a moderator. I manage the team that looks at ROCm and OpenCL, which is why I stepped in to help four days ago. If this turns out to be a packaging issue, I may need to go to the AMDGPU-PRO Linux team, since they repackage our software. We do know this is working on other systems.

Step one was to unwind the AMD SDK from the stack. I have asked the Pro Graphics team to update their instructions so they no longer tell users to install it.

Now we have to look at the harder issues.

- Also, you did not include processor details. Is this a Core i5 v3 Haswell processor?

- Is the GPU in the PCI_E2 slot? According to the MSI user manual, that is the slot that gives you a full x16 electrical connection.

- Next, did you install the Intel OpenCL SDK?

I need the output of:

ls /etc/OpenCL/vendors
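For reference, the Khronos OpenCL ICD loader on Linux reads the *.icd files in /etc/OpenCL/vendors, and each file names the vendor library to load. A minimal Python sketch, assuming that one-library-per-file layout, that prints every vendor file next to the library it points at, which makes a leftover or duplicate entry easy to spot:

```python
from pathlib import Path


def list_icd_vendors(vendor_dir="/etc/OpenCL/vendors"):
    """Return (icd filename, library named inside it) pairs, sorted by filename."""
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return []
    entries = []
    for icd in sorted(directory.glob("*.icd")):
        # Each .icd file is expected to contain a single library name or path.
        entries.append((icd.name, icd.read_text().strip()))
    return entries


if __name__ == "__main__":
    for name, library in list_icd_vendors():
        print(f"{name}: {library}")
```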

We also need to see the output of the following commands:

sudo lspci -tvv

sudo lspci -xxxx

sudo lspci -vvv
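Saving each listing to its own file makes the logs easier to attach. A small Python sketch; the log_name helper and its file naming scheme are hypothetical choices of ours, and the script should be run with sudo so lspci can read the full configuration space:

```python
import subprocess

# The three listings requested above.
COMMANDS = [
    ["lspci", "-tvv"],
    ["lspci", "-xxxx"],
    ["lspci", "-vvv"],
]


def log_name(command):
    """Hypothetical naming scheme: ["lspci", "-tvv"] -> "lspci_-tvv.log"."""
    return "_".join(command) + ".log"


def collect(commands=COMMANDS):
    for command in commands:
        # Capture stderr as well, so any error messages end up in the log too.
        result = subprocess.run(command, capture_output=True, text=True)
        with open(log_name(command), "w") as log:
            log.write(result.stdout)
            log.write(result.stderr)
        print(f"wrote {log_name(command)} (exit code {result.returncode})")


if __name__ == "__main__":
    collect()
```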

Another option is to install and test the open-source ROCm driver, which is what we work on. Use the following instructions. Note that on Monday we are rolling out the ROCm 1.6.1 driver, which addresses some issues we found in the power management firmware. My team rolls out this driver, so we can debug issues quickly.

ROCm Install

Quickstart OpenCL 

In your dmesg log I saw the following errors, which I need to discuss with the AMDGPU team on Monday.

One thing I am seeing is that you need to talk to MSI about an ACPI issue.

[    1.057762] amdgpu 0000:03:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

[    1.495876] amdgpu: [powerplay] Cannot find requested DCEFCLK!

[    1.764797] amdgpu: [powerplay] Cannot find requested DCEFCLK!
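To pull lines like these out of a long dmesg capture, filtering on the driver prefix is usually enough. A minimal Python sketch, assuming the usual "[ seconds] message" dmesg format; the prefixes it checks (amdgpu and kfd) are simply the ones that appear in this thread:

```python
import re
import sys

# Matches lines such as "[    1.057762] amdgpu 0000:03:00.0: ...".
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def gpu_messages(lines, prefixes=("amdgpu", "kfd")):
    """Yield (timestamp, message) for dmesg lines emitted by the given drivers."""
    for line in lines:
        match = DMESG_LINE.match(line.strip())
        if match and match.group(2).startswith(prefixes):
            yield float(match.group(1)), match.group(2)


if __name__ == "__main__":
    # Usage: dmesg | python3 gpu_messages.py
    for timestamp, message in gpu_messages(sys.stdin):
        print(f"{timestamp:12.6f}  {message}")
```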

I will look over more of your strace.

pdxtabs
Adept I

Now we have to look at the harder issues.

- Also, you did not include processor details. Is this a Core i5 v3 Haswell processor?

- Is the GPU in the PCI_E2 slot? According to the MSI user manual, that is the slot that gives you a full x16 electrical connection.

- Next, did you install the Intel OpenCL SDK?

As stated in the first post, this is the processor: http://ark.intel.com/products/68316/Intel-Core-i5-3470-Processor-6M-Cache-up-to-3_60-GHz

I have the card plugged into the only 16 lane PCIe slot on the motherboard ("PCI_E1 PCIe x16 Expansion Slot" by my reading of the manual).

I did not install the Intel OpenCL SDK.

I need the output of:

ls /etc/OpenCL/vendors

We also need to see the output of the following commands:

sudo lspci -tvv

sudo lspci -xxxx

sudo lspci -vvv

Please find the logs attached. Of note, running sudo lspci -vvv produces an error, "pcilib: sysfs_read_vpd: read failed: Input/output error" (I included it in the log).

Another option is to install and test the open-source ROCm driver, which is what we work on. Use the following instructions. Note that on Monday we are rolling out the ROCm 1.6.1 driver, which addresses some issues we found in the power management firmware. My team rolls out this driver, so we can debug issues quickly.

ROCm Install

Quickstart OpenCL

The open-source ROCm documentation has the following to say:

Supported CPU

The ROCm platform leverages modern CPUs with PCIe Gen3 support that also support PCIe atomics (Fetch ADD, Compare and SWAP, Unconditional SWAP, AtomicOp Completion). To find out more, see https://github.com/RadeonOpenCompute/RadeonOpenCompute.github.io/blob/master/ROCmPCIeFeatures.md

When you install your GPUs, make sure you install them in real PCIe Gen3 x16 or x8 lanes connected directly to the CPU's root I/O controller, or to a PCIe switch directly attached to the CPU's root I/O controller. We have seen many issues with consumer motherboards that have physical x16 connectors but wire the connector electrically as PCIe Gen2 x4; when you see this, the slot is typically hanging off the southbridge PCIe I/O controller. If your motherboard is configured this way, please do not use this connector for your GPU.
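One way to see what a slot actually negotiated (Gen3 x16 versus the Gen2 x4 case described above) is to read the link attributes the Linux kernel exposes in sysfs for each PCI device. A sketch, assuming the current_link_speed and current_link_width attributes exist for the device; the default address 0000:03:00.0 is the one amdgpu reported in the dmesg output earlier in this thread, and the speed table only covers Gen1 through Gen4:

```python
from pathlib import Path

# Per-lane transfer rate in GT/s -> PCIe generation.
GENERATIONS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}


def parse_speed(text):
    """Turn a sysfs speed string such as "8 GT/s" or "8.0 GT/s PCIe" into a generation."""
    return GENERATIONS.get(float(text.split()[0]))


def link_status(address="0000:03:00.0", root="/sys/bus/pci/devices"):
    """Return (generation, lane width) of the currently negotiated link."""
    device = Path(root) / address
    speed = (device / "current_link_speed").read_text().strip()
    width = int((device / "current_link_width").read_text().strip())
    return parse_speed(speed), width


def meets_rocm_slot_advice(generation, width):
    # The ROCm note above asks for Gen3 with x16 or x8 lanes.
    return generation is not None and generation >= 3 and width >= 8


if __name__ == "__main__":
    gen, width = link_status()
    print(f"Gen{gen} x{width}, matches slot advice: {meets_rocm_slot_advice(gen, width)}")
```

The negotiated speed may read lower while the GPU is idle, so comparing against max_link_speed and max_link_width is worthwhile. This check covers lanes and generation only; it says nothing about whether the path supports PCIe atomics.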

I have no idea whether my motherboard meets these requirements. Is this an undocumented requirement for OpenCL on this card? None of the marketing material I read before purchasing this card called this out as a requirement (https://pro.radeon.com/en-us/product/radeon-vega-frontier-edition/):

Requirements:

  • Typical Board Power: 300W
  • PSU Recommendation: >850W
  • Required PCI Slots: 2

The owner's manual says:

SYSTEM REQUIREMENTS

...

  • PCI Express-based PC with at least one x16 lane graphics slot available on the motherboard.
  • Min 750W System power supply with two 8-pin PCIe power connectors.

Thank you for your help. I have relocated the workstation in question to make it easier to supply any additional information you may need on a Monday-Friday basis. If I had to buy a new motherboard and processor to use this card, it wouldn't be the end of the world, but it would be disappointing: I picked it as the most powerful card I could fit into this workstation (which has only one PCIe x16 slot), and I've already spent a lot of money to get it working (card, power supply, shipping, tax).

jedwards
Staff

First, it looks like all of the configuration files are correct (the vendor file, location of the libraries, etc.), but I found this in your dmesg output:

[    1.699813] kfd kfd: skipped device 1002:6863, PCI rejects atomics

This indicates that the KFD driver tried to initialize the 6863 device but failed because PCIe atomics are not supported in your current configuration. The most likely cause is the PCIe slot the card is installed in; without atomics support, the ROCm driver stack cannot recognize the card. However, I noticed that this is the specification for the PCIe slots on your motherboard:

Expansion Slot(s):

  • 1 x processor - LGA1155 Socket
  • 4 x memory - DIMM 240-pin
  • 1 x PCI Express 3.0 x16
  • 1 x PCI Express 2.0 x1
  • 1 x PCI

Your slot is PCIe 3.0 capable, but I can't tell whether it enables the atomics extension. I will need to analyze your lspci output to make sure.
