
Server Gurus Discussions

sho1sho1sho1
Adept I

Dual-socket EPYC 7702 (64 cores) shows 255 CPUs online, 1 CPU offline... Bad CPU?

Hi all,

I tried to install CentOS 7 on a dual-socket AMD EPYC 7702 (64 cores per socket) server, and it didn't work. Then I tried CentOS 8, which installed, but lscpu shows CPU 255 as offline. Do I have a bad core that failed to multithread?

Or is this an OS bug for which I need the ELRepo kernel-ml?

Architecture:         x86_64
CPU op-mode(s):       32-bit, 64-bit
Byte Order:           Little Endian
CPU(s):               256
On-line CPU(s) list:  0-254
Off-line CPU(s) list: 255                  <-----------------------
Thread(s) per core:   1
Core(s) per socket:   64
Socket(s):            2
NUMA node(s):         2
Vendor ID:            AuthenticAMD
CPU family:           23
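The online and offline lists above use the kernel's cpulist range format, the same format the kernel exposes in /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline. A minimal Python sketch, assuming those sysfs files are present, that expands such a list so you can see exactly which logical CPU was left offline. The helper names parse_cpulist and read_cpulist are hypothetical, and the parser ignores the rarely used stride syntax.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Expand a cpulist string such as "0-254" or "0-3,8,10-11" into CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (e.g. no offline CPUs) yields no ids
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpulist(name: str, root: str = "/sys/devices/system/cpu") -> set[int]:
    """Read <root>/<name> (e.g. "online" or "offline"); empty set if the file is missing."""
    path = Path(root) / name
    if not path.exists():
        return set()
    return parse_cpulist(path.read_text())


if __name__ == "__main__":
    online = read_cpulist("online")
    offline = read_cpulist("offline")
    print(f"{len(online)} CPUs online, offline ids: {sorted(offline)}")
```

On the system above, the online list "0-254" expands to 255 ids and the offline list to just {255}.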

Thanks for your help in advance!

sho1sho1sho1
Adept I

It turns out to be a Gigabyte BIOS issue. Their engineers are seeing the same issue with the 7742 as well. Hoping to get a BIOS fix soon.

sho1sho1sho1
Adept I

Finally solved the issue! The CentOS 8 kernel supports x2APIC; however, the kernel fails to enable interrupt remapping, so it disables x2APIC and falls back to legacy xAPIC mode, which supports only up to 255 threads. x2APIC needs interrupt remapping, which in turn needs the IOMMU enabled, and the Gigabyte BIOS default is IOMMU disabled. After enabling the IOMMU, all 256 threads come up and x2APIC is enabled successfully.
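The 255-thread ceiling follows from how legacy xAPIC addresses CPUs: the APIC ID is 8 bits wide and ID 0xFF is reserved as the broadcast destination, so only IDs 0 to 254 can be targeted, while x2APIC widens the ID to 32 bits. A small Python sketch of that arithmetic, assuming (hypothetically) that the 256 threads get contiguous APIC IDs 0 to 255; real firmware may leave gaps in the numbering. The function name is illustrative only.

```python
# Legacy xAPIC: 8-bit APIC IDs, with 0xFF reserved as the broadcast destination.
# (x2APIC uses 32-bit IDs, so this ceiling does not apply there.)
XAPIC_ID_BITS = 8
XAPIC_BROADCAST_ID = (1 << XAPIC_ID_BITS) - 1  # 0xFF == 255
XAPIC_MAX_CPUS = XAPIC_BROADCAST_ID            # usable IDs 0..254 -> 255 CPUs


def unreachable_in_xapic(apic_ids):
    """Return the APIC IDs that cannot be targeted once the kernel falls back to xAPIC."""
    return sorted(i for i in apic_ids if i >= XAPIC_BROADCAST_ID)


# Hypothetical layout: 2 sockets x 64 cores x 2 SMT threads,
# assuming contiguous APIC IDs 0..255.
threads = 2 * 64 * 2
apic_ids = range(threads)
print(f"{threads} threads, xAPIC limit {XAPIC_MAX_CPUS}, "
      f"unreachable IDs: {unreachable_in_xapic(apic_ids)}")
# prints: 256 threads, xAPIC limit 255, unreachable IDs: [255]
```

Under that assumption exactly one thread, the one that would need APIC ID 255, is left out, which matches the single offline CPU in the lscpu output.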

Hope this helps anyone who is stuck on the issue.

The first thing I always suggest is using the latest BIOS for a workstation.

Linux is fairly good with server-class rigs; however, some distributions are faster at updates than others. Red Hat (CentOS) is very well maintained.

Not sure why the BIOS does not enable the IOMMU by default, since desktop and server operating systems alike need it. It seems to be a problem with every make of motherboard I have seen.

Nice to see you were able to solve the issue.

bardlam
Journeyman III

I registered an account just to thank you for your post and solution, @sho1sho1sho1. Thanks!

I just bought a Gigabyte R282-Z96 and I thought something might be wrong with one of the EPYC 7773X processors I installed in it. dmesg was reporting "smpboot: native_cpu_up: bad cpu 255".

It turns out the IOMMU is disabled by default. For some reason, SMT is also disabled by default (instead of "Auto"), and probably a number of other settings I haven't noticed yet. I'm using BIOS version M16_R31, which was just released in August 2023.

These processors were in a Gigabyte R282-Z90 right before this new server arrived, and there the default BIOS settings didn't cause any of these failures.
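Since several BIOS defaults on this board turned out to matter, a quick post-install check can help. A minimal Python sketch, assuming the sysfs file /sys/devices/system/cpu/smt/control (which reports on, off, forceoff, notsupported or notimplemented) and assuming /sys/class/iommu lists one entry per IOMMU unit the kernel registered (for example ivhd0 on AMD) and stays empty when the IOMMU is disabled in the BIOS. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def smt_control(sysfs: Path = Path("/sys")) -> str | None:
    """Return the SMT control state ("on", "off", "forceoff", ...) or None if unavailable."""
    path = sysfs / "devices/system/cpu/smt/control"
    return path.read_text().strip() if path.exists() else None


def iommu_units(sysfs: Path = Path("/sys")) -> list[str]:
    """List registered IOMMU units (assumed empty when the IOMMU is off in the BIOS)."""
    path = sysfs / "class/iommu"
    return sorted(p.name for p in path.iterdir()) if path.is_dir() else []


def report(sysfs: Path = Path("/sys")) -> list[str]:
    """Collect warnings about settings that caused trouble in this thread."""
    warnings = []
    smt = smt_control(sysfs)
    if smt != "on":
        warnings.append(f"SMT control is {smt!r}; check the BIOS SMT setting")
    if not iommu_units(sysfs):
        warnings.append("no IOMMU units registered; check the BIOS IOMMU setting")
    return warnings


if __name__ == "__main__":
    for warning in report():
        print("WARNING:", warning)
```

The sysfs root is a parameter so the checks can be pointed at a copied or mocked directory tree rather than the live system.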
