

rvborgh

SuperMicro H8QGi-F and Opteron 63xx processors

hi folks,

i've been running this quad socket H8QGi-F/SC748 case with 4 Opteron 61xx for the past 7 years or so as my home PC.

Recently i thought i'd do a quick upgrade to tide me over for another 2 years.  So i purchased some Opteron 6328 (relatively high frequency - 3.5 GHz/3.8 turbo) as well as a pair of Opteron 6380.

The strange thing with these 63xx series ("Piledriver" cores) is that they are only perfectly stable when running on this board with DDR3-1333 (at 9-9-9-24 timings) AND with Hypertransport downgraded to 1.0 ("HT1" setting in the BIOS).  When i first installed these i was puzzled by black screens and reboots when running the Rightmark Multi-Threaded Memory Test Benchmark.  The machine was seemingly stable when doing normal things (web surfing, etc), but as soon as you put any kind of severe memory load on it, it would black screen and then sometimes reboot.  The strange thing is that when i ran Rightmark, and confined all the threads onto a single die (8 cores within the MCM of the 6380) via assigning thread affinity, it ran fine.  As  soon as i started gradually assigning more and more threads to the second die in the MCM, then Rightmark would black screen.  The same when assigning to the 3rd and 4th dies on the other processor (i'm running two proc config for now).
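For anyone wanting to reproduce the die-by-die test, here is a minimal sketch of the affinity step in Python. It is only an illustration under some assumptions: Linux (os.sched_setaffinity is not available on Windows), logical CPUs numbered contiguously die by die, and 8 cores per die as on the 6380. The helper names cores_for_dies and pin_to_dies are made up for this example and have nothing to do with Rightmark itself.

```python
import os

# Assumption: 8 cores per die (Opteron 6380 = 2 dies x 8 cores per socket),
# and the OS numbers logical CPUs contiguously, die by die.
CORES_PER_DIE = 8


def cores_for_dies(dies, cores_per_die=CORES_PER_DIE):
    """Return the set of logical CPU ids that belong to the given die indices."""
    cpus = set()
    for die in dies:
        start = die * cores_per_die
        cpus.update(range(start, start + cores_per_die))
    return cpus


def pin_to_dies(dies, cores_per_die=CORES_PER_DIE):
    """Restrict the current process to the given dies and return the CPU set used."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    wanted = cores_for_dies(dies, cores_per_die)
    # Only keep CPUs the process is actually allowed to run on.
    usable = wanted & os.sched_getaffinity(0)
    if not usable:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, usable)
    return usable
```

Calling pin_to_dies([0]) before starting the load keeps everything on the first die; pin_to_dies([0, 1]) then lets it spill onto the second die of the same package, which is the step where the black screens started here.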

After a while i realized it had to be related to the communications between the processors.  So i reduced the Hypertransport down to HT1 and then Rightmark was fine (almost perfectly stable) after that.  To gain complete stability i had to reduce the ram speed down to DDR3-1333.  At that point i was finally able to run these 6380s with full stability with Prime95 (both small FFT and Blend, as well as Rightmark).

The dram i am running is Crucial Ballistix rated for DDR3-1866 (PC14900) with Micron Technology ram.  BLT4G3D1869DT1TX0.

Why is it that i am having to reduce the Hypertransport down to 1.0 in the first place?  And what can be done about that?  The BIOS has settings for HT1 and Auto which sets things to HT 3.0.

And why can't these Opteron 63xx processors run this ram at DDR3-1600 at least?

Has anyone else run across this?

Thanks for any help.

PS: other than this, i've got no complaints.

7 Replies

Why are you running two different processor models, even if both are from the same 63xx series?

Looking at CPU World for the 2 types of Processors I see two major differences:

Opteron 6380:

[Screenshot of CPU World specifications, not reproduced]

Opteron 6328:

[Screenshot of CPU World specifications, not reproduced]

Otherwise both are identical.

Both support:

Other peripherals: HyperTransport 3.0 technology with 4 HT links
Bus speed: 3200 MHz HyperTransport links (6.4 GT/s per link)
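As a rough illustration of what those figures mean, the short Python sketch below converts a HyperTransport link clock into transfers per second and per-direction bandwidth. HyperTransport transfers data on both clock edges, which is how 3200 MHz becomes 6.4 GT/s. The 16-bit link width is an assumption (links can also run narrower), and so is the 800 MHz figure used for the BIOS "HT1" setting, which is the top HT 1.x clock rather than something confirmed for this board. The function names are made up for the example.

```python
def ht_gigatransfers(clock_mhz):
    """GT/s for a double-data-rate link: two transfers per clock cycle."""
    return clock_mhz * 2 / 1000


def ht_bandwidth_gb_s(clock_mhz, link_width_bits=16):
    """Per-direction bandwidth in GB/s; link_width_bits=16 is an assumption."""
    bytes_per_transfer = link_width_bits / 8
    return ht_gigatransfers(clock_mhz) * bytes_per_transfer


if __name__ == "__main__":
    # 3200 MHz is the figure quoted above; 800 MHz is an assumed HT1 clock.
    for label, mhz in (("HT 3.x at 3200 MHz", 3200), ("HT1 at 800 MHz (assumed)", 800)):
        print(label, ht_gigatransfers(mhz), "GT/s,",
              ht_bandwidth_gb_s(mhz), "GB/s per direction")
```

Under those assumptions a full-width link moves 12.8 GB/s per direction at 3200 MHz and a quarter of that at 800 MHz, which gives a sense of how much inter-socket bandwidth the HT1 setting gives up.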

 

My only guess is that, since one is an 8-core part and the other a 16-core part and their clock speeds are not close, there is some sort of conflict between them, which might be the reason for your issues.

It's like installing slower RAM alongside faster RAM: the motherboard runs everything at the slower speed. Maybe something similar is occurring with two CPU models that differ in core count and speed.

Anyway, the best place to post this thread is the AMD Server Gurus forum, which specializes in everything concerning server hardware and software: https://community.amd.com/t5/server-gurus/ct-p/amd-server-gurus

hi, thanks for the reply. 

i ran two 6328s and ran into the issue, and then i swapped them out for a pair of 6380s.  Both had the same issue.  Any time i put severe memory load on the system across all cores (e.g. RightMark or AIDA64), after some time the system would black screen.

Hope that helps.  

Thanks for the update.

Since your server motherboard supports 4 CPUs, your post gave me the impression that you were using all 4 sockets with different CPU models at the same time. Thus my reply.

Try posting at AMD Server Gurus, using the link in my previous reply, and see if a moderator or user there can help you with your server motherboard and Opteron processors.

I don't know whether server processors are as sensitive as Ryzen processors to the type of RAM that is installed.

Is the RAM you are using listed as compatible with either of the two Opteron models you have?

@santosh_zanjurne  is a staff member at that AMD Forum.


i thought i would update this thread.

After a ton of testing i found the culprit.

The problem seems to manifest when the memory controllers are subject to a lot of traffic.  Lowering the Hypertransport down to 1.0 helped the benchmarks to run longer before the processors reset themselves, but did not fix the issue.

What fixed the issue was setting the Node Interleaving setting back to "Auto".

After that, complete stability.

I could then set my Hypertransport back up to Auto/HT 3.0.

The strange thing is that with node interleaving disabled, things do work for the most part; it just does not hold up when you stress it.  i can tell because in memory benchmarking tests where i pin threads to cores on a single die, the latency drops down to around 70, with jumps when you move the thread onto a different die in the same MCM, and then onto a different socket, as opposed to the Node Interleaving Auto/On setting where it is always around 120 or so.
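For readers curious how this kind of latency test works, the usual technique is pointer chasing: walk a random single-cycle permutation so that every load depends on the previous one and the prefetcher cannot help. Below is a rough Python sketch of the idea, not the benchmark used above. Interpreter overhead inflates the absolute numbers a lot, so only the relative change between runs pinned to different dies or sockets would mean anything. All function names and default sizes are assumptions made for the example.

```python
import random
import time


def single_cycle_permutation(n, seed=0):
    """Sattolo's algorithm: a random permutation made of one cycle of length n."""
    rng = random.Random(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def chase(perm, hops, start=0):
    """Follow the permutation for `hops` dependent steps; return the final index."""
    idx = start
    for _ in range(hops):
        idx = perm[idx]
    return idx


def ns_per_hop(n=1 << 20, hops=1 << 20, seed=0):
    """Average wall-clock nanoseconds per dependent access (includes Python overhead)."""
    perm = single_cycle_permutation(n, seed)
    t0 = time.perf_counter()
    chase(perm, hops)
    return (time.perf_counter() - t0) * 1e9 / hops


if __name__ == "__main__":
    print(f"{ns_per_hop():.1f} ns per hop")
```

Running it once with the process pinned to the die that owns the memory and once pinned to another die or socket should show the same kind of step described above, assuming the working set is large enough to fall out of cache.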

i still don't understand why node interleaving disabled doesn't seem to work right on my quad socket machine with the 63xx processors, vs my old 61xx where it worked just fine with stability.

The good thing is that Node Interleaving enabled doesn't hurt benchmarks at all for the most part.

Good troubleshooting.

You should mark your last reply as "Solution" so others will see how you fixed your problem.


If I recall correctly Supermicro strongly recommended registered RAM.

So are you running registered RAM?  Cause it sounds like your problem will rear its ugly head again.

I was about to ask the same thing, and decided I had better first read the replies others had submitted.  I can't help with the Opteron, but I had a similar setup with two Intel Pentium II 400 MHz processors on a Supermicro board.  I'm fairly certain that they had to be identical.