> Can you post raw output (not using the script) on Git to see POF please?
Let me clarify what you want. My understanding is that you want the log.txt from a failed build,
produced by the following commands under WSL. Is that correct?
$ cd src/linux
$ make defconfig
$ make -j16 &>log.txt
If so, how many of these logs do you want? Is one enough?
Hi, my configuration is
ASRock B350 Pro4 with the latest BIOS (AGESA 188.8.131.52)
G.Skill Ripjaws V 32G (16G x 2) running at 2133
I tried to build Gentoo and hit this bug too. With OpCache enabled there are sporadic segfaults (they always show up eventually).
After I disabled OpCache, I compiled the whole of Gentoo 3 times without a single segfault.
So I believe this is caused by a bug in the OpCache.
I'm just curious whether this hardware bug can be fixed via a BIOS update. I mean a real fix, not a workaround.
And if the OpCache is disabled entirely, how much of a performance impact will there be?
Correct - I'm trying to isolate this and run a comparison as I can't replicate it - it's driving me nuts.
And to clarify - I'm using gcc 7 - I haven't tried any of the older versions that most distros still use.
Which GCC are you building against?
I'm not saying there aren't any CPU issues - every release (both Intel and AMD) has them, every time. It generally takes some time for compilers to build work-arounds as they catch up. Even the most subtle compiler bugs can trigger errors that were not seen on previous CPUs.
I'm still trying to replicate this and so far building kernel 4.11.4 on a loop overnight hasn't generated anything sadly. Next step is to downgrade my build system to use anything older than gcc-7; I might as well since I'm not keeping this installation.
Here I never had problems with the kernel; I can let the kernel compile in a loop all day long. An easy way to trigger the problem is to run a mesa compilation with -j16 in a loop alongside a parallel gcc compilation (whatever version), also with -j16. Sooner or later mesa fails with a segfault in bash (sometimes gcc itself segfaults).
This happens with the whole system compiled with gcc 5, 6 or 7, with or without CFLAGS optimizations.
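The reproduction described above (one build repeated in a loop while a second heavy job keeps all cores busy) can be sketched as a small shell function. This is only an illustration: the name build_under_load and its arguments are made up here, and both commands are passed as strings to sh -c.

```shell
# Hypothetical helper, not from the thread: keep LOAD running in a background
# loop while repeating BUILD in the foreground, until BUILD fails or MAX
# passes have succeeded.
# Usage: build_under_load MAX 'build command' 'load command'
build_under_load() {
    local max build load loadpid pass
    max=$1
    build=$2
    load=$3
    # Restart the competing job every time it finishes.
    ( while :; do sh -c "$load"; done ) >/dev/null 2>&1 &
    loadpid=$!
    pass=0
    while [ "$pass" -lt "$max" ]; do
        if ! sh -c "$build" >/dev/null 2>&1; then
            # Stops the restart loop; a load job already running may finish its current run.
            kill "$loadpid" 2>/dev/null
            wait "$loadpid" 2>/dev/null
            echo "build failed on pass $((pass + 1))"
            return 1
        fi
        pass=$((pass + 1))
    done
    kill "$loadpid" 2>/dev/null
    wait "$loadpid" 2>/dev/null
    echo "no failure in $max passes"
    return 0
}
```

On a Gentoo box this might be called as build_under_load 1000 'emerge media-libs/mesa' 'emerge sys-devel/gcc', assuming those are the two packages being compiled. Build output is discarded here, so a failing pass would need to be rerun by hand to see the actual segfault message.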
The kernel version I use is
Linux localhost 4.11.4-gentoo #2 SMP Thu Jun 8 20:59:54 -00 2017 x86_64 AMD Ryzen 7 1700X Eight-Core Processor AuthenticAMD GNU/Linux
The gcc versions I tried are gcc 5.4.0 and gcc 6.3.0. I'm not building the kernel; I just run something like
while :; do if ! emerge media-libs/mesa; then break; fi; done
After that, I leave the test PC running overnight; with OpCache enabled, the error shows up after a while.
This is how I test whether the system is OK.
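The loop above stops at the first failure, but it does not record how many passes succeeded or what the failing build printed. A minimal sketch of a variant that does both, assuming nothing beyond a POSIX-like shell; the function name run_until_failure and the run-N.log file names are invented for illustration:

```shell
# Hypothetical helper: repeat a command until it fails, keeping only the log
# of the failing pass.
# Usage: run_until_failure MAX_PASSES LOG_DIR COMMAND [ARGS...]
run_until_failure() {
    local max logdir pass
    max=$1
    logdir=$2
    shift 2
    mkdir -p "$logdir" || return 2
    pass=1
    while [ "$pass" -le "$max" ]; do
        if ! "$@" >"$logdir/run-$pass.log" 2>&1; then
            echo "pass $pass failed, see $logdir/run-$pass.log"
            return 1
        fi
        rm -f "$logdir/run-$pass.log"   # successful passes are not kept
        pass=$((pass + 1))
    done
    echo "all $max passes succeeded"
    return 0
}
```

For the test described here, the call would look like run_until_failure 1000 /var/tmp/mesa-loop emerge media-libs/mesa (the log directory is an arbitrary choice). The pass number in the message gives a rough idea of how long the machine survived before the first segfault.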
Once the mesa test loop has run long enough with OpCache disabled that I feel the system may be stable,
I try 'emerge -e system && emerge -e world'. That command finished successfully both times I tried it.
With OpCache enabled, I have never seen it pass both 'emerge -e system' and 'emerge -e world', though 'emerge -e system' alone sometimes finishes.
And I have MAKEOPTS='-j 16' in my /etc/portage/make.conf
If you need the make.conf, I'll paste it somewhere.
I want to take a different approach.
I'm running Gentoo on a Gigabyte mATX mobo, with a Ryzen 5 1600 and Galax DDR4-3600 DRAM. I'm running 1.4 V core, stock cooler with a 750 watt EVGA PSU. I'm running at 3.8 GHz and RAM is running with BIOS default DDR timing at 2933 MHz. I'm running gcc 4.9.4 with no march option.
I emerged gcc-6.3 this week and world last week at -j12, and I haven't had any problems.
Can someone (with a Gigabyte mobo, preferably) give me a BIOS / CPU / DRAM / voltage configuration that is known to have problems, and an emerge that will cause the failure? I want to see if I can reproduce it with my rig.
I used Mesa v11.2.0 (the default source on Ubuntu Xenial) since 17.1.2 required way too many dependencies I didn't feel like hunting down.
I've run 14+ compiles (make clean ; make -j16) using gcc-7 without any issues - except for me getting pissed at the sloppy errors shown in Mesa itself. I stopped counting, but somewhere around the 18th time I did end up with a segfault. That was after I also started running "stress -c 16"! Without "stress" running, nothing happens on this system.
I'm not worried about the temps, as the fans barely spun up, which only happens around the 50 °C range. I'll have to find some other way to test this, as compiling Mesa with all its errors isn't quite reliable as a test.
So you are probably among the lucky ones without the bug. There are many users with the bug and many users without the bug. The weight of those "manies" I don't know.
And that is the real problem: not everybody has the bug. How is that possible? Are only some CPUs affected?
What would interest me more is if there are people with the motherboards identified thus far who aren't running into issues, because I keep thinking this is more a mobo/bios issue (perhaps related to SoC voltage? Voltage regulation ability of the mosfets? etc.) than a 'CPU' issue as such.
I have a Gigabyte AB350M-D3H-CF. I'm using the BIOS F1 2/20/2017, I
believe it's a 184.108.40.206a BIOS.
Does anyone with this motherboard have the issue?
I don't know if the issue is just isolated to a few or if it's indeed a general issue with the CPU. Like I said, I got mine to crash, but that's only by running the "stress" program at full blast as well. I would like to know as well, but there's so little centralized information available - and what I would like to see is a reporting form on AMD's site with the variables (cpu, mobo, OS, crashes - broken down per error, etc etc) so we can watch for patterns. Right now we're just trying to piece it all together from spread out sources without a baseline/test or avenue for reporting.
All my other Ryzen systems are pretty much the same, except I also have some 1700X chips floating around; the motherboards are all Asus Crosshair VI Hero, with G.Skill RAM and EVGA PSUs throughout.
To AMD: Please provide a standardized form just for the new CPUs to help narrow these problems down; limit text input and provide as many options as possible that pertain to Ryzen specifically.
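Until such a form exists, reports in a thread like this could at least share a common shape. A rough sketch, assuming a Linux system where /proc/cpuinfo and /sys/class/dmi/id/board_name are readable; the function name ryzen_report and the field names are made up for illustration, and fields that cannot be read are printed as "unknown":

```shell
# Hypothetical sketch: print one "key: value" line per field so that reports
# from different machines can be compared side by side.
ryzen_report() {
    local cpu board gccver
    # First "model name" entry, with the key and leading whitespace stripped.
    cpu=$(grep -m1 'model name' /proc/cpuinfo 2>/dev/null | cut -d: -f2- | sed 's/^[[:space:]]*//')
    board=$(cat /sys/class/dmi/id/board_name 2>/dev/null)
    gccver=$(gcc --version 2>/dev/null | head -n 1)
    echo "cpu: ${cpu:-unknown}"
    echo "board: ${board:-unknown}"
    echo "kernel: $(uname -sr)"
    echo "gcc: ${gccver:-unknown}"
}
```

BIOS/AGESA version, RAM speed, voltages and the kind of crash seen would still have to be added by hand, since how to read them varies from board to board.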
Unfortunately, and surprisingly, this problem disappeared on WSL
when I ran make -j16 &>log.txt (usually I redirect the output to /dev/null).
On the other hand, on Ubuntu it happens as usual even with make -j16 &>log.txt.
This problem is really sensitive to any change: motherboard, hardware/software
settings, and so on.
I don't think it is sensitive to changes; it is just very random. I mean, I can replicate it easily with the usual mesa/gcc parallel compilation, but sometimes everything works fine for hours and then things suddenly start to go bad. And I can't find any pattern that explains why things worked two seconds ago and now they don't. No reboot, no changes, no ambient temperature increase, nothing.