
Adept III

Re: gcc segmentation faults on Ryzen / Linux

runningman

If you wanted to get a good CPU, you should have followed the same RMA procedure we did. You preferred to get your money back instead. Even if it is not the most ethical thing, it seems AMD will not do a recall of problematic stock. We also don't know what the failure percentage is; only AMD knows that. So if you want to get a good CPU, buy a new one with your refund, and if it also exhibits the problem, start an RMA process.

Assuming the failure rate is 50% and, out of 1000 processors sold, only 10-20 owners ask for an RMA, I would do what AMD does. It is logical. Check the "recall algorithm" for defective cars in "Fight Club". It's all about money after all. I would also like AMD to stand up and do a complete recall of flawed stock, but I don't see this happening. And since I got my good CPU, I really don't care any more.
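To put those hypothetical numbers in proportion, here is a tiny Python sketch. The 50% failure rate and the 10 to 20 RMA requests are the post's own "assuming" figures, not measured data; the snippet only shows what share of the defective CPUs would come back under that assumption.

```python
def rma_share(units_sold: int, failure_rate: float, rma_requests: int) -> float:
    """Fraction of defective units whose owners actually request an RMA."""
    defective = units_sold * failure_rate
    return rma_requests / defective


# Hypothetical figures from the post: 1000 sold, 50% defective, 10-20 RMAs.
for requests in (10, 20):
    share = rma_share(1000, 0.5, requests)
    print(f"{requests} RMAs -> {share:.0%} of the 500 defective CPUs")
```

Under that assumption only 2% to 4% of the affected buyers would ever ask for a replacement, which is the economic argument the post is making.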

Adept III

Re: gcc segmentation faults on Ryzen / Linux

nop

I haven't tried manual timings yet; mine are at what DOCP sets, 16-18-18-18-36. I might also try 3066 again, since the 0810 BIOS is reported to work OK with our RAM at that speed.

Adept III

Re: gcc segmentation faults on Ryzen / Linux

runningman wrote:

I wish to know how to buy a properly functioning CPU and know if there are any other issues we're not aware of. AMD's silence makes it look like there are more issues than we know about right now.

Before anyone gets angry about this post, I'm no Intel fan either and I just want to get a functional CPU.

Nothing to be angry about at all. I don't disagree with _anything_ you've said and honestly I'm disappointed to have been through this process (and that I had to go through the process).

On the upside, it does show AMD cares and is willing to help. It would be even better if they came out and explained it, but I'm positive there's an extremely strong business case for them *not* doing that, i.e. shelves full of faulty processors that 99% of the Windows-using population will never notice.

5:37 into testing and 18 error-free cycles. Only a couple of days to go.


Re: gcc segmentation faults on Ryzen / Linux

malakudi wrote:

runningman

If you wanted to get a good CPU, you should have followed the same RMA procedure we did. You preferred to get your money back instead. Even if it is not the most ethical thing, it seems AMD will not do a recall of problematic stock. We also don't know what the failure percentage is; only AMD knows that. So if you want to get a good CPU, buy a new one with your refund, and if it also exhibits the problem, start an RMA process.

Assuming the failure rate is 50% and, out of 1000 processors sold, only 10-20 owners ask for an RMA, I would do what AMD does. It is logical. Check the "recall algorithm" for defective cars in "Fight Club". It's all about money after all. I would also like AMD to stand up and do a complete recall of flawed stock, but I don't see this happening. And since I got my good CPU, I really don't care any more.

It was a warranty process as well. I didn't send it back to get a refund. I sent it to the shop by initiating the warranty process, and they decided to issue a refund. I can't tell you anything more, as I wasn't told anything more either.

Adept III

Re: gcc segmentation faults on Ryzen / Linux

runningman

No shop would take the route of a direct RMA to AMD, and no shop would spend the time needed to get this resolved. I guess they offered you either a CPU from their stock or from their supplier, with no guarantee that it would fix your issue, or a refund. Unfortunately the only way is to buy a new CPU and, if it is faulty again, proceed with an RMA through AMD.


Re: gcc segmentation faults on Ryzen / Linux

malakudi wrote:

runningman

No shop would take the route of a direct RMA to AMD, and no shop would spend the time needed to get this resolved. I guess they offered you either a CPU from their stock or from their supplier, with no guarantee that it would fix your issue, or a refund. Unfortunately the only way is to buy a new CPU and, if it is faulty again, proceed with an RMA through AMD.

You have the right to your own opinion, but that doesn't mean it reflects reality or that it's the same for every shop. Have a good day.

Adept II

Re: gcc segmentation faults on Ryzen / Linux

bradc wrote:

You've obviously never had to provide (or deal with) advanced technical support before. I've lost count of the number of times I've had to deal with "experts" who jump to conclusions about what the fault is and are (probably in the majority of cases) wrong. I'm not saying you are wrong, I'm just saying there is a process in place for a reason.

You *have* to follow the process. I followed the process. I sent pictures, took measurements, waited while they shipped me stuff to try, tried the stuff they shipped me and then finally got an RMA approval. It took weeks. You have to follow the process. There is no way to shortcut the process because frankly to the tech support guys you may be just another nut-job who read about an issue on the internet and asked for a new processor.

Follow the process, do what they ask you to do and you'll get sorted. Yes, it's annoying. Yes, it's time consuming. Yes, you (and I) should have bought an Intel processor. But rather than return the processor and motherboard, I chose to follow the process and get it sorted. You apparently have also.

They are swamped at the moment. Even with an in-progress ticket it can take 3 days to get a response. Multiply that by the back-and-forth required to work the process and it's not fast.

I opened a ticket on the 25th of July. I got a new CPU today. I count 8 rounds of communication in that time. So 41 days. Averaging a back and forth every 5.1 days.

*Not* fast. I did get a result though. Small mercies.

Or, AMD could *actually be proactive* and publish a list of everything they want from someone in order to get an RMA for this issue. Instead of spending weeks going back and forth before they are willing to issue an RMA, people could just submit all of the required data at once and get an expedited approval for what is a known issue impacting months' worth of production. I don't blame them for wanting the information to avoid getting scammed by, say, overclockers wanting a better-binned chip, but at least publish a list and let people satisfy it all in one go! However, doing that would require them to publish some kind of statement, FAQ or direction for users, and would violate the policy they have taken on this issue of "keep quiet so no one figures out how bad this really is".

This is getting really ridiculous. AMD succeeded in cutting the legs out from under users who want to discuss this by getting Phoronix to say it's "fixed" and it "only impacts Linux". That has pretty much ensured that this thread is the only place on the internet where there can be a genuine discussion of this issue. Anywhere else, you just get a flood of Internet know-it-alls posting links to the Phoronix article (and all the other articles that refer back to it) and bashing people over the head with "this is a non-issue".

If I had an extra $1600 or so sitting around, I'd go out and buy 2 identical setups with early-date-code Ryzen CPUs, RMA 1 of them, set up identical hard drives with a system to re-image them regularly, and start looking for programs in Windows that trigger this issue. GCC on Cygwin is already a known trigger. I am SURE there are other Windows workloads that will cause either data corruption or crashes. It's just that most Windows users are used to a certain amount of instability from their systems and will blame crashes etc. on just about everything *except* their CPU: they will blame graphics drivers, the program they are running, Windows itself, etc. Especially given the level of denial going around on this issue from many AMD fans who are currently buying Ryzen.

Once someone can prove this impacts Windows workloads beyond GCC as well, things will really hit the fan.
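For the turnaround figures in the quoted post (ticket opened on 25 July, 8 rounds of communication, 41 days), a small Python check of the arithmetic. The year 2017 is an assumption, inferred from the 0810 BIOS dated 08/01/2017 mentioned later in the thread; the post itself only says "25th of July".

```python
from datetime import date, timedelta

# Assumption: the year is not stated in the post; 2017 is inferred from
# the BIOS date (08/01/2017) quoted elsewhere in this thread.
ticket_opened = date(2017, 7, 25)
elapsed_days = 41   # "So 41 days."
rounds = 8          # "I count 8 rounds of communication"

cpu_arrived = ticket_opened + timedelta(days=elapsed_days)
days_per_round = elapsed_days / rounds

print(cpu_arrived)               # implied arrival date of the new CPU
print(round(days_per_round, 1))  # average days per back-and-forth
```

41 / 8 is 5.125, which matches the "back and forth every 5.1 days" in the post, and 41 days after 25 July lands in early September.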

Adept I

Re: gcc segmentation faults on Ryzen / Linux

bradc wrote:

You've obviously never had to provide (or deal with) advanced technical support before. I've lost count of the number of times I've had to deal with "experts" who jump to conclusions about what the fault is and are (probably in the majority of cases) wrong. I'm not saying you are wrong, I'm just saying there is a process in place for a reason.

You *have* to follow the process. I followed the process. I sent pictures, took measurements, waited while they shipped me stuff to try, tried the stuff they shipped me and then finally got an RMA approval. It took weeks. You have to follow the process. There is no way to shortcut the process because frankly to the tech support guys you may be just another nut-job who read about an issue on the internet and asked for a new processor.

I get it. However, a company can take different routes to deal with issues like this. The AMD process is designed for customers whose time is cheap. They are making you spend hours. In my case they want me to keep running tests over and over again, increasing the voltage in increments of 0.05 V from stock all the way to 1.4 V. With a time-to-fail of about 10 minutes on average, plus 10 minutes to reset everything between runs, that's like a full weekend of work. They think it's reasonable to ask someone to spend a weekend troubleshooting their $320 CPU to get a replacement part?

WTF. This does not make any economic sense unless you're a teenager. You will have a much easier time replacing a $320 pair of shoes with Zappos.
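A rough Python sketch of the time budget for the voltage sweep described above. The 0.05 V step, the 1.4 V ceiling and the 10 + 10 minutes per run come from the post; the stock voltage (1.20 V here) and the number of runs per step are hypothetical placeholders, since the post gives neither. Voltages are kept in millivolts so the step count is not thrown off by floating-point rounding.

```python
def sweep_minutes(stock_mv: int, max_mv: int, step_mv: int,
                  runs_per_step: int, fail_min: float, reset_min: float) -> float:
    """Total minutes to test every voltage from stock_mv to max_mv inclusive.

    Assumes each run ends at the quoted time-to-fail, followed by a reset.
    """
    steps = (max_mv - stock_mv) // step_mv + 1
    return steps * runs_per_step * (fail_min + reset_min)


STOCK_MV = 1200   # hypothetical stock voltage, not stated in the post
MAX_MV = 1400     # "all the way to 1.4v"
STEP_MV = 50      # "increments of 0.05v"

for runs in (1, 3, 5):  # hypothetical repeat counts per voltage step
    total = sweep_minutes(STOCK_MV, MAX_MV, STEP_MV, runs, 10, 10)
    print(f"{runs} run(s) per step: {total / 60:.1f} h")
```

The total scales linearly with both the number of voltage steps and the number of repeats per step, so a lower stock voltage or a request to repeat each step several times quickly stretches it from an evening into a weekend.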

Adept III

Re: gcc segmentation faults on Ryzen / Linux

I don't know what people are saying when they are writing in to AMD Support, but I didn't have any follow up questions for my ticket. All I did was do my testing in advance, made sure to cover all the troubleshooting points AMD would likely ask me, and made it clear that the issue I was having was due to CPU "marginality" which cannot be serviced by the user.

Got a reply back from them roughly saying "Sorry about your problems... We don't believe further troubleshooting can resolve the issue... Next step is to RMA... Yes, we agree your troubleshooting is thorough so we don't want to waste time and money on future RMAs, so please confirm your system setup so we can test the chip on our end to make sure it is defect free...".

Got the replacement in like 3~4 days and the chip was segfault free. The only difference I can think of that would have affected my turnaround time is *when* I submitted my ticket. I knew submitting it early wouldn't expedite it, because other users were already doing multiple RMAs to help AMD characterize the issue. But once it was confirmed that they were able to reproduce the problem and had a reliable way to test for it, I submitted a ticket right away to avoid the mad rush. As a result, the turnaround for communication was only like 2~3 days per reply. I can imagine that now, with a large stampede of people writing in (some for no reason other than having heard this issue exists), the times are significantly longer.

Adept II

Re: gcc segmentation faults on Ryzen / Linux

Today I got my RMA CPU (Ryzen 7 1700X). It's labeled 1728SUS (the broken one is 1707SUT).
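The two labels can be compared with a small Python helper. This assumes the reading commonly used in this thread, that the first four digits of the batch code are year and week (YYWW); that interpretation is not an official AMD specification, and `parse_date_code` is a hypothetical helper name.

```python
import re
from typing import Tuple


def parse_date_code(code: str) -> Tuple[int, int]:
    """Split a batch code such as '1707SUT' into (year, week).

    Assumes the leading four digits are YYWW (2-digit year, 2-digit week).
    """
    match = re.match(r"(\d{2})(\d{2})", code)
    if match is None:
        raise ValueError(f"unexpected date code: {code!r}")
    year = 2000 + int(match.group(1))
    week = int(match.group(2))
    return year, week


old = parse_date_code("1707SUT")  # the broken CPU
new = parse_date_code("1728SUS")  # the RMA replacement
print(old, new)
# Both codes fall in the same year here, so subtracting weeks is enough.
print(new[1] - old[1], "weeks apart")
```

Under that reading the replacement was made roughly 21 weeks after the faulty chip.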

The package I got seemed to be sealed; it may have been opened at the bottom, but I can't say that for sure. There were no additional markings.

Currently I'm running my compile tests (rpmbuild --rebuild of the Leap kernel). Each build takes around 10 minutes 40 seconds (including building the RPM packages). That's about the same performance I saw with the old one.

So far it has completed about 12 runs without any problem. Before, I usually saw the first segfault after about 1 or 2 minutes, and I never managed to build more than one complete kernel. Tests so far have been done running Linux 4.9.46 and 4.12.10 / gcc 4.8.5 with opCache and (K)ASLR enabled.
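Repeating a build and stopping at the first failure, as in the tests above, can be scripted. Below is a minimal Python sketch, assuming a segfaulting compiler makes the overall build command exit non-zero; `run_until_failure` is a hypothetical helper, and the command you pass in is whatever build you actually use (for example your own rpmbuild --rebuild invocation).

```python
import subprocess
import time
from typing import List, Optional, Tuple


def run_until_failure(cmd: List[str], max_runs: int) -> Tuple[int, Optional[int]]:
    """Run `cmd` up to max_runs times, stopping at the first failure.

    Returns (number of clean runs, return code of the failing run or None).
    """
    clean = 0
    for i in range(1, max_runs + 1):
        start = time.monotonic()
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        elapsed = time.monotonic() - start
        # On POSIX a negative return code means the process itself was killed
        # by that signal (e.g. -11 for SIGSEGV); a build tool such as make or
        # rpmbuild usually exits non-zero when one of its compilers crashes.
        if result.returncode != 0:
            print(f"run {i}: FAILED with code {result.returncode} after {elapsed:.0f} s")
            return clean, result.returncode
        clean += 1
        print(f"run {i}: ok ({elapsed:.0f} s)")
    return clean, None
```

Each run's wall-clock time is printed too, which makes it easy to spot a chip that fails after 1 or 2 minutes versus one that completes every 10-minute build.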

As others have already reported, the CPU voltage (VDDCR in the BIOS) of the new CPU is significantly lower (1.275 V or less) compared to 1.438 V on the old one. During high load (compiling) the voltage is seldom higher than 1.24 V (and then only as a spike); most of the time it is around 1.08 V.
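A quick Python check of how large that voltage gap is, using the two VDDCR readings from the post. Since the new chip's 1.275 V is given as an upper bound ("or less"), the real gap is at least this big.

```python
old_vddcr = 1.438  # V, VDDCR reported for the old (1707) CPU
new_vddcr = 1.275  # V, upper bound reported for the RMA (1728) CPU

diff = old_vddcr - new_vddcr
print(f"{diff:.3f} V lower, {diff / old_vddcr:.1%} below the old CPU")
```

That works out to at least 0.163 V, roughly 11% below the old chip's setting.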

Board: Asus Prime X370 Pro / Bios: 0810 08/01/2017 / Using default settings besides enabled IOMMU and SVM.