The computer has been running Windows Server 2019 for about a year on a four-drive RAID 10 array on an MSI PRO X670. We shut it down for the weekend and it wouldn't boot to Windows when it was powered back on. It did boot to the Windows troubleshooter, which could not see the partitions associated with the RAID array.
The BIOS shows one drive removed from the array, but the other three are shown with online status. So, while degraded, the array should work. Speaking of degraded, the status of the array is "critical".
I booted from the Windows DVD, went into installation and loaded the AMD RAID drivers from the same USB drive I originally set the computer up with, but still the array is not recognized. No, I did not get far enough for files to copy, obviously, since it couldn't see a hard drive.
Any thoughts on what to do? Some trick to get access to the data long enough to pull it off?
B
You're correct that a RAID 10 should continue operating in a degraded (but functional) state with the loss of only a single drive. However since it clearly hasn't, I would replace the failed drive ASAP and verify the BIOS identifies it as a valid spare and rebuilds the array.
If this does not correct the boot issue with the OS located on the array, I would temporarily install a spare fifth drive and install the OS to this standalone drive. Once Windows is running on the spare, install the RAID drivers and see if the original partition is then accessible to copy the data off.
FunkZ, Thanks for the reply!
I assume the process would be to pull the bad drive, put the replacement drive in, and in the BIOS add it as a dedicated hot spare. What then? How do I tell it to rebuild?
The failed drive is visible to the system, just not part of the array, BTW. I did attempt to add it back since it checks good, but it only shows up as part of the array as "Dedicated", and that's it. Is there some magic rebuild option? I did find a rescan option somewhere, but it just keeps identifying the three good drives with no change in [non]functionality.
Your second suggestion of loading Windows on another drive and loading drivers: Since loading the drivers while running the Windows Installer did not make the drive visible, I'm pretty sure a Windows install wouldn't be any different, but I guess I could give it a try (what can I lose?). Gonna be a busy day I guess.
If a drive has previously been part of an array (and is still readable), it has RAID configuration data on it. So it is "dedicated" to an array, but for whatever reason the controller doesn't think the fourth drive belongs to your RAID 10 array, or at least not the same array as the other 3 drives.
Typically a new, empty drive must be marked as spare and then the controller will automatically start the rebuild of the array. It may see a new drive and an array with a missing member and add it by default.
You could try removing the orphaned drive, put it in another system, completely wipe it and then return it to the array to try rebuild as above. However I hesitate to suggest attempting to reuse a drive that has "failed" for whatever reason, especially as you have data on the array you need to recover. The less modification/changes made and less time spent at recovery the better, which is why I suggested a new drive.
The goal of the fifth drive is to get the OS off the array. I don't entirely trust the Windows loader and RAID drivers, have seen issues recognizing those in the past. And you don't know what happened to the boot partition when the array "failed". So a fresh Windows install, on a completely separate drive, is a safe, non-intrusive way to try and access the original array partition(s) as read-only data. Which by the way is what I would recommend for future setups. Install the OS to a single, standalone SSD. They're fast enough without striping and there's nothing on it that can't be easily recreated. Save the mirrored disk space for your critical data.
I removed the "failed" physical drive and replaced with an identical model spare that we had in case of just such an event (we didn't have it configured as a spare because the RAID controller only supported four drives).
Unfortunately the $&#^ thing doesn't seem to give me a way to rebuild the array. I have a series of pictures I took after removing the bad drive and installing the new one. You can view them here: https://amiausa.com/raidpics/index.html
I'm hoping maybe you (or anyone else following this thread) sees something I don't.
Brian
I'm sorry, some of those camera pictures of your screen aren't legible. If you put a USB drive in the system and press F12 it should save the screenshot to it.
I tried searching for a BIOS guide on AMD RAID specifically for the 600 series chipset and came up empty. The ones I did find detailed how to create or delete an array from the BIOS, but gave instructions for rebuilding via the GUI.
Did you attempt the separate OS install drive to access the array?
OS install is today. Odd, I checked for legibility when I took the pictures. Mr. Murphy is not liking me.
I hadn't found anything about using Windows GUI for RAID recovery, would you consider sharing a link?
Good advice FunkZ. Strangely, and due to brain fog, not entirely sure why anything changed to get things back on track - Maybe a little Divine Intervention on top of good advice, blood, sweat, and prayers?
It took several iterations at reloading Windows 10 on an independent drive (along with RAID drivers) and eventually got it to see and rebuild and keep all the drives. Oddly, it had added the spare and kicked another drive off, so had to rebuild again. Chkdsk, under Windows 10, reported lots of errors.
However, once all drives were on the array it would boot into Windows 2019! Chkdsk on 2019 was happy, so it must be something to do with NTFS differences between Win10 and Win2019?
So all data, at this moment, seems to be intact. I haven't run database checks, but some documents opened, array passed consistency test, chkdsk is happy. So will install back at the office and see if applications see the data.
Now to fix the issue that caused the backups to be not current. We could have survived with those backups as they weren't really out of date, but it would have been painful to play catch-up on top of having to catch up on this past week.
Note for others: Be absolutely sure about the driver install order: rcbottom, rcraid, rccfg. It can sometimes work in another order, but at one point I tried loading them in reverse order, and while the drive showed up during Windows install (using the load-driver option when selecting the install location), things will NOT work properly.
So, I installed the RaidExpress2 software on a clean install of Windows 10 on a separate hard drive. When I run it, it says the driver is not loaded (the raidcfg one). However, the software said it was going to install it, and checking the device drivers shows there are RAID device drivers.
I installed them myself (by right clicking each of the three different drivers and clicking "Install"). No joy.
Rebooted and still no joy.
WTH could I be missing?
On the new Windows install, you should have in Device Manager under Storage Controllers the AMD-RAID and the array should appear under Disk drives.
If the controller is there but the drive is not, look in Disk Manager to see if the array shows up there.
Thanks for getting back. I reloaded Windows 10 again and for some reason the array showed up and started rebuilding. Apparently Windows decided to install an update and reboot on me in the middle of it. When I brought it back up the replacement drive was online as part of the array and another drive was dropped off.
The drives appear to be corrupted now, as I can get the array to come online broken, but it will not attempt to rebuild and chkdsk reports tons of errors on the partitions.
I used xcopy to copy as much off the drives as I could, but I'm not sure how damaged the data/files are given the chkdsk errors.
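For anyone who wants a record of exactly which files could not be read during a salvage copy like this, here is a minimal Python sketch. It is not what was run in this thread (that was xcopy); the function name `salvage_copy` and its behavior are assumptions made for illustration. It copies whatever it can, keeps going on errors, and returns the list of paths that failed.

```python
import os
import shutil


def salvage_copy(src_root, dst_root):
    """Copy every readable file under src_root into dst_root.

    Hypothetical helper: read or write errors are recorded instead of
    aborting the run. Returns the list of source paths that failed.
    """
    failed = []

    def note_walk_error(err):
        # os.walk calls this when a directory cannot be listed.
        failed.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        try:
            os.makedirs(target_dir, exist_ok=True)
        except OSError:
            failed.append(dirpath)
            continue
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # copy2 also tries to preserve timestamps.
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError:
                failed.append(src)
    return failed
```

Calling it with the damaged volume as `src_root` and a folder on a healthy drive as `dst_root` gives a list you can compare against the chkdsk output. A file that copies without error can still contain corrupted data, so this only narrows down what to check.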
I've taken a break to handle some other things and am about to take a closer look at the carnage and figure out where things stand.
Thanks for your assistance, as your advice to load Windows on an independent drive appears the best advice, but MS's desire to randomly modify software and reboot computers may have done me in.
I'm keeping my fingers crossed I find a way out of this mess.
B
Glad to offer assistance if it was useful in any way whatsoever, as I have been there in your situation trying to recover from a failed "redundant" RAID array.
Keeping my fingers crossed on your behalf you are able to recover the majority of your data as it seems there may have been a double drive failure. A 4-drive RAID10 is statistically less resilient than a straight up RAID1 but offers better performance than a RAID5.
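To put a rough number on the double-failure risk, here is a small Python sketch that enumerates every way two drives can fail in a 4-drive RAID 10 and counts how many of those the array survives. The drive labels and the `surviving_fraction` helper are made up for illustration, and it assumes every drive is equally likely to fail and failures are independent, which may not hold for drives of the same age and batch.

```python
from itertools import combinations


def surviving_fraction(mirror_pairs, failures):
    """Fraction of `failures`-drive failure combinations the array survives.

    The array survives as long as no mirror pair loses both of its members.
    """
    drives = [d for pair in mirror_pairs for d in pair]
    combos = list(combinations(drives, failures))
    survived = sum(
        1
        for failed in combos
        if not any(set(pair) <= set(failed) for pair in mirror_pairs)
    )
    return survived / len(combos)


# Hypothetical layout: two mirrored pairs striped together (RAID 10).
raid10 = [("A0", "A1"), ("B0", "B1")]
print(f"one failure survived:  {surviving_fraction(raid10, 1):.3f}")
print(f"two failures survived: {surviving_fraction(raid10, 2):.3f}")
```

Under those assumptions, 2 of the 6 possible two-drive failures take out both halves of a mirror, so a second failure is fatal about one time in three.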
One thing prior experience has taught me is that RAID is not a replacement for external backups. And I do mean external, as in make a periodic copy and store it in a physically separate location.
Absolutely. I usually set up USB external drive backups. At least three drives, but the more the better. The computer automatically starts the backup late in the evening when everyone is gone, and it's finished by morning. The user (usually the small business owner) swaps the drive out with one brought from home and takes the current drive home at the end of the day. FIFO with any drives at home. With three, the worst case is one at home and two at the office (the most recent and the oldest backup during the workday). The previous day's backup should be at home.
But the backup has to happen. In this case we had an issue that interfered with the backup recently.
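The three-drive rotation described above can be sketched as a small simulation. This is only an illustration of the FIFO swap as I read it from the description; the drive names and the `simulate_rotation` function are hypothetical, and it assumes the backup succeeds every night.

```python
from collections import deque


def simulate_rotation(drive_names, nights):
    """Simulate the nightly backup and morning swap.

    Returns one snapshot per workday showing which backup number
    (None = no backup yet) each drive holds and where the drive is.
    """
    contents = {name: None for name in drive_names}
    attached = drive_names[0]          # drive plugged in at the office
    home = deque(drive_names[1:])      # FIFO: leftmost has been home longest
    snapshots = []
    for night in range(1, nights + 1):
        contents[attached] = night     # overnight backup lands on the attached drive
        # Morning: bring the drive that has been home longest and swap it in.
        incoming = home.popleft()
        finished, attached = attached, incoming
        snapshots.append({
            "day": night + 1,
            "office": {finished: contents[finished], attached: contents[attached]},
            "home": {name: contents[name] for name in home},
        })
        # End of day: the drive holding last night's backup goes home.
        home.append(finished)
    return snapshots


for snap in simulate_rotation(["A", "B", "C"], 3):
    print(snap)
```

With three drives, each workday snapshot shows the most recent backup and the oldest drive at the office, and the previous day's backup at home, so a fire or theft at the office costs at most one day.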