Replace a Failed Drive in a Software RAID Array in Linux
Identify the broken drive
Start by identifying the device as the system knows it (i.e. /dev/sdX or /dev/hdX). The following command should give you that information:
cat /proc/mdstat
You will see something like this:
[root@server1 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0]
511988 blocks super 1.0 [2/1] [U_]
md1 : active raid1 sda2[0]
976247676 blocks super 1.1 [2/1] [U_]
bitmap: 7/8 pages [28KB], 65536KB chunk
unused devices: <none>
Things to note above:
1. There are two RAID arrays: md0 and md1
2. They are both RAID 1, and both are missing a member: [2/1] means only one of the two devices is active, and the ‘_’ next to the U marks the missing one
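If you have more than a couple of arrays, picking out the degraded ones by eye gets tedious. Here is a minimal sketch of a shell helper that does it for you. The function name degraded_arrays is my own invention, and it assumes the mdstat layout shown above: the array name on one line and a status map such as [U_] on the "blocks" line that follows.

```shell
# Print the name of every array whose status map contains a "_" (missing member).
# $1: path to an mdstat-formatted file; defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { array = $1 }                    # remember the current array
        /blocks/ && /\[[U_]*_[U_]*\]/ { print array }   # status map has a "_" in it
    ' "${1:-/proc/mdstat}"
}

# Example: degraded_arrays          (reads /proc/mdstat)
```

With the output shown above, this should list both md0 and md1.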
Let's get some more information about md0 (replace X with the array number):
mdadm --detail /dev/mdX
You will see something like this:
/dev/md0:
Version : 1.0
Creation Time : Thu Nov 15 17:16:06 2012
Raid Level : raid1
Array Size : 511988 (500.07 MiB 524.28 MB)
Used Dev Size : 511988 (500.07 MiB 524.28 MB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Mon Nov 11 11:45:06 2013
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : localhost.localdomain:0
UUID : 7888cd10:f2be2962:eb14fa6e:761d94fa
Events : 172
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 0 0 1 removed
So /dev/sda1 is a good partition. Since I have two RAID arrays, I repeat this for /dev/md1 and find that /dev/sda2 is good as well. That means my second hard drive is the one that has flaked out on me.
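To repeat that check for every array without reading each report by hand, a small filter can pull out just the healthy members. This is only a sketch: active_members is a hypothetical helper name, and it assumes the device table format shown above, where healthy members are marked "active sync" and the device path comes last on the line.

```shell
# Read `mdadm --detail` output on stdin and print the member devices that are
# listed as "active sync" (the device path is the last field on those lines).
active_members() {
    awk '/active sync/ { print $NF }'
}

# Example (needs root and real arrays; md0 and md1 are the arrays from this article):
#   for md in /dev/md0 /dev/md1; do
#       echo "$md: $(mdadm --detail "$md" | active_members)"
#   done
```

Any drive that never shows up in that list is your suspect.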
Once you’ve identified the drive, you will want to know more about it, as /dev/sdX doesn’t tell you what the drive physically looks like. In my case I have two identical drives, so the following command should help you identify the faulty one. Replace X in the next command with “b” or “c” (or some letter other than “a”, since we know from above that “a” is good).
hdparm -i /dev/sdX
That should give you the brand and model, and in some cases even the serial number, which should be plenty to identify the drive physically.
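hdparm prints a lot more than you need for this. As a rough sketch, the filter below keeps only the model and serial number. The name drive_identity is made up for this example, and it assumes hdparm prints its usual "Model=..., FwRev=..., SerialNo=..." line; if your version formats that line differently, it will simply print nothing.

```shell
# Read `hdparm -i` output on stdin and print only the model and serial number.
drive_identity() {
    sed -n 's/^[[:space:]]*Model=\(.*\), FwRev=.*, SerialNo=\(.*\)$/model=\1 serial=\2/p'
}

# Example (needs root):
#   for d in /dev/sd?; do
#       echo "$d: $(hdparm -i "$d" | drive_identity)"
#   done
```

Keep in mind that a drive that has died completely may not answer at all, in which case the serial numbers of the drives that do answer tell you which one to leave alone.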
Replace the drive
Not much to be said here. I assume you already know this, but you need a drive of equal size or larger.
Partition the new drive
If your system will boot in degraded mode, just boot it up. If not, boot it from a Live CD (I used the CentOS LiveCD in ‘Rescue mode’).
Once you’ve made it to a console, the first thing we need to do is to partition the new hard drive. The easiest way to do this is to use sfdisk and use one of the existing disks as the template.
sfdisk -d /dev/sdY | sfdisk /dev/sdX
(where sdY is a working drive in the array, and sdX is your new drive)
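The "equal size or larger" requirement can be checked before you copy anything. The sketch below wraps the sfdisk command in that check. Both function names are hypothetical, and it assumes blockdev (part of util-linux, like sfdisk) is available on your rescue system.

```shell
# Succeed when the new drive ($2, size in bytes) is at least as large as the
# old one ($1, size in bytes).
new_drive_is_big_enough() {
    [ "$2" -ge "$1" ]
}

# Copy the partition table from a working drive ($1) to the new drive ($2),
# but only if the new drive is large enough. Must be run as root.
clone_partition_table() {
    cpt_src=$1
    cpt_dst=$2
    cpt_src_bytes=$(blockdev --getsize64 "$cpt_src") || return 1
    cpt_dst_bytes=$(blockdev --getsize64 "$cpt_dst") || return 1
    if new_drive_is_big_enough "$cpt_src_bytes" "$cpt_dst_bytes"; then
        sfdisk -d "$cpt_src" | sfdisk "$cpt_dst"
    else
        echo "refusing: $cpt_dst is smaller than $cpt_src" >&2
        return 1
    fi
}

# Example: clone_partition_table /dev/sdY /dev/sdX
```

Note that the check only compares sizes. It will not save you from swapping the two arguments when both drives are the same size, so double-check which drive is which first.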
Rebuilding the array
The final step is to add the new drive to the array. Doing this is surprisingly easy. Just type the following command:
mdadm /dev/mdZ -a /dev/sdX1
(assuming you want to add the partition sdX1 to the RAID array mdZ)
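With more than one array you need one such command per array. In my setup, the first partition of the new drive belongs to md0 and the second to md1. The helper below is a sketch that only prints the commands so you can review them before running anything. The name readd_commands and the "array:partition" argument format are my own, and it assumes sdX-style names where the partition number is appended directly to the drive name.

```shell
# Print (but do not run) one mdadm add command per "array:partition-number" pair.
# $1: the new drive, e.g. /dev/sdb; remaining arguments: pairs such as md0:1
readd_commands() {
    rc_disk=$1
    shift
    for rc_pair in "$@"; do
        rc_array=${rc_pair%%:*}     # text before the colon, e.g. md0
        rc_part=${rc_pair##*:}      # text after the colon, e.g. 1
        echo "mdadm /dev/$rc_array -a ${rc_disk}${rc_part}"
    done
}

# Example, assuming the new drive came up as /dev/sdb:
#   readd_commands /dev/sdb md0:1 md1:2
```

Once the printed lines look right, run them by hand as root.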
If that went fine, the system will now automatically rebuild the array. You can monitor the status by running the following command:
cat /proc/mdstat
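While an array is rebuilding, /proc/mdstat gains a progress line containing "recovery =" and a percentage. If you only care about that number, something like the following sketch can extract it. The function name rebuild_progress is hypothetical, and it assumes the progress line sits below the array's header line, as it does on the kernels I have seen.

```shell
# Print "<array> <percent>" for every array that is currently rebuilding.
# $1: path to an mdstat-formatted file; defaults to /proc/mdstat.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }          # remember the current array
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) {              # the field that ends in a percent sign
                    pct = $i
                    sub(/^=/, "", pct)        # drop a "=" glued to the front, if any
                    print array, pct
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}

# Example: rebuild_progress          (reads /proc/mdstat)
```

If you would rather watch the whole file refresh by itself, running watch -n 5 cat /proc/mdstat does the job too.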