HOWTO Resynchronize Software RAID Mirror
From Gentoo Linux Wiki
This guide describes what to do in case of disk failure in a RAID setup.
Guide assumptions
The guide assumes the RAID setup is similar to the one described in HOWTO Gentoo Install on Software RAID mirror and LVM2 on top of RAID:
| RAID Device | Physical partitions | Mount point |
| /dev/md0 | /dev/hda1 /dev/hdc1 | /boot |
| /dev/md1 | /dev/hda3 /dev/hdc3 | / |
| /dev/md2 | /dev/hda4 /dev/hdc4 | LVM2 |
/dev/hda2 and /dev/hdc2 are both swap partitions and are not mirrored.
It is also assumed that mdadm is installed.
How to identify disk failure
Automated notification
The preferred way is to set up automatic e-mail notification, so that a failed disk is reported as soon as mdadm detects it.
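The guide does not say how to configure the notification. As a minimal sketch, assuming the mdadm monitor is used and its configuration lives in /etc/mdadm.conf, a MAILADDR line tells the monitor where to send alerts. The helper name set_md_mailaddr, the config path, and the addresses are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append a MAILADDR line to an mdadm config file
# unless one is already present. Default path and address are assumptions.
set_md_mailaddr() {
    conf=${1:-/etc/mdadm.conf}
    addr=${2:-root@localhost}
    if grep -q '^MAILADDR' "$conf" 2>/dev/null; then
        echo "MAILADDR already set in $conf"
    else
        echo "MAILADDR $addr" >> "$conf"
    fi
}

# Example (as root):
#   set_md_mailaddr /etc/mdadm.conf admin@example.com
#   mdadm --monitor --scan --oneshot --test   # send one test alert per array
```

Alerts are only sent while a monitor is running, for example one started with mdadm --monitor --scan --daemonise; how that is launched at boot depends on the system's init scripts.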
Manual checking
Check the status of our arrays with cat /proc/mdstat:
| Code: example output |
Personalities : [raid1]
md1 : active raid1 hdc3[0]
10008384 blocks [2/1] [U_]
md2 : active raid1 hdc4[0]
145669312 blocks [2/1] [U_]
md0 : active raid1 hdc1[0]
104320 blocks [2/1] [U_]
unused devices: <none>
|
Each [U_] indicates that one member of the array is down (the underscore) while the other is up (the U); [2/1] means the array expects two devices and has one active. Here hdc is present in every array, but hda is missing. If hda is actually faulty, we need to power down, replace the bad drive with another of equal or larger capacity, boot up, partition the new disk, and continue (or hot-swap the drive if the hardware supports it). A blank replacement disk carries no RAID superblock, so the arrays should not be expected to start rebuilding on their own after the reboot; the new partitions have to be added to the arrays manually.
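The manual check can also be scripted. The following is a rough sketch, not part of the original guide: it looks for an underscore in the status field of each array and prints the names of the affected arrays. The function name degraded_arrays is hypothetical, and the parsing assumes the /proc/mdstat layout shown in the example output.

```shell
#!/bin/sh
# Print the name of every md array whose status field, e.g. [U_],
# contains an underscore (a missing member).
# Reads /proc/mdstat by default, or the file given as first argument.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                    # remember the array name
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }   # status line with an underscore
    ' "${1:-/proc/mdstat}"
}

# Example:
#   degraded_arrays            # check the running system
#   degraded_arrays saved.txt  # check a saved copy of /proc/mdstat
```

Run against the example output, this would list md1, md2 and md0, one per line. An empty result means no array reports a missing member.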
Repairing the RAID setup
Recreate partitions on current or new disk
After verifying that the current disk is functional, or replacing it, rebuild the partition table on it:
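sfdisk -d <source device> | sfdisk <destination device>
where <source device> is a disk currently in the RAID, and <destination device> is the verified or new disk. In the layout above, with hdc still in the arrays and hda being restored, this is sfdisk -d /dev/hdc | sfdisk /dev/hda. The command overwrites the partition table on the destination, so double-check the device names first.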
Resynchronizing the Arrays
Add each partition of the disk to its array:
| Code: Adding partitions to existing raid |
# mdadm --add /dev/md0 /dev/hda1
# mdadm --add /dev/md1 /dev/hda3
# mdadm --add /dev/md2 /dev/hda4
|
Check the rebuild status with watch -n 6 cat /proc/mdstat
| Code: Sample output |
Every 6.0s: cat /proc/mdstat Fri Oct 13 15:39:22 2006
Personalities : [raid1]
md1 : active raid1 hda3[1] hdc3[0]
10008384 blocks [2/2] [UU]
md2 : active raid1 hda4[2] hdc4[0]
145669312 blocks [2/1] [U_]
[>....................] recovery = 4.3% (6329664/145669312) finish=43.7min speed=53078K/sec
md0 : active raid1 hda1[1] hdc1[0]
104320 blocks [2/2] [UU]
unused devices: <none>
|
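Instead of watching the full output, the progress figure can be extracted on its own. This is a small sketch under the assumption that the recovery line looks like the one in the sample output; the function name rebuild_progress is hypothetical.

```shell
#!/bin/sh
# Print "<array> <percent>" for each array that is currently rebuilding.
# Reads /proc/mdstat by default, or the file given as first argument.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the array name
        /(recovery|resync) *=/ {             # progress line of that array
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}

# Example:
#   rebuild_progress
```

When nothing is printed, no array is rebuilding. To simply block until a rebuild has finished, mdadm --wait /dev/md2 can be used instead of polling.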
