replace a disk in a raid1 array on linux
From time to time, even the best drives go bad and need to be replaced. Here are my notes for doing this without screwing things up.
Raid1 means we have two drives; for the sake of this note (and because it's mostly true for my servers), these are /dev/sda and /dev/sdb.
Please: If you follow this note, make sure to double- or triple-check your device names and consider additional research before taking action on important infrastructure! This is not a tutorial, nor a “definitive guide to…” ☝️
In all cases: review configuration
The easiest way for me is always the listing in /proc/mdstat. It shows you which disk is assigned to which md device and also which one has gone bad.
cat /proc/mdstat

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb2[1] sda2[0](F)
      999021888 blocks super 1.2 [2/1] [_U]
      bitmap: 7/8 pages [28KB], 65536KB chunk

md0 : active raid1 sdb1[1] sda1[0](F)
      1046528 blocks super 1.2 [2/1] [_U]

unused devices: <none>
In this case, if we follow the lines that start with md1 and md0, which are the raid devices in this machine, we can see that sdb2[1] and sda2[0] are the members of md1 (and sdb1[1] and sda1[0] the members of md0). The (F) marker and the status string [_U] at the end of the next line show us the missing/down raid member. The positions in [_U] follow the role numbers in the square brackets, not the order in which the devices are listed, so sda2 (role 0) is the defective member in md1 and sda1 is the defective member in md0.
💡 In the status string [_U], U depicts an “up” device and _ depicts a “down” or “missing” device.
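If the mdstat output leaves you unsure, mdadm itself can report the state per member. This is just a cross-check using the device names from the example above; look for a degraded array state and members flagged as faulty or removed.

mdadm --detail /dev/md0
mdadm --detail /dev/md1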
Another easy way would be to look up the devices with lsblk, but that only works if the device is still recognized by the OS, which it sometimes isn't.
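For reference, an lsblk call along these lines gives a quick overview; the column selection is just my preference, not something this note depends on:

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT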
If /dev/sda has gone bad:
fail and remove the disk from the md device
I always do this, even if the disk I have to replace is no longer recognized by the OS. I've never had issues with this approach, though I can only offer anecdotal evidence that it helps. 🤷
Fail the disk we want to replace:
mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md1 --fail /dev/sda2
Now actually remove the disk you want to replace:
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --remove /dev/sda2
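Both steps can also be chained into a single mdadm call per array; a sketch using the same device names as above:

mdadm --manage /dev/md0 --fail /dev/sda1 --remove /dev/sda1
mdadm --manage /dev/md1 --fail /dev/sda2 --remove /dev/sda2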
You can then replace the faulty disk.
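To be sure you pull the right physical drive, it can help to note the serial numbers first, assuming smartmontools is installed. If the failed disk no longer responds, the serial of the healthy disk at least tells you which one not to touch.

# serial number of a disk (smartmontools)
smartctl -i /dev/sdb | grep -i serial
# or map device names to stable ids / serials
ls -l /dev/disk/by-id/ | grep sd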
Once the disk is replaced and the server is back up and running, continue with the next step:
Copy partition table from healthy disk to new, empty disk
Be careful with this next step! Take your time, check twice. If you mix up the device names, you end up nuking the partition table on the healthy disk. No bueno!
Copy the partition table from the healthy disk (here: /dev/sdb) to the new disk (here: /dev/sda):
sfdisk -d /dev/sdb | sfdisk /dev/sda
sfdisk -d dumps the partition table to stdout. The pipe hands it to the second sfdisk on stdin, which then writes the partition table to the new disk.
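Recent versions of sfdisk handle GPT as well as MBR. If you prefer gdisk's tooling for GPT disks, sgdisk can do the same job; note that its argument order (target first, source second) is easy to get backwards, and the new disk should get fresh GUIDs afterwards:

# copy the table from /dev/sdb onto /dev/sda, then randomize GUIDs on /dev/sda
sgdisk -R /dev/sda /dev/sdb
sgdisk -G /dev/sda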
Remove possibly existing superblocks
With this step we make sure that there are no md superblocks on the new disk that contain residual metadata from a previous array.
mdadm --zero-superblock /dev/sda1
mdadm --zero-superblock /dev/sda2
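To double-check, mdadm --examine should now complain that there is no md superblock on either partition:

mdadm --examine /dev/sda1
mdadm --examine /dev/sda2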
Add the new disk to the raid
mdadm -a /dev/md0 /dev/sda1
mdadm -a /dev/md1 /dev/sda2
The array should now start rebuilding. You can check the status with cat /proc/mdstat.
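A simple way to follow the rebuild is to refresh that output every few seconds; the interval is arbitrary:

watch -n 5 cat /proc/mdstat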
Install GRUB on all disks
You never know which disk will fail next, so it's a good idea to have the boot loader present on all of them, or booting might fail.
grub-install /dev/sda
grub-install /dev/sdb
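On Debian-based systems that use the grub-pc package, it can also be worth re-running the package configuration so future GRUB updates keep writing to both disks; treat this as an aside, your distribution may handle it differently:

dpkg-reconfigure grub-pc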
Done.
Some notes on NVMe
- NVMe drives are usually named /dev/nvme0n1, /dev/nvme1n1 etc.
- NVMe partitions are usually named /dev/nvme0n1p1, /dev/nvme0n1p2, where the p marks the partition
- Your OS might require a reboot to recognize the new partitions on a disk
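For illustration, the same steps as above with (hypothetical) NVMe device names would look like this:

mdadm --manage /dev/md0 --fail /dev/nvme0n1p1
mdadm --manage /dev/md0 --remove /dev/nvme0n1p1
sfdisk -d /dev/nvme1n1 | sfdisk /dev/nvme0n1
mdadm -a /dev/md0 /dev/nvme0n1p1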