RAID – Mark Failure and Replace Drive

I have wanted to get this posted for a while but have been busy with SANS FOR500 material, work, etc.

What I try to do when transferring my old notes to the blog is to work through the steps first, correcting my notes as I go.  With this post, I have not done that because of the time it would take to set up and run through the steps.  But as I always warn, these are notes, not full instructions.  They get you in the ballpark, but you have to find the bases yourself.

So here we go…

This post assumes the RAID and drive layout from this earlier post; some of the steps below also refer back to it.

Software RAID 5 with UEFI/GPT via Ubuntu installer – Ubuntu Server 18.04

It might be best to point the EFI boot entry (efibootmgr) at a partition that is not on the affected drive, in case a reboot happens.  See steps 7-10 from the post above.
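
As a rough sketch only (the boot entry numbers below are placeholders and will differ on your system), list the current entries and move one that lives on a healthy drive to the front of the boot order:

efibootmgr -v                    # list boot entries and which disk/partition each points at
efibootmgr -o 0001,0000,0002     # example only: put an entry on a healthy drive first in BootOrder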

Check the drive and array state (and other useful details):

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2
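
If it helps to confirm which physical disk sde actually is before pulling it, one way (among others) is to match up the serial number; smartctl comes from the smartmontools package:

ls -l /dev/disk/by-id/ | grep sde     # map sde to its model/serial-based device ID
smartctl -i /dev/sde                  # print the drive's model and serial number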

Since this layout has 3 RAID arrays, the failing drive (in this case, sde) has to be marked as failed and then removed from all 3 arrays.

Mark failure:

mdadm --fail /dev/md0 /dev/sde2
mdadm --fail /dev/md1 /dev/sde3
mdadm --fail /dev/md2 /dev/sde4

Remove from the arrays:

mdadm --remove /dev/md0 /dev/sde2
mdadm --remove /dev/md1 /dev/sde3
mdadm --remove /dev/md2 /dev/sde4
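
One thing these notes skip: the replacement drive needs the same GPT partition layout before its partitions can be added back.  A minimal sketch, assuming /dev/sda is a healthy member of the arrays and the new blank drive comes up as /dev/sde (double-check both device names, this is easy to get backwards):

sgdisk -R /dev/sde /dev/sda     # replicate sda's partition table onto the new sde
sgdisk -G /dev/sde              # give the new drive its own random disk/partition GUIDs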

Once the drive is replaced and partitioned to match (see the sketch above), add the new partitions back into each array:

mdadm --add /dev/md0 /dev/sde2
mdadm --add /dev/md1 /dev/sde3
mdadm --add /dev/md2 /dev/sde4

Watch rebuild status:

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2
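
To keep an eye on the rebuild without re-running the command by hand, something like this works (the interval is arbitrary):

watch -n 5 cat /proc/mdstat     # refresh the rebuild/resync progress every 5 seconds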

Finally, go to the post linked at the beginning and do steps 7-10 again if needed.
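
Those steps live in the linked post, but as a very rough, hedged sketch of what re-registering the boot entry for the new drive might involve (the device, partition number, label, and loader path below are all assumptions based on a typical Ubuntu 18.04 UEFI layout; check your own setup first):

efibootmgr -c -d /dev/sde -p 1 -L "ubuntu-sde" -l '\EFI\ubuntu\shimx64.efi'   # example only: register the new drive's ESP
efibootmgr -v                                                                 # verify the new entry and the boot order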