RAID – Mark Failure and Replace Drive

I have wanted to get this posted for a while but have been busy with SANS FOR500 material, work, etc.

What I try to do when transferring my old notes to the blog is to go out and work through the steps first, correcting my notes as I step through them. With this post, I have not done that because of the time it would take to set up and run through the steps. But as I always warn, these are notes, not full instructions. They get you in the ballpark, but you have to find the bases yourself.

So here we go…

This post assumes the RAID and drive layout of this earlier post. Some steps below also refer to that post.

Software RAID 5 with UEFI/GPT via Ubuntu installer – Ubuntu Server 18.04

It might be best to point efibootmgr at a boot partition that is not on the affected drive, in case a reboot happens. See steps 7-10 from the post above.
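
For example, something like this would put an entry that lives on a healthy drive at the front of the boot order (the 000B,0007 numbers are placeholders; use the entry numbers your own efibootmgr output shows):

efibootmgr                  # list the current entries and BootOrder
efibootmgr -o 000B,0007     # placeholder entry numbers: an entry on a healthy drive goes first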

Check on drive state (and other useful items):

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2
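
Before failing anything, it helps to confirm which physical drive /dev/sde actually is so the right disk gets pulled later. Either of these will do it (smartctl comes from the smartmontools package, which may need to be installed first):

ls -l /dev/disk/by-id/ | grep sde    # the by-id names include the drive serial numbers
smartctl -i /dev/sde                 # prints the model and serial number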

Since there are 3 RAID arrays in this case, mark the affected drive's partitions (here, sde) as failed and remove them from all 3 arrays.

Mark failure:

mdadm --fail /dev/md0 /dev/sde2
mdadm --fail /dev/md1 /dev/sde3
mdadm --fail /dev/md2 /dev/sde4

Mark for removal:

mdadm --remove /dev/md0 /dev/sde2
mdadm --remove /dev/md1 /dev/sde3
mdadm --remove /dev/md2 /dev/sde4
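
If you prefer, the fail and remove steps can be combined into a single manage-mode call per array, e.g.:

mdadm /dev/md0 --fail /dev/sde2 --remove /dev/sde2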

Once the drive is replaced, add its partitions back into each array (but see the note after these commands about partitioning the new disk first):

mdadm --add /dev/md0 /dev/sde2
mdadm --add /dev/md1 /dev/sde3
mdadm --add /dev/md2 /dev/sde4
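
Note that a brand-new replacement disk has no partitions yet, so before the --add commands above will work, the GPT layout has to be copied over from a surviving member. A minimal sketch, assuming the replacement enumerates as /dev/sde again and the gdisk package (which provides sgdisk) is installed:

sgdisk -R=/dev/sde /dev/sda    # replicate sda's partition table onto the new sde (the destination follows -R)
sgdisk -G /dev/sde             # give the new disk its own random GUIDs so they do not clash with sda's

Double-check the device names before running these; -R overwrites the destination's partition table.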

Watch rebuild status:

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2
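
To keep the rebuild progress on screen instead of re-running the command by hand, watch works well:

watch -n 5 cat /proc/mdstat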

Go to the link at the beginning of this post and do steps 7-10 if needed.

Software RAID 5 with UEFI/GPT via Ubuntu installer – Ubuntu Server 18.04

NOTE:  You must use Ubuntu Server 18.04 LTS, not Ubuntu 18.04-live-server. The live server ISO does not provide all of the utilities for installing RAID and LVM.  (http://cdimage.ubuntu.com/releases/18.04/release/)

These steps also work for Ubuntu Server 16.04 LTS.

These steps describe how to set up software RAID 5 at installation time using the Ubuntu Server ncurses installer. The drive sizes here reflect my testing efforts on a VM, but I have implemented this on hardware where the “data” partition is ~45 TB in size. I have tested removing a drive from the array and rebooting; the system survived the reboot. However, the UEFI boot menu was modified by the system, and I am not sure why – I need to read more on the various things taking place in this configuration. But it is working and appears to be stable thus far. I’m sure a few “improvements” could be made. All in time.

** Being a newb with WP, I may have screwed something up trying to format the page. Most of these notes should be used as reference, not as exact instructions anyway. **
Step 1
In VirtualBox:

Set up a new VM.
Check the box to enable EFI.
Add 4 additional disks the same size as the first one.
Start the VM to begin installing.
Select desired options until you arrive at disk partitioning.

Step 2
Create the physical disk partitions

At “Partition disks” screen:

Choose "Manual"

At this point, we will configure the first disk (sda):

Choose sda and, when prompted, create a new empty partition table on it.

The first partition will be the GPT boot partition:

Select "Free Space"
Select "Create a new partition"
600 MB (at least 512 MB)
Select "Beginning"
Select "Use as"
Select "EFI System Partition"
Select "Done setting up partition"

The second partition will be the “/” partition:

Select "Free Space"
Select "Create a new partition"
10 GB (whatever you need)
Select "Beginning"
Select "Use as"
Select "EXT4"
Select "Mount point"
Select "/ - the root file system"
Select "Done setting up partition"

The third partition will be the “/data” partition:

Select "Free Space"
Select "Create a new partition"
9 GB (whatever you need)
Select "Beginning"
Select "Use as"
Select "EXT4"
Select "Mount point"
Select "Enter manually"
Enter "/data"
Select "Done setting up partition"

The swap partition:

Select "Free Space"
Select "Create a new partition"
2 GB (whatever you need; basically the remaining space)
Select "Beginning"
Select "Use as"
Select "swap area"
Select "Done setting up partition"

Now repeat all of these setup steps for each of the remaining disks:
sdb, sdc, sdd, sde

Step 3
Configure the software RAID 5

Select "Configure software RAID"
Select "Write changes"

**Important** When creating the RAID devices, DO NOT create a RAID device for the GPT boot partitions!

Select "Create MD device"
Select "Raid 5"
Change the number of devices to "5" (or however many disks you are using)
Change the number of spare devices to "0" (or however many spares you have)

I am setting up RAID for the following:
“/”, “/data”, and “swap” – so 3 independent RAID arrays.

Select partitions for the "/" partition (sda2, sdb2, sdc2, sdd2, sde2)
Select yes to write changes - if you are sure.

Now repeat these steps for the remaining RAID arrays (i.e., “/data” and “swap”)

Select "Finish"

Step 4
Enable the RAID 5 arrays

Select "#1" under "RAID5 device #0"
Select "Use as"
Select "Ext4"
Select "Mount point"
Select "/"
Select "Done setting up the partition"

Do these steps for the remaining two RAID devices (making sure you choose the right mount points/file systems).

Select "Finish partitioning and write changes to disk"
Select "yes" to write changes.

Step 5
Continue with the OS install.

Reboot
Login
As root, update: apt-get update && apt-get dist-upgrade
Reboot

Step 6
Let's look at what we have so far, making sure the arrays are healthy and done initializing.

As root:

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2
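
A one-line summary per array is also available, which is handy for comparing against /etc/mdadm/mdadm.conf:

mdadm --detail --scan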

Step 7
What partition did we boot from?

cat /etc/fstab (look for two lines similar to these):
# /boot/efi was on /dev/sda1 during installation
UUID=CD43-AEF0 /boot/efi vfat umask=0077 0 1

Next:

blkid

Then:

efibootmgr -v

Match the UUID from fstab to a UUID in the blkid output. Then match the PARTUUID to the output from efibootmgr -v.
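
For example, with the UUID shown above, something like this chains the lookups together (assuming /dev/sda1 turns out to be the ESP from fstab; substitute your own UUID and device):

blkid | grep -i CD43-AEF0
efibootmgr -v | grep -i "$(blkid -s PARTUUID -o value /dev/sda1)"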

Step 8
The boot information is currently only on one disk (see Step 7). We need to copy this information to all disks so that we can survive a single drive failure. The following commands can destroy everything done thus far, so make sure you get it right. Taking a snapshot first, if you are running a VM, is a good idea.

dd if=/dev/sda1 of=/dev/sdb1
dd if=/dev/sda1 of=/dev/sdc1
dd if=/dev/sda1 of=/dev/sdd1
dd if=/dev/sda1 of=/dev/sde1
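
A quick way to confirm the copies took is cmp, which prints nothing when the two partitions are identical:

cmp /dev/sda1 /dev/sdb1
cmp /dev/sda1 /dev/sdc1
cmp /dev/sda1 /dev/sdd1
cmp /dev/sda1 /dev/sde1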

Step 9
Now we add the remaining boot partitions to the EFI boot manager.
This command will show you what you currently have set up:

efibootmgr

We want to set up entries like “Boot0007* ubuntu”.

efibootmgr -c -g -d /dev/sdb -p 1 -L "ubuntu2" -l '\EFI\ubuntu\grubx64.efi'
efibootmgr -c -g -d /dev/sdc -p 1 -L "ubuntu3" -l '\EFI\ubuntu\grubx64.efi'
efibootmgr -c -g -d /dev/sdd -p 1 -L "ubuntu4" -l '\EFI\ubuntu\grubx64.efi'
efibootmgr -c -g -d /dev/sde -p 1 -L "ubuntu5" -l '\EFI\ubuntu\grubx64.efi'
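
Afterwards, confirm the new entries exist and, optionally, set a boot order that tries each ESP in turn (the entry numbers below are placeholders; use the ones your own efibootmgr output reports):

efibootmgr -v
efibootmgr -o 0007,0008,0009,000A,000B    # placeholder entry numbers: ubuntu first, then ubuntu2 through ubuntu5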

Step 10
If you want to test the boot menu setup, do the following for each drive:

efibootmgr
efibootmgr -n 000N (where 000N is the boot entry number of the drive to test, as listed by efibootmgr)
systemctl reboot

After the reboot, run efibootmgr to verify which entry was used (and if it booted at all, you must be OK).

 

See also:

RAID – Mark Failure and Replace Drive