Replace drives with larger ones using mdadm

mdadm has a nice feature that lets us replace drives with larger-capacity ones. We can swap 500 GB drives for 2 TB drives with no downtime; it just takes time. At the start we have the following setup:

# lsblk 
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                8:0    0 465,8G  0 disk  
├─sda1             8:1    0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sda2             8:2    0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdb                8:16   0 465,8G  0 disk  
├─sdb1             8:17   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdb2             8:18   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdc                8:32   0 465,8G  0 disk  
├─sdc1             8:33   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdc2             8:34   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdd                8:48   0 465,8G  0 disk  
├─sdd1             8:49   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdd2             8:50   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm

We proceed and replace one drive with a larger one.

# lsblk 
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                8:0    0 465,8G  0 disk  
├─sda1             8:1    0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sda2             8:2    0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdb                8:16   0 465,8G  0 disk  
├─sdb1             8:17   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdb2             8:18   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdc                8:32   0 465,8G  0 disk  
├─sdc1             8:33   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdc2             8:34   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdd                8:48   0   1,8T  0 disk

The new empty drive is in the server, but we need to create partitions on it before adding it to the RAID arrays. First, examine the old partition table.

# fdisk -l /dev/sda
Disk /dev/sda: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x590b9494

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sda1          2048   1953791   1951744   953M fd Linux raid autodetect
/dev/sda2       1953792 976771071 974817280 464,8G fd Linux raid autodetect
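
As a quick sanity check on those numbers (a side calculation, not part of the procedure): with 512-byte sectors, the sector counts translate directly into the sizes fdisk prints.

```shell
# Recompute the fdisk sizes from the sector counts above.
# 512-byte sectors: 1 MiB = 2048 sectors, 1 GiB = 2097152 sectors.
sda1_sectors=1951744
sda2_sectors=974817280

echo "sda1: $((sda1_sectors / 2048)) MiB"                    # 953 MiB
echo "sda2: $((sda2_sectors * 10 / 2097152)) tenths of GiB"  # 4648 -> 464,8 GiB
```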

Create a new partition table with more space for data, but with the same size for the boot partition.

# fdisk /dev/sdd 

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x896edf9c.

Command (m for help): p
Disk /dev/sdd: 1,8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x896edf9c

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): 

Using default response p.
Partition number (1-4, default 1): 
First sector (2048-3907029167, default 2048): 
Last sector, +sectors or +size{K,M,G,T,P} (2048-3907029167, default 3907029167): 1953791

Created a new partition 1 of type 'Linux' and of size 953 MiB.

Command (m for help): t
Selected partition 1
Hex code (type L to list all codes): fd
Changed type of partition 'Linux' to 'Linux raid autodetect'.

Command (m for help): n
Partition type
   p   primary (1 primary, 0 extended, 3 free)
   e   extended (container for logical partitions)
Select (default p): 

Using default response p.
Partition number (2-4, default 2): 
First sector (1953792-3907029167, default 1953792): 
Last sector, +sectors or +size{K,M,G,T,P} (1953792-3907029167, default 3907029167): 

Created a new partition 2 of type 'Linux' and of size 1,8 TiB.

Command (m for help): t
Partition number (1,2, default 2): 2
Hex code (type L to list all codes): fd

Changed type of partition 'Linux' to 'Linux raid autodetect'.

Command (m for help): p
Disk /dev/sdd: 1,8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x896edf9c

Device     Boot   Start        End    Sectors  Size Id Type
/dev/sdd1          2048    1953791    1951744  953M fd Linux raid autodetect
/dev/sdd2       1953792 3907029167 3905075376  1,8T fd Linux raid autodetect

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

We can examine the partition table to be sure we did it correctly.

root@hyperion:~# fdisk -l /dev/sdd
Disk /dev/sdd: 1,8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x896edf9c

Device     Boot   Start        End    Sectors  Size Id Type
/dev/sdd1          2048    1953791    1951744  953M fd Linux raid autodetect
/dev/sdd2       1953792 3907029167 3905075376  1,8T fd Linux raid autodetect

Take a look at lsblk: we see the new partitions, but they are not in the RAID arrays yet.

# lsblk 
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                8:0    0 465,8G  0 disk  
├─sda1             8:1    0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sda2             8:2    0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdb                8:16   0 465,8G  0 disk  
├─sdb1             8:17   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdb2             8:18   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdc                8:32   0 465,8G  0 disk  
├─sdc1             8:33   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdc2             8:34   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdd                8:48   0   1,8T  0 disk  
├─sdd1             8:49   0   953M  0 part  
└─sdd2             8:50   0   1,8T  0 part 

We can now proceed with adding them to the raid arrays.

# mdadm --manage /dev/md0 --add /dev/sdd1
mdadm: added /dev/sdd1

# mdadm --manage /dev/md1 --add /dev/sdd2
mdadm: added /dev/sdd2

The smaller array finishes quickly, while the larger one takes some more time.

# cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10] 
md1 : active raid5 sdd2[4] sdc2[2] sdb2[1] sda2[0]
      1461832704 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [>....................]  recovery =  0.4% (2084472/487277568) finish=73.7min speed=109709K/sec
      bitmap: 2/4 pages [8KB], 65536KB chunk

md0 : active raid1 sdd1[4] sdc1[2] sdb1[1] sda1[0]
      975296 blocks super 1.2 [4/4] [UUUU]
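
The finish estimate in /proc/mdstat is just the remaining work divided by the current speed; a rough recomputation from the recovery line above (a side calculation):

```shell
# Recompute mdstat's finish estimate: (total - done) / speed, in minutes.
done_kib=2084472       # recovered so far (1K blocks)
total_kib=487277568    # per-device size to rebuild
speed_kps=109709       # K/sec

echo "~$(( (total_kib - done_kib) / speed_kps / 60 )) min"   # mdstat said finish=73.7min
```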

After the rebuild finishes, the drive is a full member of both RAID arrays.

# lsblk 
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                8:0    0 465,8G  0 disk  
├─sda1             8:1    0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sda2             8:2    0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdb                8:16   0 465,8G  0 disk  
├─sdb1             8:17   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdb2             8:18   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdc                8:32   0 465,8G  0 disk  
├─sdc1             8:33   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdc2             8:34   0 464,8G  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdd                8:48   0   1,8T  0 disk  
├─sdd1             8:49   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdd2             8:50   0   1,8T  0 part  
  └─md1            9:1    0   1,4T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm

And the array is back from its degraded state.

# mdadm -D /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Sun May  6 16:47:17 2018
        Raid Level : raid5
        Array Size : 1461832704 (1394.11 GiB 1496.92 GB)
     Used Dev Size : 487277568 (464.70 GiB 498.97 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Oct 19 14:40:25 2021
             State : clean 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : data:1
              UUID : 9b735c42:5691ab77:0e4f2403:f7d79bdd
            Events : 41507

    Number   Major   Minor   RaidDevice State
       7       8        2        0      active sync   /dev/sda2
       6       8       18        1      active sync   /dev/sdb2
       5       8       34        2      active sync   /dev/sdc2
       4       8       50        3      active sync   /dev/sdd2

Now repeat the same procedure for the other three drives, one by one, until all four are larger drives.

# lsblk 
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                8:0    0   1,8T  0 disk  
├─sda1             8:1    0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sda2             8:2    0   1,8T  0 part  
  └─md1            9:1    0   5,5T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdb                8:16   0   1,8T  0 disk  
├─sdb1             8:17   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdb2             8:18   0   1,8T  0 part  
  └─md1            9:1    0   5,5T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdc                8:32   0   1,8T  0 disk  
├─sdc1             8:33   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdc2             8:34   0   1,8T  0 part  
  └─md1            9:1    0   5,5T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm
sdd                8:48   0   1,8T  0 disk  
├─sdd1             8:49   0   953M  0 part  
│ └─md0            9:0    0 952,4M  0 raid1 /boot
└─sdd2             8:50   0   1,8T  0 part  
  └─md1            9:1    0   5,5T  0 raid5 
    ├─osmhr-swap 253:0    0     4G  0 lvm   [SWAP]
    ├─osmhr-root 253:1    0    50G  0 lvm   /
    └─osmhr-osm  253:2    0   1,2T  0 lvm   /osm

Now we have larger partitions, but the RAID array is still the same size.

# mdadm -D /dev/md1 | grep -e "Array Size" -e "Dev Size"
        Array Size : 1461832704 (1394.11 GiB 1496.92 GB)
     Used Dev Size : 487277568 (464.70 GiB 498.97 GB)

Let it grow, let it grow, let it grow

# mdadm --grow /dev/md1 --size max
mdadm: component size of /dev/md1 has been set to 1952406528K
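
A four-drive RAID5 stores three drives' worth of data, so the new array size follows directly from the component size (quick arithmetic check):

```shell
# RAID5 usable capacity = (devices - 1) * component size
component_kib=1952406528   # from the mdadm --grow output
devices=4

echo "$(( (devices - 1) * component_kib ))K"   # 5857219584K, the Array Size mdadm reports
```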

Now we have the right sizes in mdadm

# mdadm -D /dev/md1 | grep -e "Array Size" -e "Dev Size"
        Array Size : 5857219584 (5585.88 GiB 5997.79 GB)
     Used Dev Size : 1952406528 (1861.96 GiB 1999.26 GB)

The only thing left to fix is LVM.

# pvscan 
  PV /dev/md1   VG osmhr           lvm2 [1,36 TiB / <140,11 GiB free]
  Total: 1 [1,36 TiB] / in use: 1 [1,36 TiB] / in no VG: 0 [0   ]

Resize PV

# pvresize /dev/md1 
  Physical volume "/dev/md1" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

Results

# pvscan 
  PV /dev/md1   VG osmhr           lvm2 [5,45 TiB / 4,23 TiB free]
  Total: 1 [5,45 TiB] / in use: 1 [5,45 TiB] / in no VG: 0 [0   ]

List LV

# lvscan 
  ACTIVE            '/dev/osmhr/swap' [4,00 GiB] inherit
  ACTIVE            '/dev/osmhr/root' [50,00 GiB] inherit
  ACTIVE            '/dev/osmhr/osm' [1,17 TiB] inherit

Extend the LV

# lvextend -L2T /dev/osmhr/osm
  Size of logical volume osmhr/osm changed from 1,17 TiB (307200 extents) to 2,00 TiB (524288 extents).
  Logical volume osmhr/osm successfully resized.

Result

# lvscan 
  ACTIVE            '/dev/osmhr/swap' [4,00 GiB] inherit
  ACTIVE            '/dev/osmhr/root' [50,00 GiB] inherit
  ACTIVE            '/dev/osmhr/osm' [2,00 TiB] inherit

Resize file system

# resize2fs /dev/osmhr/osm
resize2fs 1.44.1 (24-Mar-2018)
Filesystem at /dev/osmhr/osm is mounted on /osm; on-line resizing required
old_desc_blocks = 75, new_desc_blocks = 128
The filesystem on /dev/osmhr/osm is now 536870912 (4k) blocks long.
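
The sizes agree at every layer: 524288 extents is exactly 2 TiB, and 536870912 4k filesystem blocks is the same 2 TiB (a quick cross-check; the 4 MiB extent size is LVM's default, inferred from the extent counts shown):

```shell
# LV: 524288 extents * 4 MiB; FS: 536870912 blocks * 4 KiB -- both 2 TiB
lv_bytes=$(( 524288 * 4 * 1024 * 1024 ))
fs_bytes=$(( 536870912 * 4096 ))

echo "LV: $(( lv_bytes / 1024 / 1024 / 1024 / 1024 )) TiB"   # 2
[ "$lv_bytes" -eq "$fs_bytes" ] && echo "LV and filesystem sizes agree"
```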

Now we have more free space to use

Enable https with redirect on osm-hr servers

It was time to enable HTTPS on all osm-hr servers, and Let’s Encrypt with certbot auto-renewal was the best choice for the job. There are two steps in the process: install and configure certbot. Depending on the OS, the first part may require additional steps.

Install

Debian 9

First you need to enable backports: edit /etc/apt/sources.list and add the repo.

deb http://deb.debian.org/debian stretch-backports main

After that, run an update.

apt-get update

Now you will have the option to install certbot

apt-get install certbot python-certbot-apache -t stretch-backports

Debian 10

No need for backports, just install certbot

apt-get install certbot python-certbot-apache

Ubuntu 18.04

Ubuntu also needs additional software and a repo for certbot.

apt-get update
apt-get install software-properties-common
add-apt-repository universe
add-apt-repository ppa:certbot/certbot
apt-get update

After the update certbot can be installed

apt-get install certbot python-certbot-apache

Configure

The configuration is the same on every OS.

# certbot --apache
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator apache, Installer apache
Enter email address (used for urgent renewal and security notices)
(Enter 'c' to cancel): user@domain.org

Fill in the email address for notifications to continue.

Please read the Terms of Service at https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf. You must agree in order to register with the ACME server at https://acme-v02.api.letsencrypt.org/directory
(A)gree/(C)ancel: a

Read 🙂 and agree to Terms of Service to continue

Would you be willing to share your email address with the Electronic Frontier Foundation, a founding partner of the Let's Encrypt project and the non-profit organization that develops Certbot? We'd like to send you email about our work encrypting the web, EFF news, campaigns, and ways to support digital freedom.
(Y)es/(N)o: n

I already shared another email address with them and receive the news there, so no need for duplication.

Which names would you like to activate HTTPS for?
1: tms.osm-hr.org
Select the appropriate numbers separated by commas and/or spaces, or leave input blank to select all options shown (Enter 'c' to cancel): 1

Now we get a report of what was done.

Obtaining a new certificate
Performing the following challenges:
http-01 challenge for tms.osm-hr.org
Enabled Apache rewrite module
Waiting for verification…
Cleaning up challenges
Created an SSL vhost at /etc/apache2/sites-available/tms-le-ssl.conf
Enabled Apache socache_shmcb module
Enabled Apache ssl module
Deploying Certificate to VirtualHost /etc/apache2/sites-available/tms-le-ssl.conf
Enabling available site: /etc/apache2/sites-available/tms-le-ssl.conf

Another prompt, this time about redirection; you have to choose.

Please choose whether or not to redirect HTTP traffic to HTTPS, removing HTTP access.
1: No redirect - Make no further changes to the webserver configuration.
2: Redirect - Make all requests redirect to secure HTTPS access. Choose this for new sites, or if you're confident your site works on HTTPS. You can undo this change by editing your web server's configuration.
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 2

After that the Apache rewrite module is enabled and the redirect is set up.

Enabled Apache rewrite module
Redirecting vhost in /etc/apache2/sites-enabled/tms.conf to ssl vhost in /etc/apache2/sites-available/tms-le-ssl.conf
Congratulations! You have successfully enabled https://tms.osm-hr.org
You should test your configuration at:
https://www.ssllabs.com/ssltest/analyze.html?d=tms.osm-hr.org

Final notice

IMPORTANT NOTES:
- Congratulations! Your certificate and chain have been saved at:
/etc/letsencrypt/live/tms.osm-hr.org/fullchain.pem
Your key file has been saved at:
/etc/letsencrypt/live/tms.osm-hr.org/privkey.pem
Your cert will expire on 2019-09-10. To obtain a new or tweaked version of this certificate in the future, simply run certbot again with the "certonly" option. To non-interactively renew all of your certificates, run "certbot renew"
- Your account credentials have been saved in your Certbot configuration directory at /etc/letsencrypt. You should make a secure backup of this folder now. This configuration directory will also contain certificates and private keys obtained by Certbot so making regular backups of this folder is ideal.
- If you like Certbot, please consider supporting our work by:
Donating to ISRG / Let's Encrypt: https://letsencrypt.org/donate
Donating to EFF: https://eff.org/donate-le
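
The packaged certbot normally ships its own cron job or systemd timer for renewal, but if you want an explicit one, a crontab entry along these lines would do (illustrative, not from this setup; the deploy hook reloads Apache so the renewed certificate is picked up):

```shell
# Illustrative crontab entry: attempt renewal twice a day; certbot only
# renews certificates that are close to expiry. Reload Apache on success.
0 3,15 * * * certbot renew --quiet --deploy-hook "systemctl reload apache2"
```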

That’s all folks, now you have a web server with an automatic redirect to HTTPS.

Replace disk in mdadm before it fails

There are two disks containing the / and SWAP partitions.

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 136.8G 0 disk
├─sda1 8:1 0 130.4G 0 part
│ └─md0 9:0 0 130.3G 0 raid1 /
└─sda2 8:2 0 6.4G 0 part
└─md1 9:1 0 6.4G 0 raid1 [SWAP]
sdd 8:48 0 136.8G 0 disk
├─sdd1 8:49 0 130.4G 0 part
│ └─md0 9:0 0 130.3G 0 raid1 /
└─sdd2 8:50 0 6.4G 0 part
└─md1 9:1 0 6.4G 0 raid1 [SWAP]

One disk started to act strange: utilization spiked and latency began to increase. After some investigation one thing stood out, the disk’s Non-medium error count was continuously increasing, by hundreds a minute. The general opinion on the internet is that if that number keeps growing you should start looking for a replacement disk. Usually people see a few hundred of these; we were getting that many per minute and had reached the millions.

# smartctl --all /dev/sdd | grep "Non-medium error count"
Non-medium error count: 7564451

Compare that to another disk, and you will see the difference.

# smartctl --all /dev/sda | grep "Non-medium error count"
Non-medium error count: 35
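
A small check along these lines can catch such a disk before it fails completely (a sketch; the threshold is arbitrary, and the sample string stands in for real `smartctl --all` output):

```shell
# Extract the Non-medium error count from smartctl output and flag
# anything above a threshold. Feed it `smartctl --all /dev/sdX` in real use.
error_count() {
    awk -F: '/Non-medium error count/ { gsub(/ /, "", $2); print $2 }'
}

threshold=10000
sample="Non-medium error count: 7564451"

errors=$(printf '%s\n' "$sample" | error_count)
[ "$errors" -gt "$threshold" ] && echo "disk over threshold: $errors errors"
```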

Mark the disk’s partitions as failed in both arrays.

# mdadm --manage /dev/md0 --fail /dev/sdd1
# mdadm --manage /dev/md1 --fail /dev/sdd2

Remove disk from configuration.

# mdadm --manage /dev/md0 --remove /dev/sdd1
# mdadm --manage /dev/md1 --remove /dev/sdd2

To be sure we pull the right disk we need to locate it. Turn the LED on, and off again after we find the drive.

# ledctl locate=/dev/sdd
# ledctl locate_off=/dev/sdd

Remove the physical disk and replace it with the new one, then check lsblk to see the new disk.

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 136.8G 0 disk
├─sda1 8:1 0 130.4G 0 part
│ └─md0 9:0 0 130.3G 0 raid1 /
└─sda2 8:2 0 6.4G 0 part
└─md1 9:1 0 6.4G 0 raid1 [SWAP]
sdd 8:48 0 136.8G 0 disk

Copy the partition table from sda to sdd.

# sfdisk -d /dev/sda | sfdisk /dev/sdd

Partition table is copied to the new disk.

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 136.8G 0 disk
├─sda1 8:1 0 130.4G 0 part
│ └─md0 9:0 0 130.3G 0 raid1 /
└─sda2 8:2 0 6.4G 0 part
└─md1 9:1 0 6.4G 0 raid1 [SWAP]
sdd 8:48 0 136.8G 0 disk
├─sdd1 8:49 0 130.4G 0 part
└─sdd2 8:50 0 6.4G 0 part

Add the new disk to the RAID arrays, SWAP first because it’s smaller and will rebuild quickly.

# mdadm --manage /dev/md1 --add /dev/sdd2

After that add / and let it rebuild while you finish the rest of the checks.

# mdadm --manage /dev/md0 --add /dev/sdd1

We can see the final state is the same as when we started.

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 136.8G 0 disk
├─sda1 8:1 0 130.4G 0 part
│ └─md0 9:0 0 130.3G 0 raid1 /
└─sda2 8:2 0 6.4G 0 part
└─md1 9:1 0 6.4G 0 raid1 [SWAP]
sdd 8:48 0 136.8G 0 disk
├─sdd1 8:49 0 130.4G 0 part
│ └─md0 9:0 0 130.3G 0 raid1 /
└─sdd2 8:50 0 6.4G 0 part
└─md1 9:1 0 6.4G 0 raid1 [SWAP]

The difference is that we now have a functioning disk with the same utilization and latency as sda, and no increase in the Non-medium error count.

All this was done with zero downtime, because all the disks are in hot-swap drive bays. Even if the disk had failed completely the server would have kept running, because both / and SWAP were on RAID1, which was the intended configuration here.

LSI RAID controller enable/disable hotspare spindown

Depending on the use case, sometimes you have to change hotspare disk behavior. In this case smartctl was monitoring all the disks behind a hardware RAID controller, and one disk failed to read SMART attribute data.

# megaclisas-status
-- Controller information --
-- ID | H/W Model | RAM | Temp | BBU | Firmware
c0 | LSI MegaRAID SAS 9261-8i | 512MB | N/A | Good | FW: 12.15.0-0239

-- Array information --
-- ID | Type | Size | Strpsz | Flags | DskCache | Status | OS Path | CacheCade |InProgress
c0u0 | RAID-6 | 65491G | 64 KB | RA,WB | Default | Optimal | /dev/sda | None |None

-- Disk information --
-- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI ID
c0u0p0 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 33C | [18:0] | 19
c0u0p1 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 33C | [18:1] | 24
c0u0p2 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 31C | [18:2] | 23
c0u0p3 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 35C | [18:3] | 25
c0u0p4 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 35C | [18:4] | 26
c0u0p5 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 32C | [18:5] | 29
c0u0p6 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 34C | [18:6] | 20
c0u0p7 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 34C | [18:7] | 27
c0u0p8 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 31C | [18:8] | 28
c0u0p9 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 34C | [18:9] | 22
c0u0p10 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 35C | [18:10] | 21

-- Unconfigured Disk information --
-- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI ID | Path
c0uXpY | HDD | ST8000NM0055 | 7.276 TB | Hotspare, Spun down | 6.0Gb/s | 32C | [18:11] | 30 | N/A

It turned out to be the hotspare disk, and it was spun down. Next we had to figure out the default spin-down behavior. Depending on your setup you may use storcli or megacli for this task.

# megacli -AdpGetProp -DsblSpinDownHSP -a0
Adapter 0: Spin Down of Hot Spares: Enabled
Exit Code: 0x00

Or

# storcli /c0 show ds
Controller = 0
Status = Success
Description = None
Controller Properties :
=====================
--------------------------
Ctrl_Prop Value
--------------------------
SpnDwnUncDrv Disabled
SpnDwnHS Enabled
SpnDwnTm 30 minute(s)
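
Since settings like this are easy to forget, a quick script can assert the state you expect (a sketch; the sample text stands in for real `storcli /c0 show ds` output):

```shell
# Parse the SpnDwnHS line from `storcli /cX show ds` output.
sample="SpnDwnUncDrv Disabled
SpnDwnHS Enabled
SpnDwnTm 30 minute(s)"

state=$(printf '%s\n' "$sample" | awk '/^SpnDwnHS/ { print $2 }')
echo "hot spare spin-down: $state"
```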

After consulting the megacli and storcli documentation we can see how to do this.

megacli -AdpSetProp -DsblSpinDownHSP -val -aN|-a0,1,2|-aALL
val - 0= Spinning down the Hot Spare is enabled.
1= Spinning down the Hot Spare is disabled.

Or

storcli /cx set ds=off type=1|2|3|4

1: Unconfigured
2: Hot spare
3: Virtual drive
4: All

We can now disable hotspare spin down.

# megacli -AdpSetProp -DsblSpinDownHSP -1 -a0
Adapter 0: Set Disable spin Down of Hot Spares : success.
Exit Code: 0x00

Or

# storcli /c0 set ds=off type=2

We can check the controller again.

# megacli -AdpGetProp -DsblSpinDownHSP -a0
Adapter 0: Spin Down of Hot Spares: Disabled
Exit Code: 0x00

Or

# storcli /c0 show ds
Controller = 0
Status = Success
Description = None
Controller Properties :
=====================
--------------------------
Ctrl_Prop Value
--------------------------
SpnDwnUncDrv Disabled
SpnDwnHS Disabled
SpnDwnTm 30 minute(s)

Finally we can check it again with megaclisas-status.

# megaclisas-status
-- Controller information --
-- ID | H/W Model | RAM | Temp | BBU | Firmware
c0 | LSI MegaRAID SAS 9261-8i | 512MB | N/A | Good | FW: 12.15.0-0239

-- Array information --
-- ID | Type | Size | Strpsz | Flags | DskCache | Status | OS Path | CacheCade |InProgress
c0u0 | RAID-6 | 65491G | 64 KB | RA,WB | Default | Optimal | /dev/sda | None |None

-- Disk information --
-- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI ID
c0u0p0 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 33C | [18:0] | 19
c0u0p1 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 33C | [18:1] | 24
c0u0p2 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 31C | [18:2] | 23
c0u0p3 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 35C | [18:3] | 25
c0u0p4 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 35C | [18:4] | 26
c0u0p5 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 32C | [18:5] | 29
c0u0p6 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 34C | [18:6] | 20
c0u0p7 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 34C | [18:7] | 27
c0u0p8 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 31C | [18:8] | 28
c0u0p9 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 34C | [18:9] | 22
c0u0p10 | HDD | ST8000NM0055 | 7.276 TB | Online, Spun Up | 6.0Gb/s | 35C | [18:10] | 21

-- Unconfigured Disk information --
-- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI ID | Path
c0uXpY | HDD | ST8000NM0055 | 7.276 TB | Hotspare, Spun Up | 6.0Gb/s | 32C | [18:11] | 30 | N/A

You can do the reverse by just swapping the on/off or 0/1 values.

Extend LVM

We have an additional ~130 GB to upload, but not enough space on the “/osm” volume, as can be seen with the pydf command.

# pydf
Filesystem Size Used Avail Use% Mounted on
/dev/osmhr/root 49G 6300M 41G 12.5 [####..............................] /
/dev/md0 921M 155M 703M 16.8 [######............................] /boot
/dev/osmhr/osm 837G 730G 106G 87.3 [##############################....] /osm

Let’s check available space on the drives.

# pvscan
PV /dev/md1 VG osmhr lvm2 [1,36 TiB / <490,11 GiB free]
Total: 1 [1,36 TiB] / in use: 1 [1,36 TiB] / in no VG: 0 [0 ]

We can see there is enough free space to extend our volume, and we can check the individual volumes.

# lvscan
ACTIVE '/dev/osmhr/swap' [4,00 GiB] inherit
ACTIVE '/dev/osmhr/root' [50,00 GiB] inherit
ACTIVE '/dev/osmhr/osm' [850,00 GiB] inherit

The decision is made to extend the volume from 850 GiB to 1000 GiB so we are clear for some time. First, stop the services using the volume and unmount it.

# systemctl stop apache2.service
# umount /osm

Now that the volume is free for manipulation it’s time to extend it

# lvextend -L1000G /dev/osmhr/osm
Size of logical volume osmhr/osm changed from 850,00 GiB (217600 extents) to 1000,00 GiB (256000 extents).
Logical volume osmhr/osm successfully resized.
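
The extent counts translate directly into the sizes shown, assuming LVM’s default 4 MiB extent size (a quick check):

```shell
# 256000 extents * 4 MiB = 1000 GiB; 217600 extents * 4 MiB = 850 GiB
echo "new: $(( 256000 * 4 / 1024 )) GiB"   # 1000
echo "old: $(( 217600 * 4 / 1024 )) GiB"   # 850
```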

After the volume resize it’s time to check the file system

# e2fsck -f /dev/osmhr/osm
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/osmhr/osm: 10052064/55705600 files (0.1% non-contiguous), 194923328/222822400 blocks

With a clean file system it’s safe to resize it

# resize2fs /dev/osmhr/osm
resize2fs 1.44.1 (24-Mar-2018)
Resizing the filesystem on /dev/osmhr/osm to 262144000 (4k) blocks.
The filesystem on /dev/osmhr/osm is now 262144000 (4k) blocks long.

To be safe you can check the file system once more, now with the new size

# e2fsck -f /dev/osmhr/osm
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/osmhr/osm: 10052064/65536000 files (0.1% non-contiguous), 195540128/262144000 blocks

Time to mount the volume back

# mount -a

And check the disk usage to see there is enough space for new data upload

# pydf
Filesystem Size Used Avail Use% Mounted on
/dev/osmhr/root 49G 6300M 41G 12.5 [####..............................] /
/dev/md0 921M 155M 703M 16.8 [######............................] /boot
/dev/osmhr/osm 984G 730G 254G 74.2 [#########################.........] /osm

The available space on the drives has shrunk

# pvscan
PV /dev/md1 VG osmhr lvm2 [1,36 TiB / <340,11 GiB free]
Total: 1 [1,36 TiB] / in use: 1 [1,36 TiB] / in no VG: 0 [0 ]

And our logical volume has increased

# lvscan
ACTIVE '/dev/osmhr/swap' [4,00 GiB] inherit
ACTIVE '/dev/osmhr/root' [50,00 GiB] inherit
ACTIVE '/dev/osmhr/osm' [1000,00 GiB] inherit

One last thing: don’t forget to restart the services stopped earlier

# systemctl start apache2.service

Everything is back to normal, and now there is a lot more space for new data.