Recover Software Raid1 after HDD failure on Debian Squeeze
Starting configuration: HP Proliant ML150 G3 server.
- 2 x 500 GB SATA drives (Seagate ...) /dev/sda and /dev/sdb.
- 2 RAID1 devices: /dev/md0 (made of /dev/sda1 and /dev/sdb1) for the system and /dev/md1 for swap
One of the HDDs (/dev/sda) failed physically. I wanted to upgrade to bigger and better drives, so I got 2 x 1 TB Samsung disks.
I replaced the faulty disk, booted the system from /dev/sdb (this needed the BIOS boot device to be reconfigured) and then did the following:
- Clean the new HDD of all existing partitions (mine was already clean since it was new, but this step should be considered when reusing a disk)
- Clean the MBR with: # dd if=/dev/zero of=/dev/sda bs=512 count=1
- Save the partition scheme from the good disk with: # sfdisk -d /dev/sdb > mbr_sdb.txt
- Apply the scheme to the new HDD (note the target is the new disk, /dev/sda, not the good one): # sfdisk /dev/sda < mbr_sdb.txt
- Add the partitions of the new disk to the RAID: # mdadm /dev/md0 -a /dev/sda1 (repeat for /dev/md1 with its partition on the new disk)
- RAID recovery starts; check the status with: # mdadm --detail /dev/md0 (the same command works for md1)
- Install GRUB on the new disk with: # grub-install /dev/sda
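Since these commands are destructive when pointed at the wrong disk, it can help to print them for review before typing anything. The following is only a sketch: the function name plan_replace and the dump file name are made up for illustration, and it only echoes the commands from the steps above for md0 without running them.

```shell
#!/bin/sh
# Print (do NOT run) the commands from the steps above.
# Usage: plan_replace GOOD_DISK NEW_DISK
# plan_replace and the dump file name are hypothetical, for illustration only.
plan_replace() {
    good=$1
    new=$2
    dump="table_$(basename "$good").txt"   # where the saved layout would go

    echo "dd if=/dev/zero of=$new bs=512 count=1"   # wipe the MBR of the new disk
    echo "sfdisk -d $good > $dump"                  # dump layout of the good disk
    echo "sfdisk $new < $dump"                      # apply layout to the NEW disk
    echo "mdadm /dev/md0 -a ${new}1"                # add first partition to md0
    echo "mdadm --detail /dev/md0"                  # check rebuild status
    echo "grub-install $new"                        # make the new disk bootable
}

# First replacement in this article: /dev/sdb is good, /dev/sda is new.
plan_replace /dev/sdb /dev/sda
```

For the second replacement the arguments are simply swapped (plan_replace /dev/sda /dev/sdb). The sketch covers md0 only; md1 needs the same mdadm step with its own partition.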
Then I replaced /dev/sdb (still functional, but I wanted two identical, better drives) with the second new disk and repeated the process.
Finally I wanted to install GRUB on the new disk (now /dev/sdb) and it failed with a message that said:
# grub-install /dev/sdb
/usr/sbin/grub-probe: error: no such disk.
Auto-detection of a file system of /dev/md0 failed.
Please report this together with the output of "/usr/sbin/grub-probe --device-map=/boot/grub/device.map --target=fs -v /boot/grub" to <bug-grub@gnu.org>
After digging around on the net I found the cause and resolved the problem: the content of /boot/grub/device.map did not reflect the disks actually present.
The file still listed the old disks:
# cat /boot/grub/device.map
(hd0) /dev/disk/by-id/ata-ST3500320NS_9QM263BS
(hd1) /dev/disk/by-id/ata-ST3500630NS_9QG8J1M5
(hd2) /dev/disk/by-id/ata-ST3500320NS_9QM1BN3X
So that file needs to be updated, which grub-install can do when given the --recheck parameter (it installs GRUB at the same time):
root@wdev:/boot/grub# grub-install --recheck /dev/sdb
Installation finished. No error reported.
# cat /boot/grub/device.map
(hd0) /dev/disk/by-id/ata-SAMSUNG_HD103SJ_S246J9KB416143
(hd1) /dev/disk/by-id/ata-SAMSUNG_HD103SJ_S246J9KB416142
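A stale device.map can be spotted before grub-install fails by checking that every path listed in it still exists. This is a minimal sketch, assuming the two-column "(hdN) path" layout shown above; the function name stale_map_entries and the demo paths are made up for illustration.

```shell
#!/bin/sh
# Print device.map entries whose device path does not exist on this system.
# Usage: stale_map_entries [MAPFILE]   (defaults to /boot/grub/device.map)
# stale_map_entries is a hypothetical helper, not part of GRUB.
stale_map_entries() {
    map=${1:-/boot/grub/device.map}
    # "|| [ -n "$name" ]" also handles a last line without a trailing newline
    while read -r name path || [ -n "$name" ]; do
        case $name in
            ''|'#'*) continue ;;          # skip blank lines and comments
        esac
        [ -e "$path" ] || echo "$name $path"
    done < "$map"
}

# Demo on a throwaway file: /dev/null exists, the second path is invented.
tmp=$(mktemp)
printf '(hd0)\t/dev/null\n(hd1)\t/dev/disk/by-id/hypothetical-missing-disk\n' > "$tmp"
stale_map_entries "$tmp"    # prints: (hd1) /dev/disk/by-id/hypothetical-missing-disk
rm -f "$tmp"
```

Run without arguments on the real system, any line it prints points at a disk that is no longer there, which is the situation grub-install --recheck fixed here.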
Watch the RAID sync progress with:
# watch -n 2 cat /proc/mdstat
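For a script rather than a person watching the screen, the same information can be pulled out of /proc/mdstat: an array with a missing or still-rebuilding member shows an underscore in its status field, e.g. [_U] instead of [UU]. Below is a small sketch that lists such arrays. The function name degraded_arrays is hypothetical and the sample text is illustrative, not output captured from this server.

```shell
#!/bin/sh
# Read /proc/mdstat-style text on stdin and print the name of every array
# whose member status field (e.g. [_U]) contains an underscore.
# degraded_arrays is a hypothetical helper name.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                          # remember current array
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }       # status line with a gap
    '
}

# Illustrative sample input (made up for this example).
sample='Personalities : [raid1]
md1 : active raid1 sda2[0] sdb2[1]
      1951808 blocks [2/2] [UU]

md0 : active raid1 sda1[2] sdb1[1]
      486432704 blocks [2/1] [_U]
      [=>...................]  recovery =  5.1%

unused devices: <none>'

printf '%s\n' "$sample" | degraded_arrays    # prints: md0
```

On the live system the call would be degraded_arrays < /proc/mdstat; empty output means all arrays show every member as up.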
The bigger disks leave about 500 GB unused on each drive, so I used it for a new RAID1 device, md2 (I first created new partitions /dev/sda3 and /dev/sdb3 to put in the new array).
Create the array with:
# mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
Format the new array as ext4:
# mkfs.ext4 /dev/md2
Update /etc/mdadm/mdadm.conf by adding the new array, or regenerate its content with:
# /usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf