Software RAID can be a pain in the a$$. What if a disk has some kind of temporary failure, drops out of its RAID array and creates its own RAID array (all the RAID metadata is stored on the disk)? That’s bad, because then we have two RAID arrays with the same data (a RAID fork). If we also use LVM2, things get really bad, because the “wrong” RAID array might be used automatically at boot time…
Something bad happened to my software RAID (RAID 1). The disk itself was okay, but I got the following errors:
Mar 18 20:49:12 hetz kernel: [466421.061376] sd 2:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 20:49:12 hetz kernel: [466421.061410] sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 84 10 10 08 00 00 02 00
Mar 18 20:49:12 hetz kernel: [466421.061556] sd 2:0:0:0: [sda] Unhandled error code
Mar 18 20:49:12 hetz kernel: [466421.061574] sd 2:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 20:49:12 hetz kernel: [466421.061607] sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 04 10 10 08 00 00 02 00
I contacted my hosting provider. They checked the disk and told me that it was OKAY. Up to that point there wasn’t a problem. But because of this error, the disk dropped out of its RAID array, and after I rebooted the server I had all RAID arrays twice! And the really big problem was that I also have an LVM volume group.
Found duplicate PV O0Ju3vera1nHsBj8sXxLauJkwc5GCIoR: using /dev/md127 not /dev/md3
This is awful. /dev/md3 should be in use instead of /dev/md127; /dev/md127 is the new forked RAID array. So I created a table to be sure which array is in use and which one is out of sync:
/dev/md0 -> swap
/dev/md1 -> /boot
/dev/md2 -> /
/dev/md127 -> volg1
/dev/md3 -> OUT OF SYNC
/dev/md124 -> OUT OF SYNC
/dev/md125 -> OUT OF SYNC
/dev/md126 -> OUT OF SYNC
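A table like this can be put together from /proc/mdstat, which lists every array the kernel has assembled together with its member partitions. Here is a minimal sketch of that idea; the function name list_md_arrays is made up for illustration, and it only reads the file, it changes nothing:

```sh
#!/bin/sh
# list_md_arrays [FILE]
# Hypothetical helper: print one line per md array found in an
# mdstat-formatted file (default: /proc/mdstat) in the form
#   /dev/mdN STATE MEMBER[,MEMBER...]
list_md_arrays() {
    awk '/^md[0-9]+ *:/ {
        members = ""
        # Member devices look like "sda1[0]"; collect every such field.
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {
                members = (members == "" ? $i : members "," $i)
            }
        }
        print "/dev/" $1, $3, members
    }' "${1:-/proc/mdstat}"
}

# Usage: list_md_arrays
```

An array that appears with only one member where two are expected is a candidate for being one half of a fork. What is actually mounted on each array still has to be cross-checked by hand, for example with mount and pvs.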
I had to do the following:
- mdadm --manage --stop /dev/MD-RAID
- mdadm --zero-superblock /dev/DISK
- mdadm --add /dev/EXISTING-MD-RAID /dev/DISK
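Since these three steps have to be repeated for every forked array, they can be wrapped in a small function. This is only a sketch: the names run and readd_member and the DRY_RUN variable are made up here, and by default it just prints the commands instead of running them, because --zero-superblock on the wrong partition destroys the RAID metadata of a healthy member.

```sh
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1 (the default),
# execute it for real only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# readd_member STALE_MD DISK GOOD_MD
#   STALE_MD: the forked, out-of-sync array (e.g. /dev/MD-RAID)
#   DISK:     the partition that dropped out (e.g. /dev/DISK)
#   GOOD_MD:  the array that is actually in use (e.g. /dev/EXISTING-MD-RAID)
readd_member() {
    stale_md=$1
    disk=$2
    good_md=$3
    run mdadm --manage --stop "$stale_md" || return 1   # stop the forked array
    run mdadm --zero-superblock "$disk" || return 1     # wipe its old RAID metadata
    run mdadm --add "$good_md" "$disk"                  # re-add to the good array
}
```

Called with the three device names, it first prints what it would do; only DRY_RUN=0 readd_member ... actually runs mdadm. After the --add, the resync progress shows up in /proc/mdstat.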
This isn’t complicated stuff, but it already failed on the first MD RAID. Every time I tried to stop the RAID array, I received the error message “md-raid in use”. But mount and lsof showed me the opposite. After searching and shouting for a while, I found the solution: this RAID was being used as swap.
So I just stopped using it as swap (swapoff).
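Active swap areas do not show up in mount or lsof, but they are listed in /proc/swaps, and swapoff releases them. A small sketch, assuming the array appears there under its /dev/mdN name; the helper name is_swap_device is made up for illustration:

```sh
#!/bin/sh
# is_swap_device DEVICE [FILE]
# Hypothetical helper: succeed (exit 0) if DEVICE is listed as an active
# swap area in a /proc/swaps-formatted file (default: /proc/swaps).
is_swap_device() {
    awk -v dev="$1" '
        NR > 1 && $1 == dev { found = 1 }   # line 1 is the column header
        END { exit !found }
    ' "${2:-/proc/swaps}"
}

# Usage (as root), with /dev/MD-RAID standing in for the array
# that refuses to stop:
#   is_swap_device /dev/MD-RAID && swapoff /dev/MD-RAID
```

Once swapoff has returned, the kernel no longer holds the array open and mdadm --manage --stop should go through.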
Now I was able to remove all the OUT-OF-SYNC arrays and add their disks back to my working RAID. What a challenge.