FUN WITH LINUX

BTRFS: monitoring raid errors

16 February 2016

Btrfs has its own built-in tool for displaying per-device I/O error statistics:

root@tardis:# btrfs device stats /mnt/Raid/
[/dev/sdb].write_io_errs 0
[/dev/sdb].read_io_errs 0
[/dev/sdb].flush_io_errs 0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs 0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs 0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0

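A simple cron job can warn us if anything is wrong. cron mails whatever the command prints to MAILTO, and grep drops every counter line that ends in " 0", so mail only arrives when a counter is non-zero: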

MAILTO=sysadmin@tardis.tdl
@hourly /sbin/btrfs device stats /mnt/Raid | grep -vE ' 0$'
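
If the filter should live in a script rather than in the crontab line, it could look roughly like the sketch below. The function name nonzero_counters is made up for this example and is not part of btrfs-progs. It assumes the output format shown above, where the counter value is the last field on each line.

```shell
#!/bin/sh
# Sketch: print only the btrfs error counters whose value is not 0.
# nonzero_counters is a hypothetical helper name, not a btrfs-progs command.
nonzero_counters() {
    # Each input line is expected to look like "[/dev/sdb].write_io_errs 0".
    # NF skips empty lines; $NF is the last field, i.e. the counter value.
    awk 'NF && $NF != 0'
}

# Usage (mount point taken from the post):
#   /sbin/btrfs device stats /mnt/Raid | nonzero_counters
```

Comparing the last field numerically behaves the same as the grep above for this output, but does not depend on exactly one space before the value.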

It’s wise to scrub the filesystem from time to time. Wikipedia says the following about “Data scrubbing”:

Data scrubbing is an error correction technique that uses a background task to periodically inspect main memory or storage for errors, and then correct detected errors using redundant data in form of different checksums or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, leading to reduced risks of uncorrectable errors.

If we don’t scrub, Btrfs may keep reading from the good drive and never notice that the copy on the other drive is faulty. A monthly cron job takes care of that:

@monthly /sbin/btrfs scrub start -Bq /mnt/Raid
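
With -q the scrub prints no summary, so it can help to turn the exit status into a message that cron will mail. Here is a minimal sketch. The name scrub_and_report and the BTRFS override variable are invented for this example. Treating exit status 3 as "uncorrectable errors found" is my reading of the btrfs-scrub man page, so check it against your btrfs-progs version.

```shell
#!/bin/sh
# Sketch: run a foreground scrub and report a non-zero exit status.
# scrub_and_report and the BTRFS variable are hypothetical names.
BTRFS=${BTRFS:-/sbin/btrfs}

scrub_and_report() {
    mountpoint=$1
    # -B: stay in the foreground until the scrub is done, -q: be quiet
    "$BTRFS" scrub start -Bq "$mountpoint"
    status=$?
    case $status in
        0) ;;  # scrub succeeded: stay silent so cron sends no mail
        # Assumption: 3 means uncorrectable errors (verify in man btrfs-scrub)
        3) echo "scrub of $mountpoint found uncorrectable errors" ;;
        *) echo "scrub of $mountpoint failed with exit status $status" ;;
    esac
    return "$status"
}

# Usage from a script called by cron:
#   scrub_and_report /mnt/Raid
```

A clean scrub produces no output, so mail is only sent when something needs attention.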
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 Unported License.

Copyright 2015-present Hoti