[PVE-User] ZFS corruption and recovery...
Marco Gaiarin
gaio at lilliput.linux.it
Tue Mar 19 17:31:02 CET 2024
In a small PVE cluster I have a 'backup server', i.e. an old, reconditioned
server that simply provides backup storage for the other nodes: apart from the
rpool, there's another pool on slow HDDs, used as a data repository, mainly
for rsnapshot.
Being a backup server, it can be powered down without much hassle; last week I
needed an (unused) controller that was inside it, so I powered it off, removed
the controller, and powered it back on.
On Saturday the backup pool started reporting errors, and the disks/kernel
complained too, for all four disks in the pool. :-(
Looking at the errors, they didn't seem to be media errors, so I powered off
the server and looked carefully at the cabling, finding that, probably while
removing the controller last week, I had inadvertently loosened a power
connector on the disk backplane. Damn me.
Fixing the cabling did the trick: the server starts, SMART says the disks are
good, and everything works as expected.
After the server started, the disks began to resilver, but some errors
remained: a dozen files and directories in the 'Permanent errors' list.
Since this is a backup server, I simply got rid of most of the errors by
running a few rounds of 'zpool scrub' and 'zpool clear -F', which led to this
situation:
root@svpve3:~# zpool status -v rpool-backup
  pool: rpool-backup
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Tue Mar 19 16:58:52 2024
        3.89T scanned at 2.97G/s, 745G issued at 569M/s, 13.5T total
        0B repaired, 5.39% done, 06:32:11 to go
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool-backup                         ONLINE       0     0     0
          raidz1-0                           ONLINE       0     0     0
            ata-ST8000VN004-3CP101_WWZ1MBA8  ONLINE       0     0     0
            ata-ST8000VN004-3CP101_WWZ1Q7F1  ONLINE       0     0     0
            ata-ST8000VN004-3CP101_WRQ0WQ44  ONLINE       0     0     0
            ata-ST8000VN004-3CP101_WWZ1RFL5  ONLINE       0     0     0
        cache
          scsi-33001438037cd8921             ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool-backup:<0x63f218>
        rpool-backup:<0x108d421>
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/DO/FS/P/26-02-19
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/DO/2012/mc
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/CD/mg2014/100HP507
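The permanent-error list mixes two kinds of entries: plain paths, and entries
like 'rpool-backup:<0x63f218>', which (as far as I understand the OpenZFS
output) name an object in the dataset whose path ZFS could not resolve. A
minimal Python sketch that splits the two kinds, assuming the output format
quoted above; 'parse_permanent_errors' and the 'SAMPLE' excerpt are
illustrative names, not part of any ZFS tool:

```python
import re

# Excerpt of the 'zpool status -v' output quoted above (error list only).
SAMPLE = """\
errors: Permanent errors have been detected in the following files:

        rpool-backup:<0x63f218>
        rpool-backup:<0x108d421>
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/DO/FS/P/26-02-19
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/DO/2012/mc
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/CD/mg2014/100HP507
"""

# Assumption: "<dataset>:<0xNNN>" entries are objects whose path could not
# be resolved; every other entry is treated as a path.
UNRESOLVED = re.compile(r"(?P<dataset>.+):<0x(?P<obj>[0-9a-fA-F]+)>")


def parse_permanent_errors(status_text):
    """Split the permanent-error list into (unresolved_objects, paths)."""
    unresolved, paths = [], []
    in_errors = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("errors:"):
            # Everything after the "errors:" header belongs to the list.
            in_errors = True
            continue
        if not in_errors or not line:
            continue
        m = UNRESOLVED.fullmatch(line)
        if m:
            # Keep the object number as an int (parsed from hex).
            unresolved.append((m.group("dataset"), int(m.group("obj"), 16)))
        else:
            paths.append(line)
    return unresolved, paths


if __name__ == "__main__":
    objs, paths = parse_permanent_errors(SAMPLE)
    for dataset, obj in objs:
        print(f"unresolved: dataset={dataset} object={obj} (0x{obj:x})")
    for p in paths:
        print(f"path: {p}")
```

The object number is kept as an integer so it can be printed in decimal or
hex, whichever a follow-up tool expects.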
Apart from the first two, the other three are directories that I apparently
can no longer delete: the error is either 'dir not empty' or 'Invalid exchange'.
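'Invalid exchange' is the Linux strerror() text for EBADE, and OpenZFS on Linux
reports checksum errors to userspace as EBADE (it defines ECKSUM as EBADE), so
that message most likely means ZFS could not read the entry intact, while 'dir
not empty' (ENOTEMPTY) means rmdir still sees entries inside the directory. A
small Python sketch, assuming Linux, that tries to remove a directory and says
which case it hit; 'try_rmdir' and 'describe_delete_error' are hypothetical
helper names:

```python
import errno
import os


def describe_delete_error(err):
    """Return a short hint for an OSError raised while deleting a path."""
    # Assumption (Linux): OpenZFS returns EBADE ("Invalid exchange") for
    # checksum failures, i.e. data it could not read back intact.
    if err.errno == errno.EBADE:
        return "checksum error (EBADE): ZFS could not read the data intact"
    if err.errno == errno.ENOTEMPTY:
        return "directory not empty (ENOTEMPTY)"
    # Anything else: fall back to the system's own error message.
    return os.strerror(err.errno)


def try_rmdir(path):
    """Try to remove a directory; return None on success, else a hint."""
    try:
        os.rmdir(path)
    except OSError as err:
        return describe_delete_error(err)
    return None
```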
How can I fix these errors?! As already said, this is a backup server, so
losing some files (knowing which files, of course!) is not a problem...
Thanks.
--
Who knows why, when you dial the wrong number, the line is never busy.
(Beppe Grillo)