[PVE-User] ZFS corruption and recovery...

Marco Gaiarin gaio at lilliput.linux.it
Tue Mar 19 17:31:02 CET 2024


In a small PVE cluster I have a 'backup server', i.e. an old/reconditioned
server that simply provides backup storage for the other nodes: apart from
the rpool, there's another pool, on slow HDDs, used as a data repository,
mainly for rsnapshot.

Being a backup server, it can be powered down without much trouble; last week
I needed an (unused) controller that was inside it, so I powered it off,
removed the controller, and powered it back on.


On Saturday the backup pool started to report errors, and the disks/kernel
complained too, for all four disks in the pool. :-(
Looking at the errors, they did not seem to be media errors, so I powered off
the server and looked carefully at the cabling, finding that last week, while
removing the controller, I had probably inadvertently loosened a power
connection on the disk backplane, damn me.


Fixing the cabling worked as expected: the server starts, SMART says the
disks are good, and everything works as expected.

After the server started, the disks began to resilver, but some errors
remained: a dozen files and dirs in the 'Permanent errors' list.


Because it is a backup server, I simply got rid of most of the errors,
deleting the affected files and doing a few rounds of 'zpool scrub' and
'zpool clear -F', leading to this situation:

 root@svpve3:~# zpool status -v rpool-backup
   pool: rpool-backup
  state: ONLINE
 status: One or more devices has experienced an error resulting in data
 	corruption.  Applications may be affected.
 action: Restore the file in question if possible.  Otherwise restore the
 	entire pool from backup.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
   scan: scrub in progress since Tue Mar 19 16:58:52 2024
 	3.89T scanned at 2.97G/s, 745G issued at 569M/s, 13.5T total
 	0B repaired, 5.39% done, 06:32:11 to go
 config:

	NAME                                 STATE     READ WRITE CKSUM
	rpool-backup                         ONLINE       0     0     0
	  raidz1-0                           ONLINE       0     0     0
	    ata-ST8000VN004-3CP101_WWZ1MBA8  ONLINE       0     0     0
	    ata-ST8000VN004-3CP101_WWZ1Q7F1  ONLINE       0     0     0
	    ata-ST8000VN004-3CP101_WRQ0WQ44  ONLINE       0     0     0
	    ata-ST8000VN004-3CP101_WWZ1RFL5  ONLINE       0     0     0
	cache
	  scsi-33001438037cd8921             ONLINE       0     0     0

 errors: Permanent errors have been detected in the following files:

        rpool-backup:<0x63f218>
        rpool-backup:<0x108d421>
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/DO/FS/P/26-02-19
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/DO/2012/mc
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/CD/mg2014/100HP507
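
For reference, one round of the scrub-and-clear cycle mentioned above can be
sketched roughly as follows. This is only a sketch, not a record of the exact
commands that were run: the function name `scrub_round` and the `ZPOOL`
override variable are made up for illustration, and `zpool wait` assumes
OpenZFS 2.0 or later.

```shell
#!/bin/sh
# Hypothetical helper: one scrub-and-clear round on the pool given as $1.
# ZPOOL may be overridden (e.g. ZPOOL="echo zpool") to only print the commands.
ZPOOL="${ZPOOL:-zpool}"

scrub_round() {
    pool="$1"
    # $ZPOOL is left unquoted on purpose so "echo zpool" splits into two words.
    $ZPOOL scrub "$pool" || return 1          # start a scrub
    $ZPOOL wait -t scrub "$pool" || return 1  # block until the scrub finishes
    $ZPOOL clear -F "$pool" || return 1       # clear errors, as in the post
    $ZPOOL status -v "$pool"                  # show what is still listed
}
```

Something like `scrub_round rpool-backup` would run one round; setting
`ZPOOL="echo zpool"` beforehand only prints the commands instead of running
them.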


Apart from the first two, the other three are directories that it seems I
cannot delete anymore; the error is 'dir not empty' or 'Invalid exchange'.
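
To keep track of what is still listed between rounds, the entries can be
pulled out of the status output with a small filter. A minimal sketch,
assuming the output layout shown above; the function name
`list_permanent_errors` is hypothetical. Entries of the form
`pool:<0x...>` are objects for which ZFS can no longer resolve a path, the
others are plain paths.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status -v` output on stdin and print the
# entries listed under "errors: Permanent errors ...", one per line.
list_permanent_errors() {
    awk '
        /^[ \t]*pool:/ { in_list = 0 }   # a new pool section ends the list
        /Permanent errors have been detected/ { in_list = 1; next }
        in_list && NF { sub(/^[ \t]+/, ""); print }   # strip the indentation
    '
}
```

Usage would be along the lines of
`zpool status -v rpool-backup | list_permanent_errors`.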


How can I fix these errors?! As just stated, this is a backup server, so
losing some files (knowing which files, of course!) is not a problem...


Thanks.

-- 
  Who knows why, when you dial the wrong number, the phone is never busy.
							(Beppe Grillo)




