[PVE-User] vzdump causing complete box hang

Gavin Henry gavin.henry at gmail.com
Sun Jul 29 11:39:03 CEST 2018


Hi all,

Once a week, when we run vzdump on some database VMs, we get the
following in the logs, and the whole box hangs and needs a cold remote
reboot. Any suggestions? We're seeing no other issues, no disk issues
on the RAID and no filesystem issues:

Jul 29 05:15:03 leela vzdump[280112]: <root at pam> starting task
UPID:leela:0004465A:038A102F:5B5D4D57:vzdump::root at pam:
Jul 29 05:15:03 leela vzdump[280154]: INFO: starting new backup job:
vzdump --mailnotification always --quiet 1 --mode snapshot --compress
lzo --storage Backups --all 1
Jul 29 05:15:03 leela pmxcfs[6094]: [status] notice: received log
Jul 29 05:15:03 leela vzdump[280154]: INFO: Starting Backup of VM 151 (openvz)
Jul 29 05:15:04 leela kernel: EXT3-fs: barriers disabled
Jul 29 05:15:04 leela kernel: kjournald starting. Commit interval 5 seconds
Jul 29 05:15:04 leela kernel: EXT3-fs (dm-2): using internal journal
Jul 29 05:15:04 leela kernel: ext3_orphan_cleanup: deleting
unreferenced inode 132645031

<snip>

Jul 29 05:15:04 leela kernel: EXT3-fs (dm-2): 45 orphan inodes deleted
Jul 29 05:15:04 leela kernel: EXT3-fs (dm-2): recovery complete
Jul 29 05:15:04 leela kernel: EXT3-fs (dm-2): mounted filesystem with
ordered data mode
Jul 29 05:17:01 leela /USR/SBIN/CRON[282355]: (root) CMD ( cd / &&
run-parts --report /etc/cron.hourly)
Jul 29 05:17:23 leela vzdump[280154]: INFO: Finished Backup of VM 151 (00:02:20)
Jul 29 05:17:24 leela vzdump[280154]: INFO: Starting Backup of VM 153 (openvz)
Jul 29 05:17:24 leela kernel: EXT3-fs: barriers disabled
Jul 29 05:17:24 leela kernel: kjournald starting. Commit interval 5 seconds
Jul 29 05:17:24 leela kernel: EXT3-fs (dm-2): using internal journal

<snip>

Jul 29 05:24:33 leela kernel: Buffer I/O error on device dm-2, logical
block 433061993
Jul 29 05:24:33 leela kernel: lost page write due to I/O error on dm-2
Jul 29 05:25:02 leela /USR/SBIN/CRON[291615]: (root) CMD (command -v
debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 29 05:28:59 leela kernel: EXT3-fs (dm-2): error: ext3_put_super:
Couldn't clean up the journal
Jul 29 05:29:00 leela vzdump[280154]: ERROR: Backup of VM 153 failed -
command '(cd /mnt/vzsnap0/private/153;find . '(' -regex '^\.$' ')' -o
'(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf -
--totals --sparse --numeric-owner --no-recursion --one-file-system
./etc/vzdump/vps.conf --null -T -|lzop)
>/mnt/replicated/backups/dump/vzdump-openvz-153-2018_07_29-05_17_24.tar.dat'
failed: exit code 2
Jul 29 05:29:00 leela vzdump[280154]: INFO: Starting Backup of VM 157 (openvz)
Jul 29 05:29:01 leela kernel: EXT3-fs: barriers disabled
Jul 29 05:29:01 leela kernel: kjournald starting. Commit interval 5 seconds
Jul 29 05:29:01 leela kernel: EXT3-fs (dm-2): using internal journal
Jul 29 05:29:01 leela kernel: ext3_orphan_cleanup: deleting
unreferenced inode 132645031

<snip>

Jul 29 05:29:01 leela kernel: ext3_orphan_cleanup: deleting
unreferenced inode 122814467
Jul 29 05:29:01 leela kernel: EXT3-fs (dm-2): 45 orphan inodes deleted
Jul 29 05:29:01 leela kernel: EXT3-fs (dm-2): recovery complete
Jul 29 05:29:01 leela kernel: EXT3-fs (dm-2): mounted filesystem with
ordered data mode
Jul 29 05:30:04 leela ansible-setup: Invoked with filter=*
gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Jul 29 05:32:05 leela ansible-setup: Invoked with filter=*
gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Jul 29 05:35:02 leela /USR/SBIN/CRON[304114]: (root) CMD (command -v
debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 29 05:35:03 leela kernel: device-mapper: snapshots: Invalidating
snapshot: Unable to allocate exception.

<snip>

Jul 29 05:35:48 leela kernel: JBD: I/O error detected when updating
journal superblock for dm-2.
Jul 29 05:35:54 leela kernel: EXT3-fs (dm-2): error:
ext3_journal_start_sb: Detected aborted journal
Jul 29 05:35:54 leela kernel: EXT3-fs (dm-2): error: remounting
filesystem read-only
Jul 29 05:36:40 leela kernel: EXT3-fs (dm-2): error: ext3_put_super:
Couldn't clean up the journal
Jul 29 05:36:41 leela vzdump[280154]: ERROR: Backup of VM 157 failed -
command '(cd /mnt/vzsnap0/private/157;find . '(' -regex '^\.$' ')' -o
'(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf -
--totals --sparse --numeric-owner --no-recursion --one-file-system
./etc/vzdump/vps.conf --null -T -|lzop)
>/mnt/replicated/backups/dump/vzdump-openvz-157-2018_07_29-05_29_00.tar.dat'
failed: exit code 2
Jul 29 05:36:41 leela vzdump[280154]: INFO: Starting Backup of VM 161 (openvz)
Jul 29 05:36:42 leela kernel: EXT3-fs: barriers disabled
Jul 29 05:36:42 leela kernel: kjournald starting. Commit interval 5 seconds
Jul 29 05:36:42 leela kernel: EXT3-fs (dm-2): using internal journal
Jul 29 05:35:42 leela kernel: EXT3-fs error (device dm-2):
ext3_get_inode_loc: unable to read inode block - inode=15024130,
block=60096514
Jul 29 05:35:42 leela kernel: Buffer I/O error on device dm-2, logical block 0
Jul 29 05:35:42 leela kernel: lost page write due to I/O error on dm-2
Jul 29 05:35:42 leela kernel: EXT3-fs (dm-2): I/O error while writing superblock
Jul 29 05:35:42 leela kernel: EXT3-fs (dm-2): error in
ext3_reserve_inode_write: IO failure
Jul 29 05:35:42 leela kernel: Buffer I/O error on device dm-2, logical block 0
Jul 29 05:35:42 leela kernel: lost page write due to I/O error on dm-2
Jul 29 05:35:42 leela kernel: EXT3-fs (dm-2): I/O error while writing superblock
Jul 29 05:35:43 leela kernel: EXT3-fs error (device dm-2):
ext3_get_inode_loc: unable to read inode block - inode=23347202,
block=93388802
Jul 29 05:35:43 leela kernel: Buffer I/O error on device dm-2, logical block 0
Jul 29 05:35:43 leela kernel: lost page write due to I/O error on dm-2
Jul 29 05:35:43 leela kernel: EXT3-fs (dm-2): I/O error while writing superblock
Jul 29 05:35:43 leela kernel: EXT3-fs (dm-2): error in
ext3_reserve_inode_write: IO failure
Jul 29 05:35:43 leela kernel: Buffer I/O error on device dm-2, logical block 0

The backups are dumped onto a replicated GlusterFS setup, which is
shared with another Proxmox node in the same rack in a two-node cluster.
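
To make the order of events in the excerpt above easier to follow in a
longer syslog, the relevant lines can be filtered with a small script.
This is only a minimal sketch: the patterns are copied from the log
lines above, while the event labels and the function name `summarize`
are illustrative assumptions, and it expects unwrapped syslog lines
rather than the mail-wrapped form shown here.

```python
import re

# Leading syslog timestamp, e.g. "Jul 29 05:15:03".
TIMESTAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s")

# (label, pattern) pairs. The patterns come from the log excerpt above;
# the labels are made-up shorthand for this sketch.
EVENTS = [
    ("backup-start", re.compile(r"INFO: Starting Backup of VM (\d+)")),
    ("backup-done", re.compile(r"INFO: Finished Backup of VM (\d+)")),
    ("backup-failed", re.compile(r"ERROR: Backup of VM (\d+) failed")),
    ("snapshot-invalidated",
     re.compile(r"device-mapper: snapshots: Invalidating snapshot")),
    ("io-error", re.compile(r"Buffer I/O error on device ([\w-]+),")),
    ("remount-ro", re.compile(r"remounting filesystem read-only")),
]


def summarize(lines):
    """Return (timestamp, label, detail) for each line matching an event."""
    events = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if ts is None:
            continue  # not a syslog line (or a wrapped continuation)
        for label, pattern in EVENTS:
            match = pattern.search(line)
            if match:
                # detail is the VM ID or device name when the pattern
                # captures one, otherwise an empty string
                detail = match.group(1) if match.groups() else ""
                events.append((ts.group(1), label, detail))
                break
    return events
```

Passing it the open syslog file (any iterable of lines works) gives a
compact timeline of backup starts, failures, I/O errors and the
snapshot invalidation, without the cron and ansible noise in between.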

Thanks,
Gavin.

