[PVE-User] Unresponsive VM(s) during VZdump

Mike O'Connor mike at oeg.com.au
Thu May 9 11:30:50 CEST 2024


Hi Iztok

You need to enable fleecing in the advanced backup settings. A slow 
backup storage system can cause this issue; enabling fleecing fixes it 
by writing the guest's changes to a local sparse image while the backup 
runs.
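
For example, on PVE 8.2 or later (the release that introduced backup 
fleecing) you can turn it on per job in the GUI under Datacenter -> 
Backup -> Edit -> Advanced, or with something roughly like the lines 
below (syntax from memory, check the vzdump man page; "local-lvm" and 
"nfs-backup" are placeholder storage names, use a fast local storage of 
yours instead):

    # /etc/vzdump.conf (applies to all backup jobs)
    fleecing: enabled=1,storage=local-lvm

    # or per invocation, e.g. for VM 100
    vzdump 100 --storage nfs-backup --fleecing enabled=1,storage=local-lvm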

Mike

On 9/5/2024 6:05 pm, Iztok Gregori wrote:
> Hi to all!
>
> We are in the process of upgrading our hyper-converged (Ceph-based) 
> cluster from PVE 6 to PVE 8, and yesterday we finished upgrading all 
> nodes to PVE 7.4.1 without issues. Tonight, during our usual VZdump 
> backup (vzdump to an NFS share), our monitoring system notified us 
> that 2 VMs (out of 107) were unresponsive. The VM logs contained a 
> lot of lines like this:
>
> kernel: hda: irq timeout: status=0xd0 { Busy }
> kernel: sd 2:0:0:0: [sda] abort
> kernel: sd 2:0:0:1: [sdb] abort
>
> After the backup finished (successfully), the VMs started to function 
> correctly again.
>
> On PVE 6 everything was ok.
>
> The affected machines are running old kernels ("2.6.18" and "2.6.32"); 
> one has the QEMU agent enabled, the other does not. Both use kvm64 as 
> the processor type; one uses "VirtIO SCSI", the other "LSI 53C895A". 
> All the disks are on Ceph RBD.
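>
> For reference, the relevant parts of their configs look roughly like 
> this (the VM IDs, disk sizes and storage name below are placeholders, 
> not the real ones):
>
>   # qm config <vmid>  (the VirtIO SCSI guest)
>   agent: 1
>   cpu: kvm64
>   scsihw: virtio-scsi-pci
>   scsi0: ceph-rbd:vm-101-disk-0,size=100G
>
>   # qm config <vmid>  (the LSI 53C895A guest)
>   cpu: kvm64
>   scsihw: lsi
>   scsi0: ceph-rbd:vm-102-disk-0,size=200G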
>
> Nothing related was logged on the host machine, and the Ceph cluster 
> was working as expected. Both VMs are "biggish" (100-200 GB) and it 
> takes 1-2 hours to complete the backup.
>
> Do you have any idea what could be the culprit? I suspect something 
> in qemu-kvm, but I haven't found any useful hints yet.
>
> I'm still planning to upgrade everything to PVE 8; maybe the "problem" 
> was fixed in later releases of qemu-kvm...
>
> I can give you more information if needed; any help is appreciated.
>
> Thanks
>   Iztok
>
> P.S. This is the software stack on our cluster (16 nodes):
> # pveversion -v
> proxmox-ve: 7.4-1 (running kernel: 5.15.149-1-pve)
> pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
> pve-kernel-5.15: 7.4-12
> pve-kernel-5.4: 6.4-20
> pve-kernel-5.15.149-1-pve: 5.15.149-1
> pve-kernel-5.4.203-1-pve: 5.4.203-1
> pve-kernel-5.4.157-1-pve: 5.4.157-1
> pve-kernel-5.4.106-1-pve: 5.4.106-1
> ceph: 15.2.17-pve1
> ceph-fuse: 15.2.17-pve1
> corosync: 3.1.7-pve1
> criu: 3.15-1+pve-1
> glusterfs-client: 9.2-1
> ifupdown: 0.8.36+pve2
> ksm-control-daemon: 1.4-1
> libjs-extjs: 7.0.0-1
> libknet1: 1.24-pve2
> libproxmox-acme-perl: 1.4.4
> libproxmox-backup-qemu0: 1.3.1-1
> libproxmox-rs-perl: 0.2.1
> libpve-access-control: 7.4.3
> libpve-apiclient-perl: 3.2-2
> libpve-common-perl: 7.4-2
> libpve-guest-common-perl: 4.2-4
> libpve-http-server-perl: 4.2-3
> libpve-rs-perl: 0.7.7
> libpve-storage-perl: 7.4-3
> libqb0: 1.0.5-1
> libspice-server1: 0.14.3-2.1
> lvm2: 2.03.11-2.1
> lxc-pve: 5.0.2-2
> lxcfs: 5.0.3-pve1
> novnc-pve: 1.4.0-1
> proxmox-backup-client: 2.4.6-1
> proxmox-backup-file-restore: 2.4.6-1
> proxmox-kernel-helper: 7.4-1
> proxmox-mail-forward: 0.1.1-1
> proxmox-mini-journalreader: 1.3-1
> proxmox-offline-mirror-helper: 0.5.2
> proxmox-widget-toolkit: 3.7.3
> pve-cluster: 7.3-3
> pve-container: 4.4-6
> pve-docs: 7.4-2
> pve-edk2-firmware: 3.20230228-4~bpo11+3
> pve-firewall: 4.3-5
> pve-firmware: 3.6-6
> pve-ha-manager: 3.6.1
> pve-i18n: 2.12-1
> pve-qemu-kvm: 7.2.10-1
> pve-xtermjs: 4.16.0-2
> qemu-server: 7.4-5
> smartmontools: 7.2-pve3
> spiceterm: 3.2-2
> swtpm: 0.8.0~bpo11+3
> vncterm: 1.7-1
> zfsutils-linux: 2.1.15-pve1
>

