[PVE-User] Backup of one VM always fails

Frank Thommen f.thommen at dkfz-heidelberg.de
Fri Dec 4 09:30:57 CET 2020


> On Thursday, December 3, 2020 10:16 PM, Frank Thommen <f.thommen at dkfz-heidelberg.de> wrote:
> 
>>
>>
>> Dear all,
>>
>> on our PVE cluster, the backup of a specific VM always fails (which
>> makes us worry, as it is our GitLab instance). The general backup plan
>> is "back up all VMs at 00:30". In the confirmation email we see that
>> the backup of this specific VM takes six to seven hours and then fails.
>> The error message in the overview table used to be:
>>
>> vma_queue_write: write error - Broken pipe
>>
>> With detailed log
>>
>> ------------
>>
>> 123: 2020-12-01 02:53:08 INFO: Starting Backup of VM 123 (qemu)
>> 123: 2020-12-01 02:53:08 INFO: status = running
>> 123: 2020-12-01 02:53:09 INFO: update VM 123: -lock backup
>> 123: 2020-12-01 02:53:09 INFO: VM Name: odcf-vm123
>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio0'
>> 'ceph-rbd:vm-123-disk-0' 20G
>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio1'
>> 'ceph-rbd:vm-123-disk-2' 1000G
>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio2'
>> 'ceph-rbd:vm-123-disk-3' 2T
>> 123: 2020-12-01 02:53:09 INFO: backup mode: snapshot
>> 123: 2020-12-01 02:53:09 INFO: ionice priority: 7
>> 123: 2020-12-01 02:53:09 INFO: creating archive
>> '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_01-02_53_08.vma.lzo'
>> 123: 2020-12-01 02:53:09 INFO: started backup task
>> 'a38ff50a-f474-4b0a-a052-01a835d5c5c7'
>> 123: 2020-12-01 02:53:12 INFO: status: 0% (167772160/3294239916032),
>> sparse 0% (31563776), duration 3, read/write 55/45 MB/s
>> [... etc. etc. ...]
>> 123: 2020-12-01 09:42:14 INFO: status: 35%
>> (1170252365824/3294239916032), sparse 0% (26845003776), duration 24545,
>> read/write 59/56 MB/s
>> 123: 2020-12-01 09:42:14 ERROR: vma_queue_write: write error - Broken pipe
>> 123: 2020-12-01 09:42:14 INFO: aborting backup job
>> 123: 2020-12-01 09:42:15 ERROR: Backup of VM 123 failed -
>> vma_queue_write: write error - Broken pipe
>>
>> ------------
>>
>> Recently (since the upgrade to the newest PVE release) it has been
>>
>> VM 123 qmp command 'query-backup' failed - got timeout
>>
>> with log
>>
>> ------------
>>
>> 123: 2020-12-03 03:29:00 INFO: Starting Backup of VM 123 (qemu)
>> 123: 2020-12-03 03:29:00 INFO: status = running
>> 123: 2020-12-03 03:29:00 INFO: VM Name: odcf-vm123
>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio0'
>> 'ceph-rbd:vm-123-disk-0' 20G
>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio1'
>> 'ceph-rbd:vm-123-disk-2' 1000G
>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio2'
>> 'ceph-rbd:vm-123-disk-3' 2T
>> 123: 2020-12-03 03:29:01 INFO: backup mode: snapshot
>> 123: 2020-12-03 03:29:01 INFO: ionice priority: 7
>> 123: 2020-12-03 03:29:01 INFO: creating vzdump archive
>> '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_03-03_29_00.vma.lzo'
>> 123: 2020-12-03 03:29:01 INFO: started backup task
>> 'cc7cde4e-20e8-4e26-a89a-f6f1aa9e9612'
>> 123: 2020-12-03 03:29:01 INFO: resuming VM again
>> 123: 2020-12-03 03:29:04 INFO: 0% (284.0 MiB of 3.0 TiB) in 3s, read:
>> 94.7 MiB/s, write: 51.7 MiB/s
>> [... etc. etc. ...]
>> 123: 2020-12-03 09:05:08 INFO: 36% (1.1 TiB of 3.0 TiB) in 5h 36m 7s,
>> read: 57.3 MiB/s, write: 53.6 MiB/s
>> 123: 2020-12-03 09:22:57 ERROR: VM 123 qmp command 'query-backup' failed
>> - got timeout
>> 123: 2020-12-03 09:22:57 INFO: aborting backup job
>> 123: 2020-12-03 09:32:57 ERROR: VM 123 qmp command 'backup-cancel'
>> failed - unable to connect to VM 123 qmp socket - timeout after 5981 retries
>> 123: 2020-12-03 09:32:57 ERROR: Backup of VM 123 failed - VM 123 qmp
>> command 'query-backup' failed - got timeout
>>
>>
>> The VM has some quite big vdisks (20G, 1T and 2T), all stored in Ceph.
>> There is still plenty of space in Ceph.
>>
>> Can anyone give us some hint on how to investigate and debug this further?
> 
> Because it is a write error, maybe we should look at the backup destination.
> Maybe it is a network connection issue? Maybe something wrong with the host? Maybe the disk is full?
> Which storage are you using for backup? Can you show us the corresponding entry in /etc/pve/storage.cfg?


We are backing up to CephFS, which still has about 8 TB free.
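
To rule out the "disk full" theory more rigorously than by eyeballing it, here is a minimal Python sketch. It assumes the dump directory from the log above (/mnt/pve/cephfs/dump) is mounted on the node where it runs. The helper name has_room is made up for illustration. Note that shutil.disk_usage only reports what the filesystem advertises, so on CephFS this is the cluster's view and not a guarantee. The byte count is the uncompressed total from the 2020-12-01 log, so it should be an upper bound for the .vma.lzo archive.

```python
import shutil


def has_room(path: str, needed_bytes: int) -> bool:
    """Return True if the filesystem holding `path` reports at least
    `needed_bytes` of free space (hypothetical helper, not part of PVE)."""
    usage = shutil.disk_usage(path)
    return usage.free >= needed_bytes


if __name__ == "__main__":
    # Uncompressed total size of VM 123 as reported by vzdump in the log.
    total_bytes = 3294239916032
    # Assumption: the backup target is mounted here on this node.
    dump_dir = "/mnt/pve/cephfs/dump"
    try:
        print("enough free space:", has_room(dump_dir, total_bytes))
    except OSError as exc:
        print(f"could not inspect {dump_dir}: {exc}")
```

If this prints True, a full disk is unlikely to be the cause of the broken pipe, although quotas or pool limits would not show up here.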

/etc/pve/storage.cfg is
------------
dir: local
         path /var/lib/vz
         content vztmpl,backup,iso

dir: data
         path /data
         content snippets,images,backup,iso,rootdir,vztmpl

cephfs: cephfs
         path /mnt/pve/cephfs
         content backup,vztmpl,iso
         maxfiles 5

rbd: ceph-rbd
         content images,rootdir
         krbd 0
         pool pve-pool1
------------
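
One more observation from the logs: at the observed rate the job would be nowhere near finished when it fails. The following is a rough Python sketch that extrapolates the total run time from the last status line of the 2020-12-01 log. It assumes the throughput stays constant and ignores sparse regions, so it is only an estimate. The function name and the regular expression are my own and only match the old "status: N% (done/total) ... duration S" format, not the newer "36% (1.1 TiB of 3.0 TiB) in 5h 36m 7s" lines.

```python
import re

# Matches "(<bytes done>/<bytes total>) ... duration <seconds>" in an
# old-style vzdump status line.
STATUS_RE = re.compile(r"\((\d+)/(\d+)\).*?duration (\d+)")


def estimate_total_seconds(status_line: str) -> float:
    """Linearly extrapolate the full backup duration from one status line."""
    match = STATUS_RE.search(status_line)
    if match is None:
        raise ValueError("not an old-style vzdump status line")
    done, total, elapsed = (int(group) for group in match.groups())
    if done == 0:
        raise ValueError("no progress yet, cannot extrapolate")
    return elapsed * total / done


# Last status line before the failure, joined from the wrapped log lines.
line = (
    "status: 35% (1170252365824/3294239916032), sparse 0% (26845003776), "
    "duration 24545, read/write 59/56 MB/s"
)
hours = estimate_total_seconds(line) / 3600
print(f"estimated full run: {hours:.1f} h")
```

By this estimate a complete run would take roughly 19 hours, so even without the error the backup would still be running well into the next day.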

Frank

> 
> best regards, Arjen
> 

