[PVE-User] Repeated backup VM crashes to local NFS

Eneko Lacunza elacunza at binovo.es
Mon Feb 23 09:34:15 CET 2015


Hi all,

I'm seeing repeated VM crashes during backup when the backup destination 
storage is an NFS export on the local node:

- 3 Proxmox nodes running PVE 3.3-5
   - 2 nodes have 1 SSD OSD each (4c Xeon, 12 GB RAM)
   - 1 node is a NUC just for voting in the Proxmox and Ceph clusters
- One of the two OSD nodes has a HW 2x1TB RAID1 for backups
- The largest VM, with 150 GB of storage and a 100 GB backup footprint, 
is on RBD (Ceph) storage. It uses virtio for disk and e1000 for 
network (it previously used virtio for network too, and we saw the same 
crashes)
- There is a backup job that backs up all VMs on both Proxmox nodes to 
the HW 2x1TB RAID via NFS
- Sometimes the backup job causes the large VM to crash. Tonight it 
crashed at about 33%. We haven't seen problems with any other VM, either 
on local storage or on RBD storage.

The backup job reports that the VM is not running, and continues with 
the rest of the VMs without issues.

I don't see any problem in syslog, only that the tap device is disconnected:

[...syslog with rrd and munin messages removed...]
Feb 23 00:00:02 proxmox2 vzdump[693500]: INFO: Starting Backup of VM 102 
(qemu)
Feb 23 00:00:02 proxmox2 pmxcfs[2721]: [status] notice: received log
Feb 23 00:00:03 proxmox2 qm[693514]: <root@pam> update VM 102: -lock backup
Feb 23 00:00:03 proxmox2 pmxcfs[2721]: [status] notice: received log
Feb 23 00:01:32 proxmox2 pmxcfs[2721]: [status] notice: received log
Feb 23 00:02:47 proxmox2 pmxcfs[2721]: [status] notice: received log
Feb 23 00:35:06 proxmox2 pmxcfs[2721]: [dcdb] notice: data verification 
successful
Feb 23 00:47:22 proxmox2 kernel: vmbr0: port 5(tap102i0) entering 
disabled state
Feb 23 00:47:22 proxmox2 kernel: vmbr0: port 5(tap102i0) entering 
disabled state
Feb 23 00:47:22 proxmox2 vzdump[693500]: VM 102 qmp command failed - VM 
102 not running
Feb 23 00:47:23 proxmox2 ntpd[2644]: Deleting interface #21 tap102i0, 
fe80::d083:a9ff:fea1:b59c#123, interface stats: received=0, sent=0, 
dropped=0, active_time=916401 secs
Feb 23 00:47:23 proxmox2 vzdump[693500]: VM 102 qmp command failed - VM 
102 not running
Feb 23 00:47:23 proxmox2 ntpd[2644]: peers refreshed
Feb 23 00:47:31 proxmox2 vzdump[693500]: ERROR: Backup of VM 102 failed 
- VM 102 not running
Feb 23 00:47:31 proxmox2 vzdump[693500]: INFO: Starting Backup of VM 103 
(qemu)
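
As a rough illustration of how the timeline in the excerpt above can be 
pulled out of syslog, here is a minimal Python sketch. It is not part of 
the setup described here. It assumes the classic syslog timestamp format 
shown above, an English locale for month names, and that lines wrapped by 
the mail client have been re-joined. The helper name backup_failures and 
its year argument are hypothetical.

```python
import re
from datetime import datetime

# Matches vzdump INFO/ERROR lines in classic syslog format, e.g.
# "Feb 23 00:00:02 proxmox2 vzdump[693500]: INFO: Starting Backup of VM 102 (qemu)"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ vzdump\[\d+\]: "
    r"(?P<level>INFO|ERROR): (?P<msg>.*)$"
)
START_RE = re.compile(r"Starting Backup of VM (\d+)")
FAIL_RE = re.compile(r"Backup of VM (\d+) failed")


def backup_failures(lines, year=2015):
    """Return {vmid: seconds from backup start to failure} (hypothetical helper)."""
    started = {}
    failures = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # Syslog timestamps carry no year, so the caller supplies one.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        start = START_RE.search(m["msg"])
        fail = FAIL_RE.search(m["msg"])
        if start:
            started[start.group(1)] = ts
        elif fail and fail.group(1) in started:
            elapsed = ts - started[fail.group(1)]
            failures[fail.group(1)] = int(elapsed.total_seconds())
    return failures


# Lines taken from the excerpt above, with mail-client wrapping undone.
sample = [
    "Feb 23 00:00:02 proxmox2 vzdump[693500]: INFO: Starting Backup of VM 102 (qemu)",
    "Feb 23 00:47:22 proxmox2 vzdump[693500]: VM 102 qmp command failed - VM 102 not running",
    "Feb 23 00:47:31 proxmox2 vzdump[693500]: ERROR: Backup of VM 102 failed - VM 102 not running",
    "Feb 23 00:47:31 proxmox2 vzdump[693500]: INFO: Starting Backup of VM 103 (qemu)",
]
print(backup_failures(sample))
```

For the excerpt above this gives 2849 seconds for VM 102, i.e. the 
failure was logged roughly 47 minutes after the backup started. Running 
the same scan over older syslog files might show whether earlier crashes 
happened at a similar point in the backup.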

NFS server is nfs-kernel-server 1:1.2.6-4

I've seen this kind of crash about 3-4 times, also with PVE 3.2.

Any idea what could be wrong?

I'm planning to move the VM to the other node, because I suspect the 
problem is caused by running the VM on the same machine as the NFS server.

Thanks
Eneko

-- 
Technical Director
Binovo IT Human Project, S.L.
Telf. 943575997
       943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



