[PVE-User] Proxmox and glusterfs: VMs get corrupted

Roland devzero at web.de
Tue May 30 18:46:51 CEST 2023


if /mnt/pve/gfs_vms is a writable path on the pve host, did you check whether
there is also corruption when reading/writing large files there directly, i.e.
compare md5sums before and after the copy?
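
for example something like this (file size and paths are just examples, adjust
to your setup):

# create a large test file locally and note its checksum
dd if=/dev/urandom of=/var/tmp/gfs-test.bin bs=1M count=4096
md5sum /var/tmp/gfs-test.bin
# copy it onto the gluster mount, then read it back again
cp /var/tmp/gfs-test.bin /mnt/pve/gfs_vms/gfs-test.bin
md5sum /mnt/pve/gfs_vms/gfs-test.bin
cp /mnt/pve/gfs_vms/gfs-test.bin /var/tmp/gfs-test.readback.bin
md5sum /var/tmp/gfs-test.readback.bin
# all three checksums should be identical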

furthermore, i remember there was a gluster/qcow2 issue with aio=native some
years ago; could you retry with aio=threads for the virtual disks?
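
e.g. for the 101.conf you posted below, something like this should switch the
disk to aio=threads (untested on my side, and i think the vm needs a full
stop/start afterwards, a reboot from inside the guest is not enough):

qm set 101 --scsi0 gfs_vms:101/vm-101-disk-0.qcow2,discard=on,size=10444M,aio=threads

alternatively you can add ",aio=threads" to the scsi0 line in
/etc/pve/qemu-server/101.conf by hand.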

regards
roland

On 30.05.23 at 18:32, Christian Schoepplein wrote:
> Hi,
>
> we are testing the current proxmox version with a glusterfs storage backend
> and have a strange issue with files getting corrupted inside the virtual
> machines. For no apparent reason, from one moment to the next binaries can no
> longer be executed, scripts are damaged and so on. In the logs I get errors
> like this:
>
> May 30 11:22:36 ns1 dockerd[1234]: time="2023-05-30T11:22:36.874765091+02:00" level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: could not insert 'bridge': Exec format error\nmodprobe: ERROR: could not insert 'br_netfilter': Exec format error\ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \n, error: exit status 1"
>
> On such a broken system, file reports the following:
>
> root@ns1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
> /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: data
> root@ns1:~#
>
> On a normal system it looks like this:
>
> root@gluster1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
> /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: ELF 64-bit LSB
> relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=1084f7cfcffbd4c607724fba287c0ea7fc5775
> root@gluster1:~#
>
> Not only kernel modules are affected. I saw the same behaviour for scripts,
> icinga check modules, the sendmail binary and so on; I think it is totally
> random :-(.
>
> We have the problem with newly installed VMs, with VMs cloned from a template
> created on our proxmox host, and with VMs which we used with libvirtd before
> and migrated to our new proxmox machine. So IMHO it cannot be related to
> the way we create new virtual machines...
>
> We are using the following software:
>
> root@proxmox1:~# pveversion -v
> proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
> pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
> pve-kernel-5.15: 7.4-1
> pve-kernel-5.15.104-1-pve: 5.15.104-2
> pve-kernel-5.15.102-1-pve: 5.15.102-1
> ceph-fuse: 15.2.17-pve1
> corosync: 3.1.7-pve1
> criu: 3.15-1+pve-1
> glusterfs-client: 9.2-1
> ifupdown2: 3.1.0-1+pmx3
> ksm-control-daemon: 1.4-1
> libjs-extjs: 7.0.0-1
> libknet1: 1.24-pve2
> libproxmox-acme-perl: 1.4.4
> libproxmox-backup-qemu0: 1.3.1-1
> libproxmox-rs-perl: 0.2.1
> libpve-access-control: 7.4-2
> libpve-apiclient-perl: 3.2-1
> libpve-common-perl: 7.3-4
> libpve-guest-common-perl: 4.2-4
> libpve-http-server-perl: 4.2-3
> libpve-rs-perl: 0.7.5
> libpve-storage-perl: 7.4-2
> libspice-server1: 0.14.3-2.1
> lvm2: 2.03.11-2.1
> lxc-pve: 5.0.2-2
> lxcfs: 5.0.3-pve1
> novnc-pve: 1.4.0-1
> proxmox-backup-client: 2.4.1-1
> proxmox-backup-file-restore: 2.4.1-1
> proxmox-kernel-helper: 7.4-1
> proxmox-mail-forward: 0.1.1-1
> proxmox-mini-journalreader: 1.3-1
> proxmox-widget-toolkit: 3.6.5
> pve-cluster: 7.3-3
> pve-container: 4.4-3
> pve-docs: 7.4-2
> pve-edk2-firmware: 3.20230228-2
> pve-firewall: 4.3-1
> pve-firmware: 3.6-4
> pve-ha-manager: 3.6.0
> pve-i18n: 2.12-1
> pve-qemu-kvm: 7.2.0-8
> pve-xtermjs: 4.16.0-1
> qemu-server: 7.4-3
> smartmontools: 7.2-pve3
> spiceterm: 3.2-2
> swtpm: 0.8.0~bpo11+3
> vncterm: 1.7-1
> zfsutils-linux: 2.1.9-pve1
> root@proxmox1:~#
>
> root@proxmox1:~# cat /etc/pve/storage.cfg
> dir: local
>          path /var/lib/vz
>          content rootdir,iso,images,vztmpl,backup,snippets
>
> zfspool: local-zfs
>          pool rpool/data
>          content images,rootdir
>          sparse 1
>
> glusterfs: gfs_vms
>          path /mnt/pve/gfs_vms
>          volume gfs_vms
>          content images
>          prune-backups keep-all=1
>          server gluster1.linova.de
>          server2 gluster2.linova.de
>
> root@proxmox1:~#
>
> The config of a typical VM looks like this:
>
> root@proxmox1:~# cat /etc/pve/qemu-server/101.conf
> #ns1
> agent: enabled=1,fstrim_cloned_disks=1
> boot: c
> bootdisk: scsi0
> cicustom: user=local:snippets/user-data
> cores: 1
> hotplug: disk,network,usb
> ide2: gfs_vms:101/vm-101-cloudinit.qcow2,media=cdrom,size=4M
> ipconfig0: ip=10.200.32.9/22,gw=10.200.32.1
> kvm: 1
> machine: q35
> memory: 2048
> meta: creation-qemu=7.2.0,ctime=1683718002
> name: ns1
> nameserver: 10.200.0.5
> net0: virtio=1A:61:75:25:C6:30,bridge=vmbr0
> numa: 1
> ostype: l26
> scsi0: gfs_vms:101/vm-101-disk-0.qcow2,discard=on,size=10444M
> scsihw: virtio-scsi-pci
> searchdomain: linova.de
> serial0: socket
> smbios1: uuid=e2f503fe-4a66-4085-86c0-bb692add6b7a
> sockets: 1
> vmgenid: 3be6ec9d-7cfd-47c0-9f86-23c2e3ce5103
>
> root@proxmox1:~#
>
> Our glusterfs storage backend consists of three servers, all running Ubuntu
> 22.04 and glusterfs version 10.1. There are no errors in the logs on the
> glusterfs hosts when a VM crashes, and because sometimes icinga plugins also
> get corrupted, I have a very exact time range to search the logs for
> errors and warnings.
>
> However, I think it has something to do with our glusterfs setup. If I clone
> a VM from a template I get the following:
>
> root@proxmox1:~# qm clone 9000 200 --full --name testvm --description "testvm" --storage gfs_vms
> create full clone of drive ide2 (gfs_vms:9000/vm-9000-cloudinit.qcow2)
> Formatting
> 'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
> [2023-05-30 16:18:17.753152 +0000] I
> [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
> [2023-05-30 16:18:17.876879 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:17.877606 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:17.878275 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:27.761247 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
> io-stats translator unloaded
> [2023-05-30 16:18:28.766999 +0000] I
> [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
> [2023-05-30 16:18:28.936449 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0:
> All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:28.937547 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:28.938115 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:38.774387 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
> io-stats translator unloaded
> create full clone of drive scsi0 (gfs_vms:9000/base-9000-disk-0.qcow2)
> Formatting
> 'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=10951327744 lazy_refcounts=off refcount_bits=16
> [2023-05-30 16:18:39.962238 +0000] I
> [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
> [2023-05-30 16:18:40.084300 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:40.084996 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:40.085505 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:49.970199 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
> io-stats translator unloaded
> [2023-05-30 16:18:50.975729 +0000] I
> [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
> [2023-05-30 16:18:51.768619 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:51.769330 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:18:51.769822 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:19:00.984578 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
> io-stats translator unloaded
> transferred 0.0 B of 10.2 GiB (0.00%)
> [2023-05-30 16:19:02.030902 +0000] I
> [io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
> transferred 112.8 MiB of 10.2 GiB (1.08%)
> transferred 230.8 MiB of 10.2 GiB (2.21%)
> transferred 340.5 MiB of 10.2 GiB (3.26%)
> ...
> transferred 10.1 GiB of 10.2 GiB (99.15%)
> transferred 10.2 GiB of 10.2 GiB (100.00%)
> transferred 10.2 GiB of 10.2 GiB (100.00%)
> [2023-05-30 16:19:29.804006 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:19:29.804807 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:19:29.805486 +0000] E [MSGID: 108006]
> [afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
> [2023-05-30 16:19:32.044693 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
> io-stats translator unloaded
> root@proxmox1:~#
>
> Is this message about the subvolumes being down normal, or might it be
> the reason for our strange problems?
>
> I have no idea how to debug the problem further, so any helpful idea or hint
> would be great. Please also let me know if I can provide more info regarding
> our setup.
>
> Ciao and thanks a lot,
>
>    Schoepp
>


