Proxmox and glusterfs: VMs get corrupted
Christian Schoepplein
christian.schoepplein at linova.de
Tue May 30 18:32:11 CEST 2023
Hi,
we are testing the current Proxmox version with a glusterfs storage backend
and have a strange issue with files getting corrupted inside the virtual
machines. For whatever reason, from one moment to the next binaries can no
longer be executed, scripts are damaged, and so on. In the logs I get errors
like this:
May 30 11:22:36 ns1 dockerd[1234]: time="2023-05-30T11:22:36.874765091+02:00" level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: could not insert 'bridge': Exec format error\nmodprobe: ERROR: could not insert 'br_netfilter': Exec format error\ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \n, error: exit status 1"
On such a broken system, file reports the following:
root@ns1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
/lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: data
root@ns1:~#
On a normal system it looks like this:
root@gluster1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
/lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: ELF 64-bit LSB
relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=1084f7cfcffbd4c607724fba287c0ea7fc5775
root@gluster1:~#
Not only kernel modules are affected. I have seen the same behaviour with
scripts, Icinga check plugins, the sendmail binary, and so on; I think it is
totally random :-(.
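A quick way to see the full extent of the damage would be something like
this inside an affected guest (just a sketch; debsums is not installed by
default on Ubuntu, and the ns1 prompt is only an example):

root@ns1:~# apt-get install -y debsums
root@ns1:~# debsums -c

debsums -c lists every file whose on-disk MD5 sum no longer matches the
checksum recorded in the package database, so it should catch exactly the
kind of randomly corrupted binaries and scripts described above.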
We have the problem with newly installed VMs, with VMs cloned from a
template created on our Proxmox host, and with VMs which we used before with
libvirtd and migrated to our new Proxmox machine. So IMHO it cannot be
related to the way we create new virtual machines...
We are using the following software:
root@proxmox1:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
root@proxmox1:~#
root@proxmox1:~# cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content rootdir,iso,images,vztmpl,backup,snippets

zfspool: local-zfs
    pool rpool/data
    content images,rootdir
    sparse 1

glusterfs: gfs_vms
    path /mnt/pve/gfs_vms
    volume gfs_vms
    content images
    prune-backups keep-all=1
    server gluster1.linova.de
    server2 gluster2.linova.de

root@proxmox1:~#
The config of a typical VM looks like this:
root@proxmox1:~# cat /etc/pve/qemu-server/101.conf
#ns1
agent: enabled=1,fstrim_cloned_disks=1
boot: c
bootdisk: scsi0
cicustom: user=local:snippets/user-data
cores: 1
hotplug: disk,network,usb
ide2: gfs_vms:101/vm-101-cloudinit.qcow2,media=cdrom,size=4M
ipconfig0: ip=10.200.32.9/22,gw=10.200.32.1
kvm: 1
machine: q35
memory: 2048
meta: creation-qemu=7.2.0,ctime=1683718002
name: ns1
nameserver: 10.200.0.5
net0: virtio=1A:61:75:25:C6:30,bridge=vmbr0
numa: 1
ostype: l26
scsi0: gfs_vms:101/vm-101-disk-0.qcow2,discard=on,size=10444M
scsihw: virtio-scsi-pci
searchdomain: linova.de
serial0: socket
smbios1: uuid=e2f503fe-4a66-4085-86c0-bb692add6b7a
sockets: 1
vmgenid: 3be6ec9d-7cfd-47c0-9f86-23c2e3ce5103
root@proxmox1:~#
Our glusterfs storage backend consists of three servers, all running Ubuntu
22.04 and glusterfs version 10.1. Because sometimes the Icinga plugins also
get corrupted, I have a fairly exact time range to search the logs for
errors and warnings, but there are no errors in the logs on the glusterfs
hosts when a VM breaks.
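For reference, gfs_vms is a replica-3 volume; it was created roughly like
this (a sketch, the brick paths are only illustrative; three bricks per
host would match the three replicate-N subvolumes that show up in the log
below):

root@gluster1:~# gluster volume create gfs_vms replica 3 \
  gluster1.linova.de:/bricks/b1/gfs_vms gluster2.linova.de:/bricks/b1/gfs_vms gluster3.linova.de:/bricks/b1/gfs_vms \
  gluster1.linova.de:/bricks/b2/gfs_vms gluster2.linova.de:/bricks/b2/gfs_vms gluster3.linova.de:/bricks/b2/gfs_vms \
  gluster1.linova.de:/bricks/b3/gfs_vms gluster2.linova.de:/bricks/b3/gfs_vms gluster3.linova.de:/bricks/b3/gfs_vms
root@gluster1:~# gluster volume start gfs_vms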
However, I think it has something to do with our glusterfs setup. If I clone
a VM from a template I get the following:
root@proxmox1:~# qm clone 9000 200 --full --name testvm --description "testvm" --storage gfs_vms
create full clone of drive ide2 (gfs_vms:9000/vm-9000-cloudinit.qcow2)
Formatting
'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
[2023-05-30 16:18:17.753152 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:17.876879 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:17.877606 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:17.878275 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:27.761247 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
[2023-05-30 16:18:28.766999 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:28.936449 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0:
All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:28.937547 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:28.938115 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:38.774387 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
create full clone of drive scsi0 (gfs_vms:9000/base-9000-disk-0.qcow2)
Formatting
'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=10951327744 lazy_refcounts=off refcount_bits=16
[2023-05-30 16:18:39.962238 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:40.084300 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:40.084996 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:40.085505 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:49.970199 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
[2023-05-30 16:18:50.975729 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:51.768619 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:51.769330 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:51.769822 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:00.984578 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
transferred 0.0 B of 10.2 GiB (0.00%)
[2023-05-30 16:19:02.030902 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure ios_sample_buf size is 1024 because ios_sample_interval is 0
transferred 112.8 MiB of 10.2 GiB (1.08%)
transferred 230.8 MiB of 10.2 GiB (2.21%)
transferred 340.5 MiB of 10.2 GiB (3.26%)
...
transferred 10.1 GiB of 10.2 GiB (99.15%)
transferred 10.2 GiB of 10.2 GiB (100.00%)
transferred 10.2 GiB of 10.2 GiB (100.00%)
[2023-05-30 16:19:29.804006 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:29.804807 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:29.805486 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:32.044693 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
root@proxmox1:~#
Are these messages about all subvolumes being down normal, or might they be
the reason for our strange problems?
I have no idea how to debug this problem further, so any helpful idea or
hint would be great. Please also let me know if I can provide more
information about our setup.
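For example, I could post the output of the following commands from one of
the gluster hosts (standard gluster CLI; gluster1 is just the node I would
run them on):

root@gluster1:~# gluster volume info gfs_vms
root@gluster1:~# gluster volume status gfs_vms clients
root@gluster1:~# gluster volume heal gfs_vms info

volume heal ... info in particular should show whether any files have
pending self-heals, and status ... clients whether all Proxmox nodes are
actually connected to all bricks.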
Ciao and thanks a lot,
Schoepp