PVE 7.2 instability

Eneko Lacunza elacunza at binovo.es
Wed May 11 16:35:24 CEST 2022


Hi all,

Yesterday we upgraded a 5-node cluster to PVE 7.2 from PVE 7.1:

# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.35-1-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-5.15: 7.2-3
pve-kernel-helper: 7.2-3
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.35-1-pve: 5.15.35-2
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-1-pve: 5.13.19-3
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Since then we have seen some instability with our VMs: some of them
start consuming a full CPU core without explanation.

We have seen this issue only with Linux VMs, mostly Debian 9, 10 and 11
(but that's the most common OS in our 80+ VMs). The issue happens in VMs
with 1, 2 and 4 cores.

This issue seems to be easily reproduced by bulk-migrating VMs. Not all
bulk-migrated VMs show the issue, but some do. We also see the issue in
VMs that were not recently migrated.

Some of the VMs show a time jump in syslog (in the excerpt that follows,
from May 10 to Jan 15). For example, in our "release" VM:

May 10 15:11:18 release systemd[1]: Stopping User Runtime Directory 
/run/user/1003...
May 10 15:11:18 release systemd[1]: run-user-1003.mount: Succeeded.
May 10 15:11:18 release systemd[1]: user-runtime-dir@1003.service: 
Succeeded.
May 10 15:11:18 release systemd[1]: Stopped User Runtime Directory 
/run/user/1003.
May 10 15:11:18 release systemd[1]: Removed slice User Slice of UID 1003.
Jan 15 06:42:04 release systemd[1]: Starting Daily apt download 
activities...
Jan 15 06:42:04 release mariadbd[453]: 850115  6:42:04 [ERROR] mysqld 
got signal 11 ;
Jan 15 06:42:04 release mariadbd[453]: This could be because you hit a 
bug. It is also possible that this binary
Jan 15 06:42:04 release mariadbd[453]: or one of the libraries it was 
linked against is corrupt, improperly built,
Jan 15 06:42:04 release mariadbd[453]: or misconfigured. This error can 
also be caused by malfunctioning hardware.
Jan 15 06:42:04 release mariadbd[453]: To report this bug, see 
https://mariadb.com/kb/en/reporting-bugs
Jan 15 06:42:04 release mariadbd[453]: We will try our best to scrape up 
some info that will hopefully help
Jan 15 06:42:04 release mariadbd[453]: diagnose the problem, but since 
we have already crashed,
Jan 15 06:42:04 release mariadbd[453]: something is definitely wrong and 
this may fail.
Jan 15 06:42:04 release mariadbd[453]: Server version: 
10.5.15-MariaDB-0+deb11u1
Jan 15 06:42:04 release mariadbd[453]: key_buffer_size=134217728
Jan 15 06:42:04 release mariadbd[453]: read_buffer_size=131072
Jan 15 06:42:04 release mariadbd[453]: max_used_connections=3
Jan 15 06:42:04 release mariadbd[453]: max_threads=153
Jan 15 06:42:04 release mariadbd[453]: thread_count=0
Jan 15 06:42:04 release mariadbd[453]: It is possible that mysqld could 
use up to
Jan 15 06:42:04 release mariadbd[453]: key_buffer_size + 
(read_buffer_size + sort_buffer_size)*max_threads = 467872 K  bytes of 
memory
Jan 15 06:42:04 release mariadbd[453]: Hope that's ok; if not, decrease 
some variables in the equation.
Jan 15 06:42:04 release mariadbd[453]: Thread pointer: 0x0
Jan 15 06:42:04 release mariadbd[453]: Attempting backtrace. You can use 
the following information to find out
Jan 15 06:42:04 release mariadbd[453]: where mysqld died. If you see no 
messages after this, something went
Jan 15 06:42:04 release mariadbd[453]: terribly wrong...
Jan 15 06:42:04 release mariadbd[453]: stack_bottom = 0x0 thread_stack 
0x49000
Jan 15 06:42:04 release systemd[1]: Starting Online ext4 Metadata Check 
for All Filesystems...
Jan 15 06:42:04 release systemd[1]: Starting Clean php session files...
Jan 15 06:42:04 release systemd[1]: Starting Cleanup of Temporary 
Directories...
Jan 15 06:42:04 release systemd[1]: Starting Rotate log files...
Jan 15 06:42:04 release systemd[1]: Starting Daily man-db regeneration...
Jan 15 06:42:04 release systemd[1]: e2scrub_all.service: Succeeded.
Jan 15 06:42:04 release systemd[1]: Finished Online ext4 Metadata Check 
for All Filesystems.
Jan 15 06:42:04 release systemd[1]: systemd-tmpfiles-clean.service: 
Succeeded.
Jan 15 06:42:04 release systemd[1]: Finished Cleanup of Temporary 
Directories.
Jan 15 06:42:04 release systemd[1]: phpsessionclean.service: Succeeded.
Jan 15 06:42:04 release systemd[1]: Finished Clean php session files.
Jan 15 06:42:04 release systemd[1]: apt-daily.service: Succeeded.
Jan 15 06:42:04 release systemd[1]: Finished Daily apt download activities.
Jan 15 06:42:04 release systemd[1]: Starting Daily apt upgrade and clean 
activities...
Jan 15 06:42:04 release systemd[1]: apt-daily-upgrade.service: Succeeded.
Jan 15 06:42:04 release systemd[1]: Finished Daily apt upgrade and clean 
activities.
Jan 15 06:42:04 release systemd[1]: Reloading The Apache HTTP Server.
Jan 15 06:42:04 release systemd[1]: Looping too fast. Throttling 
execution a little.
[...reset...]
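
To look for the same kind of time jump in other VMs, something like the
following could be run against each guest's syslog. This is only a
minimal sketch: the function name `find_time_jumps`, the one-hour
threshold and the year are assumptions for illustration (syslog
timestamps carry no year), and it expects English month abbreviations.

```python
from datetime import datetime, timedelta


def find_time_jumps(lines, max_gap=timedelta(hours=1), year=2022):
    """Return (previous, current) timestamp pairs whose gap exceeds max_gap.

    `year` is assumed, because classic syslog timestamps do not include one.
    Lines that do not start with a "Mon DD HH:MM:SS" stamp (for example
    wrapped continuation lines) are skipped.
    """
    jumps = []
    prev = None
    for line in lines:
        try:
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line
        # abs() so that both forward and backward jumps are reported
        if prev is not None and abs(ts - prev) > max_gap:
            jumps.append((prev, ts))
        prev = ts
    return jumps


if __name__ == "__main__":
    sample = [
        "May 10 15:11:18 release systemd[1]: Removed slice User Slice of UID 1003.",
        "Jan 15 06:42:04 release systemd[1]: Starting Daily apt download ",
        "activities...",
        "Jan 15 06:42:04 release mariadbd[453]: max_threads=153",
    ]
    for before, after in find_time_jumps(sample):
        print(f"time jump: {before} -> {after}")
```

With a real log, the list of lines would come from reading the guest's
syslog file instead of the hard-coded sample.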

Is anyone seeing this issue?

Those servers have AMD Ryzen processors.

Cheers

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/


