[PVE-User] Debian 11 hard lock issues as VM
Bryan Fields
Bryan at bryanfields.net
Tue Jan 17 06:06:52 CET 2023
I am running Proxmox 7.3-4 with a VM that is now on Debian 11.
I have ZFS local storage in each server in the cluster. Every 15 minutes the
VM is replicated to the other server(s). I recently upgraded a server from
Debian 9 to Debian 11 and it started locking up. The lockups don't follow any
pattern: there is no fixed amount of time, or number of replications, before
one happens.
Through some debugging I found that the qemu-agent is not unfreezing the OS
after the replication. My understanding is that the thaw should happen in
under 100 ms, and from what I could see it works fine on all my other VMs
running Ubuntu or RHEL.
I compared the agent on the Debian 11 server with the one on the Ubuntu
servers: Debian has 5.2.0 versus 6.2.0 on Ubuntu. I compiled the agent from
the 7.2.0 qemu sources (statically too, if anyone wants a copy) and ran it
under screen on the Debian 11 VM. It still locked up hard after 2-4 hours.
Debian is using the stock kernel:
> Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux
I read online that it might be related to VirtIO, so I changed that setting
to VirtIO single, which made no difference.
I've reverted to the old kernel and am going to let this run:
> 4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux
Complicating this, the box is my Observium install and I don't have another
device watching it, so when it locks up it takes my monitoring offline :-D
On the working Ubuntu boxes I'm running:
> 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Below is the agent log from a lockup; there is no more output after the last
line (I have verbose enabled).
> 1673846104.535376: debug: received EOF
> 1673846104.635560: debug: received EOF
> 1673846104.735735: debug: received EOF
> 1673846104.835868: debug: received EOF
> 1673846104.936067: debug: read data, count: 104, data: {"execute":"guest-sync-delimited","arguments":{"id":371290701}}
> {"arguments":{},"execute":"guest-ping"}
>
> 1673846104.936136: debug: process_event: called
> 1673846104.936144: debug: processing command
> 1673846104.936216: debug: sending data, count: 23
> 1673846104.936257: debug: process_event: called
> 1673846104.936272: debug: processing command
> 1673846104.936350: debug: sending data, count: 15
> 1673846104.936833: debug: received EOF
> 1673846105.37003: debug: received EOF
> 1673846105.137190: debug: received EOF
> 1673846105.237344: debug: received EOF
> 1673846105.337525: debug: received EOF
> 1673846105.437693: debug: received EOF
> 1673846105.537907: debug: received EOF
> 1673846105.638096: debug: received EOF
> 1673846105.738307: debug: received EOF
> 1673846105.838495: debug: received EOF
> 1673846105.938652: debug: received EOF
> 1673846106.38813: debug: received EOF
> 1673846106.139011: debug: received EOF
> 1673846106.239210: debug: received EOF
> 1673846106.339403: debug: received EOF
> 1673846106.439583: debug: received EOF
> 1673846106.539782: debug: received EOF
> 1673846106.639990: debug: received EOF
> 1673846106.740190: debug: received EOF
> 1673846106.840388: debug: read data, count: 115, data: {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
> {"execute":"guest-fsfreeze-freeze","arguments":{}}
>
> 1673846106.840450: debug: process_event: called
> 1673846106.840465: debug: processing command
> 1673846106.840497: debug: sending data, count: 23
> 1673846106.840545: debug: process_event: called
> 1673846106.840563: debug: processing command
> 1673846106.841114: debug: disabling command: guest-get-time
> 1673846106.841131: debug: disabling command: guest-set-time
> 1673846106.841138: debug: disabling command: guest-shutdown
> 1673846106.841145: debug: disabling command: guest-file-open
> 1673846106.841151: debug: disabling command: guest-file-close
> 1673846106.841157: debug: disabling command: guest-file-read
> 1673846106.841164: debug: disabling command: guest-file-write
> 1673846106.841171: debug: disabling command: guest-file-seek
> 1673846106.841179: debug: disabling command: guest-file-flush
> 1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
> 1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
> 1673846106.841202: debug: disabling command: guest-fstrim
> 1673846106.841209: debug: disabling command: guest-suspend-disk
> 1673846106.841217: debug: disabling command: guest-suspend-ram
> 1673846106.841225: debug: disabling command: guest-suspend-hybrid
> 1673846106.841232: debug: disabling command: guest-network-get-interfaces
> 1673846106.841239: debug: disabling command: guest-get-vcpus
> 1673846106.841245: debug: disabling command: guest-set-vcpus
> 1673846106.841251: debug: disabling command: guest-get-disks
> 1673846106.841257: debug: disabling command: guest-get-fsinfo
> 1673846106.841265: debug: disabling command: guest-set-user-password
> 1673846106.841272: debug: disabling command: guest-get-memory-blocks
> 1673846106.841278: debug: disabling command: guest-set-memory-blocks
> 1673846106.841286: debug: disabling command: guest-get-memory-block-info
> 1673846106.841294: debug: disabling command: guest-exec-status
> 1673846106.841303: debug: disabling command: guest-exec
> 1673846106.841311: debug: disabling command: guest-get-host-name
> 1673846106.841319: debug: disabling command: guest-get-users
> 1673846106.841326: debug: disabling command: guest-get-timezone
> 1673846106.841334: debug: disabling command: guest-get-osinfo
> 1673846106.841343: debug: disabling command: guest-get-devices
> 1673846106.841350: debug: disabling command: guest-ssh-get-authorized-keys
> 1673846106.841356: debug: disabling command: guest-ssh-add-authorized-keys
> 1673846106.841363: debug: disabling command: guest-ssh-remove-authorized-keys
> 1673846106.841371: warning: disabling logging due to filesystem freeze
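Two things in this log are easy to miss when reading it by eye. The fractional
part of each timestamp does not look zero-padded (1673846105.37003 sits between
.936833 and .137190, so it is presumably 37003 microseconds), and the last
entry is the freeze notice with nothing after it. A minimal Python sketch for
checking a longer log for the same pattern follows. It assumes that a healthy
freeze/thaw cycle is followed by further timestamped lines once logging
resumes; the function names are made up for illustration and are not part of
qemu-ga.

```python
import re
from typing import Iterable, Optional

# Matches "<sec>.<usec>: <level>: <message>", with an optional "> " quote prefix.
LOG_LINE = re.compile(r"^(?:>\s?)?(\d+)\.(\d+): (\w+): (.*)$")
FREEZE_NOTICE = "disabling logging due to filesystem freeze"


def to_microseconds(sec: str, usec: str) -> int:
    """Combine both fields as integers, so ".37003" is read as 37003 us.

    Assumption: the part after the dot is a plain microsecond count without
    leading zeros, which is what the ordering of the log lines suggests.
    """
    return int(sec) * 1_000_000 + int(usec)


def unresolved_freeze(lines: Iterable[str]) -> Optional[int]:
    """Return the timestamp (in us) of a freeze notice that no later
    timestamped line follows, or None if logging resumed afterwards."""
    frozen_at = None
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # JSON continuation lines and blanks carry no timestamp
        sec, usec, _level, message = match.groups()
        if FREEZE_NOTICE in message:
            frozen_at = to_microseconds(sec, usec)
        else:
            frozen_at = None  # a later timestamped line means logging resumed
    return frozen_at
```

Run over the log above, unresolved_freeze() returns the timestamp of the final
warning line, which marks the freeze that never came back.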
Is there any reason this is happening, and any fix other than disabling the
agent? I can't imagine that Debian 11 is shipping with a broken kernel, and
running 'qm guest cmd 152 fsfreeze-freeze' and 'qm guest cmd 152
fsfreeze-thaw' from the host works fine. Could this be something with the
VirtIO pipe/IPC?
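To tell a guest that is merely left frozen apart from one whose agent no
longer answers at all, the freeze state can be polled from the host right
after a replication run. The Python sketch below does that with 'qm guest cmd
<vmid> fsfreeze-status'. The agent reports either "thawed" or "frozen", but
exactly how qm prints that value is an assumption here, so the output is
normalised loosely. wait_for_thaw() and its parameters are hypothetical names
for illustration.

```python
import subprocess
import time
from typing import Callable


def query_fsfreeze_status(vmid: int) -> str:
    """Ask the guest agent for its freeze state via the Proxmox CLI.

    Assumption: qm prints the status as a bare or JSON-quoted word, so
    surrounding whitespace and quotes are stripped.
    """
    result = subprocess.run(
        ["qm", "guest", "cmd", str(vmid), "fsfreeze-status"],
        capture_output=True, text=True, timeout=30, check=True,
    )
    return result.stdout.strip().strip('"').lower()


def wait_for_thaw(
    vmid: int,
    query: Callable[[int], str] = query_fsfreeze_status,
    timeout_s: float = 5.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the guest reports "thawed"; return False on timeout.

    A failing or hanging query is treated like "not thawed yet", since a
    locked-up guest will most likely not answer the agent at all.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            status = query(vmid)
        except (subprocess.SubprocessError, OSError):
            status = "unreachable"
        if status == "thawed":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling wait_for_thaw(152) shortly after each replication and logging a False
result on the host would at least record when the VM stopped thawing, even
while the monitoring box itself is down.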
Anyone else seeing this or have any ideas?
--
Bryan Fields
727-409-1194 - Voice
http://bryanfields.net