[PVE-User] Debian 11 hard lock issues as VM

Bryan Fields Bryan at bryanfields.net
Tue Jan 17 06:06:52 CET 2023


I am running Proxmox 7.3-4 with a VM that is now on Debian 11.

I have ZFS local storage in each server in the cluster.  Every 15 minutes the 
VM is replicated to the other server(s).  Recently I upgraded this VM from 
Debian 9 to Debian 11 and it started locking up.  The lockups don't seem to 
follow a pattern: they happen neither after a fixed amount of time nor after 
a fixed number of replications.

Through some debugging I found that the qemu-agent is not unfreezing the OS 
after the replication.  My understanding is that the unfreeze should happen 
in under 100 ms, and from what I could see it works fine on all my other VMs 
running Ubuntu or RHEL.
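
For anyone who wants to check a guest from the host side, here is a rough 
Python sketch that asks the agent for its freeze state and sends a thaw if it 
reports frozen.  It only wraps 'qm guest cmd'; the helper names are made up 
for illustration, and I am assuming the status output contains the word 
"frozen" or "thawed" (possibly in JSON quotes).

```python
import subprocess


def qm_guest_cmd(vmid, command, timeout=30.0):
    """Run `qm guest cmd <vmid> <command>` on the Proxmox host, return stdout."""
    result = subprocess.run(
        ["qm", "guest", "cmd", str(vmid), command],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if qm hangs
        check=True,       # raises CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()


def thaw_if_frozen(vmid, run=qm_guest_cmd):
    """Send fsfreeze-thaw if the agent reports frozen; return True if sent."""
    status = run(vmid, "fsfreeze-status")
    # Assumption: the output contains "frozen" or "thawed", maybe JSON-quoted.
    if "frozen" in status:
        run(vmid, "fsfreeze-thaw")
        return True
    return False
```

Calling thaw_if_frozen(152) on the host would cover the case where the agent 
is still answering.  If the guest is hard locked, I would expect the call to 
fail or time out instead, which is itself a useful data point.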

I compared the agent on the Debian 11 server with the one on the Ubuntu 
servers: Debian has 5.2.0 and Ubuntu has 6.2.0.  I compiled the agent from the 
7.2.0 qemu sources (statically too, if anyone wants a copy) and ran it under 
screen in a terminal on the Debian 11 VM.  It still locked up hard after 2-4 
hours.

Debian is using the stock kernel:
> Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux  

I read some things online and thought it might be related to VirtIO, so I 
changed that to VirtIO single, which made no difference.

I've reverted to the old kernel and am going to let this run:
> 4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux

Complicating this, the box is my Observium install and I don't have another 
device watching it, so when it locks up, it takes my monitoring offline :-D

On the working Ubuntu boxes I'm running:
> 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Below is the log from where this locks up; there is no more output after the 
last line (I have verbose enabled).

> 1673846104.535376: debug: received EOF
> 1673846104.635560: debug: received EOF
> 1673846104.735735: debug: received EOF
> 1673846104.835868: debug: received EOF
> 1673846104.936067: debug: read data, count: 104, data: {"execute":"guest-sync-delimited","arguments":{"id":371290701}}
> {"arguments":{},"execute":"guest-ping"}
> 
> 1673846104.936136: debug: process_event: called
> 1673846104.936144: debug: processing command
> 1673846104.936216: debug: sending data, count: 23
> 1673846104.936257: debug: process_event: called
> 1673846104.936272: debug: processing command
> 1673846104.936350: debug: sending data, count: 15
> 1673846104.936833: debug: received EOF
> 1673846105.37003: debug: received EOF
> 1673846105.137190: debug: received EOF
> 1673846105.237344: debug: received EOF
> 1673846105.337525: debug: received EOF
> 1673846105.437693: debug: received EOF
> 1673846105.537907: debug: received EOF
> 1673846105.638096: debug: received EOF
> 1673846105.738307: debug: received EOF
> 1673846105.838495: debug: received EOF
> 1673846105.938652: debug: received EOF
> 1673846106.38813: debug: received EOF
> 1673846106.139011: debug: received EOF
> 1673846106.239210: debug: received EOF
> 1673846106.339403: debug: received EOF
> 1673846106.439583: debug: received EOF
> 1673846106.539782: debug: received EOF
> 1673846106.639990: debug: received EOF
> 1673846106.740190: debug: received EOF
> 1673846106.840388: debug: read data, count: 115, data: {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
> {"execute":"guest-fsfreeze-freeze","arguments":{}}
> 
> 1673846106.840450: debug: process_event: called
> 1673846106.840465: debug: processing command
> 1673846106.840497: debug: sending data, count: 23
> 1673846106.840545: debug: process_event: called
> 1673846106.840563: debug: processing command
> 1673846106.841114: debug: disabling command: guest-get-time
> 1673846106.841131: debug: disabling command: guest-set-time
> 1673846106.841138: debug: disabling command: guest-shutdown
> 1673846106.841145: debug: disabling command: guest-file-open
> 1673846106.841151: debug: disabling command: guest-file-close
> 1673846106.841157: debug: disabling command: guest-file-read
> 1673846106.841164: debug: disabling command: guest-file-write
> 1673846106.841171: debug: disabling command: guest-file-seek
> 1673846106.841179: debug: disabling command: guest-file-flush
> 1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
> 1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
> 1673846106.841202: debug: disabling command: guest-fstrim
> 1673846106.841209: debug: disabling command: guest-suspend-disk
> 1673846106.841217: debug: disabling command: guest-suspend-ram
> 1673846106.841225: debug: disabling command: guest-suspend-hybrid
> 1673846106.841232: debug: disabling command: guest-network-get-interfaces
> 1673846106.841239: debug: disabling command: guest-get-vcpus
> 1673846106.841245: debug: disabling command: guest-set-vcpus
> 1673846106.841251: debug: disabling command: guest-get-disks
> 1673846106.841257: debug: disabling command: guest-get-fsinfo
> 1673846106.841265: debug: disabling command: guest-set-user-password
> 1673846106.841272: debug: disabling command: guest-get-memory-blocks
> 1673846106.841278: debug: disabling command: guest-set-memory-blocks
> 1673846106.841286: debug: disabling command: guest-get-memory-block-info
> 1673846106.841294: debug: disabling command: guest-exec-status
> 1673846106.841303: debug: disabling command: guest-exec
> 1673846106.841311: debug: disabling command: guest-get-host-name
> 1673846106.841319: debug: disabling command: guest-get-users
> 1673846106.841326: debug: disabling command: guest-get-timezone
> 1673846106.841334: debug: disabling command: guest-get-osinfo
> 1673846106.841343: debug: disabling command: guest-get-devices
> 1673846106.841350: debug: disabling command: guest-ssh-get-authorized-keys
> 1673846106.841356: debug: disabling command: guest-ssh-add-authorized-keys
> 1673846106.841363: debug: disabling command: guest-ssh-remove-authorized-keys
> 1673846106.841371: warning: disabling logging due to filesystem freeze
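
To pull the interesting bits out of a log like this, a small Python sketch 
along these lines may help.  It lists the commands the agent read, in order.  
The function names are hypothetical.  I am also assuming, based on the 100 ms 
spacing of the lines above, that the part after the dot is an unpadded 
microsecond count, so "1673846105.37003" means 1673846105.037003.

```python
import json
import re
from decimal import Decimal

# Matches "<sec>.<usec>: <level>: <message>", with an optional "> " quote marker.
LINE_RE = re.compile(r"^(?:>\s*)?(\d+)\.(\d+): (\w+): (.*)$")

SAMPLE = """\
> 1673846105.37003: debug: received EOF
> 1673846106.840388: debug: read data, count: 115, data: {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
> {"execute":"guest-fsfreeze-freeze","arguments":{}}
>
> 1673846106.840450: debug: process_event: called
"""


def parse_timestamp(sec, usec):
    """Combine seconds with a microsecond field that is assumed to be unpadded."""
    return Decimal(sec) + Decimal(usec) / Decimal(1_000_000)


def commands_received(log_text):
    """Return (timestamp, command) for each request the agent logged as read."""
    found = []
    last_ts = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = LINE_RE.match(line)
        if match:
            last_ts = parse_timestamp(match.group(1), match.group(2))
            # Only "read data" lines carry a request, after the "data: " marker.
            _, marker, payload = match.group(4).partition("data: ")
            if not marker:
                continue
        else:
            # Continuation line: a further request from the same read.
            payload = line.lstrip("> ")
        try:
            request = json.loads(payload)
        except ValueError:
            continue
        if isinstance(request, dict) and "execute" in request:
            found.append((last_ts, request["execute"]))
    return found


print(commands_received(SAMPLE)[-1])
```

Run against the full log above, the last command it finds is 
guest-fsfreeze-freeze, and there is no guest-fsfreeze-thaw after it.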


Short of disabling the agent, is there any explanation for why this is 
happening?  I can't think that Debian 11 is shipping with a broken kernel, and 
running 'qm guest cmd 152 fsfreeze-freeze' and 'qm guest cmd 152 
fsfreeze-thaw' by hand from the host works fine.  Could this be something 
with the VirtIO pipe/IPC?
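
If someone wants to try reproducing this, the manual test can be looped.  
Below is a rough Python sketch that repeats the same two 'qm guest cmd' calls 
and times each thaw.  The function names, cycle count and hold time are 
placeholders I picked for illustration, not anything Proxmox defines, and it 
should only be pointed at a VM you can afford to freeze.

```python
import subprocess
import time


def qm_guest_cmd(vmid, command, timeout=30.0):
    """Run `qm guest cmd <vmid> <command>` on the Proxmox host, return stdout."""
    result = subprocess.run(
        ["qm", "guest", "cmd", str(vmid), command],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if qm hangs
        check=True,       # raises CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()


def freeze_thaw_cycles(vmid, cycles=10, hold_seconds=0.1,
                       run=qm_guest_cmd, clock=time.monotonic, sleep=time.sleep):
    """Freeze and thaw the guest repeatedly; return seconds taken by each thaw."""
    durations = []
    for _ in range(cycles):
        run(vmid, "fsfreeze-freeze")
        try:
            # Hypothetical hold time standing in for the snapshot step.
            sleep(hold_seconds)
        finally:
            # Always attempt the thaw, even if the hold is interrupted.
            start = clock()
            run(vmid, "fsfreeze-thaw")
            durations.append(clock() - start)
    return durations
```

Something like freeze_thaw_cycles(152, cycles=100) would stop with an 
exception on the first freeze or thaw that fails or times out, which would at 
least show whether the lockup can be triggered without a replication run.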

Anyone else seeing this or have any ideas?

-- 
Bryan Fields

727-409-1194 - Voice
http://bryanfields.net


