[PVE-User] Debian 11 hard lock issues as VM

Eneko Lacunza elacunza at binovo.es
Tue Jan 17 09:22:22 CET 2023


Hi Bryan,

We started to upgrade our cluster from PVE 7.2 to 7.3 yesterday.

I have enabled the agent in our only VM with Debian 11 running on a 
7.3-4 node at the moment, and performed 5 full backups in a row, VM 
continues working (no hang).

You haven't provided details about your setup:

- Server (especially CPU model). Debian could be suffering from weird 
BIOS clock issues.

- Running kernel on PVE 7.3-4 . Kernel 5.15.x has been quite bad for us, 
have you tried kernel 5.13 or 5.19?

Cheers

El 17/1/23 a las 6:06, Bryan Fields escribió:
> I am running proxmox 7.3-4 with a now Debian 11 VM.
>
> I have ZFS local storage in each server in the cluster.   Every 15 
> minutes the VM is replicated to the other server(s).  Recently I've 
> upgraded a server from Debian 9 to Debian 11 and it started locking 
> up.  This didn't seem to have a certain amount of time that it took to 
> lockup, or a certain number of replications.
>
> Through some debugging I found this was the qemu-agent not unfreezing 
> the OS after the replication.  This should happen in under 100 ms is 
> my understanding and from what I could see, it worked fine on all my 
> other VM's with Ubuntu or RHEL.
>
> I compared the agent from the debian 11 server and the Ubuntu servers, 
> and debian was 5.2.0 vs 6.2.0 on Ubuntu.  I compiled the agent from 
> the 7.2.0 qemu sources (statically too if anyone wants a copy) and ran 
> it from screen on a terminal on the Debian 11 VM. This still locked up 
> hard after 2-4 hours.
>
> Debian is using the stock kernel:
>> Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) 
>> x86_64 GNU/Linux 
>
> I read some things online and thought it might be related to VirtIO, 
> and changed that to VirtIO single with no difference.
>
> I've reverted back to the old kernel and am going to let this run.
> 4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux
>
> Complicating this, the box is my observium install and I don't have 
> another device watching it, so when it locks up, it takes my 
> monitoring offline :-D
>
> On the working Ubuntu boxes I'm running:
>> 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 
>> x86_64 x86_64 GNU/Linux
>
> Below is the log where this locks up, and there's no more output after 
> the last one (I have verbose enabled)
>
>> 1673846104.535376: debug: received EOF
>> 1673846104.635560: debug: received EOF
>> 1673846104.735735: debug: received EOF
>> 1673846104.835868: debug: received EOF
>> 1673846104.936067: debug: read data, count: 104, data: 
>> {"execute":"guest-sync-delimited","arguments":{"id":371290701}}
>> {"arguments":{},"execute":"guest-ping"}
>>
>> 1673846104.936136: debug: process_event: called
>> 1673846104.936144: debug: processing command
>> 1673846104.936216: debug: sending data, count: 23
>> 1673846104.936257: debug: process_event: called
>> 1673846104.936272: debug: processing command
>> 1673846104.936350: debug: sending data, count: 15
>> 1673846104.936833: debug: received EOF
>> 1673846105.37003: debug: received EOF
>> 1673846105.137190: debug: received EOF
>> 1673846105.237344: debug: received EOF
>> 1673846105.337525: debug: received EOF
>> 1673846105.437693: debug: received EOF
>> 1673846105.537907: debug: received EOF
>> 1673846105.638096: debug: received EOF
>> 1673846105.738307: debug: received EOF
>> 1673846105.838495: debug: received EOF
>> 1673846105.938652: debug: received EOF
>> 1673846106.38813: debug: received EOF
>> 1673846106.139011: debug: received EOF
>> 1673846106.239210: debug: received EOF
>> 1673846106.339403: debug: received EOF
>> 1673846106.439583: debug: received EOF
>> 1673846106.539782: debug: received EOF
>> 1673846106.639990: debug: received EOF
>> 1673846106.740190: debug: received EOF
>> 1673846106.840388: debug: read data, count: 115, data: 
>> {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
>> {"execute":"guest-fsfreeze-freeze","arguments":{}}
>>
>> 1673846106.840450: debug: process_event: called
>> 1673846106.840465: debug: processing command
>> 1673846106.840497: debug: sending data, count: 23
>> 1673846106.840545: debug: process_event: called
>> 1673846106.840563: debug: processing command
>> 1673846106.841114: debug: disabling command: guest-get-time
>> 1673846106.841131: debug: disabling command: guest-set-time
>> 1673846106.841138: debug: disabling command: guest-shutdown
>> 1673846106.841145: debug: disabling command: guest-file-open
>> 1673846106.841151: debug: disabling command: guest-file-close
>> 1673846106.841157: debug: disabling command: guest-file-read
>> 1673846106.841164: debug: disabling command: guest-file-write
>> 1673846106.841171: debug: disabling command: guest-file-seek
>> 1673846106.841179: debug: disabling command: guest-file-flush
>> 1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
>> 1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
>> 1673846106.841202: debug: disabling command: guest-fstrim
>> 1673846106.841209: debug: disabling command: guest-suspend-disk
>> 1673846106.841217: debug: disabling command: guest-suspend-ram
>> 1673846106.841225: debug: disabling command: guest-suspend-hybrid
>> 1673846106.841232: debug: disabling command: 
>> guest-network-get-interfaces
>> 1673846106.841239: debug: disabling command: guest-get-vcpus
>> 1673846106.841245: debug: disabling command: guest-set-vcpus
>> 1673846106.841251: debug: disabling command: guest-get-disks
>> 1673846106.841257: debug: disabling command: guest-get-fsinfo
>> 1673846106.841265: debug: disabling command: guest-set-user-password
>> 1673846106.841272: debug: disabling command: guest-get-memory-blocks
>> 1673846106.841278: debug: disabling command: guest-set-memory-blocks
>> 1673846106.841286: debug: disabling command: guest-get-memory-block-info
>> 1673846106.841294: debug: disabling command: guest-exec-status
>> 1673846106.841303: debug: disabling command: guest-exec
>> 1673846106.841311: debug: disabling command: guest-get-host-name
>> 1673846106.841319: debug: disabling command: guest-get-users
>> 1673846106.841326: debug: disabling command: guest-get-timezone
>> 1673846106.841334: debug: disabling command: guest-get-osinfo
>> 1673846106.841343: debug: disabling command: guest-get-devices
>> 1673846106.841350: debug: disabling command: 
>> guest-ssh-get-authorized-keys
>> 1673846106.841356: debug: disabling command: 
>> guest-ssh-add-authorized-keys
>> 1673846106.841363: debug: disabling command: 
>> guest-ssh-remove-authorized-keys
>> 1673846106.841371: warning: disabling logging due to filesystem freeze
>
>
> Other than disabling the agent, is there any reason this is hapening?  
> I can't think that Debian 11 is shipping with a broken kernel, but the 
> 'qm guest cmd 152 fsfreeze-freeze' and 'qm guest cmd 152 
> fsfreeze-thaw' works fine from the host. Could this be something with 
> the VirtIO pipe/IPC?
>
> Anyone else seeing this or have any ideas?
>

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 |https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/


More information about the pve-user mailing list