[PVE-User] Debian 11 hard lock issues as VM
Eneko Lacunza
elacunza at binovo.es
Tue Jan 17 09:22:22 CET 2023
Hi Bryan,

We started to upgrade our cluster from PVE 7.2 to 7.3 yesterday.

I have enabled the agent in our only Debian 11 VM, which is running on a
7.3-4 node at the moment, and performed 5 full backups in a row; the VM
continues working (no hang).
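
If you want to stress the freeze/thaw path without waiting for the next backup or replication, something like the rough sketch below could be run on the PVE host. It only wraps the same 'qm guest cmd <vmid> fsfreeze-freeze' and 'fsfreeze-thaw' calls mentioned in the quoted mail; the function names, the cycle count, the hold time and the timeout are my own assumptions, and it does not take a snapshot, so it is not equivalent to a real replication run.

```python
import subprocess
import time


def qm_guest_cmd(vmid, command, timeout=30):
    """Run 'qm guest cmd <vmid> <command>' on the PVE host and return its stdout.

    The 30 s timeout is an arbitrary assumption; subprocess raises
    TimeoutExpired if the agent does not answer in time.
    """
    result = subprocess.run(
        ["qm", "guest", "cmd", str(vmid), command],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()


def freeze_thaw_cycles(vmid, cycles=5, hold_seconds=1.0,
                       run=qm_guest_cmd, sleep=time.sleep):
    """Freeze and thaw the guest filesystems `cycles` times.

    Returns how long each thaw call took, in seconds. `run` and `sleep`
    are injectable (hypothetical hooks) so the loop can be dry-run
    without a PVE host.
    """
    thaw_durations = []
    for _ in range(cycles):
        run(vmid, "fsfreeze-freeze")
        try:
            sleep(hold_seconds)  # stand-in for the snapshot window
        finally:
            # Always attempt the thaw, even if the hold is interrupted.
            started = time.monotonic()
            run(vmid, "fsfreeze-thaw")
            thaw_durations.append(time.monotonic() - started)
    return thaw_durations
```

On the host this would be called as freeze_thaw_cycles(152), using the VMID from the quoted mail. A thaw that never returns shows up as a TimeoutExpired exception instead of a silent hang.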

You haven't provided details about your setup:

- Server (especially CPU model). Debian could be suffering from weird
  BIOS clock issues.
- Running kernel on PVE 7.3-4. Kernel 5.15.x has been quite bad for us;
  have you tried kernel 5.13 or 5.19?

Cheers
On 17/1/23 at 6:06, Bryan Fields wrote:
> I am running Proxmox 7.3-4 with a VM that now runs Debian 11.
>
> I have ZFS local storage in each server in the cluster. Every 15
> minutes the VM is replicated to the other server(s). I recently
> upgraded a server from Debian 9 to Debian 11 and it started locking
> up. The lockups don't follow a pattern: they happen neither after a
> fixed amount of time nor after a fixed number of replications.
>
> Through some debugging I found this was the qemu-agent not unfreezing
> the OS after the replication. My understanding is that this should
> happen in under 100 ms, and from what I could see it works fine on all
> my other VMs with Ubuntu or RHEL.
>
> I compared the agent on the Debian 11 server with the one on the Ubuntu
> servers: Debian had 5.2.0 vs 6.2.0 on Ubuntu. I compiled the agent from
> the 7.2.0 qemu sources (statically too, if anyone wants a copy) and ran
> it from screen on a terminal on the Debian 11 VM. This still locked up
> hard after 2-4 hours.
>
> Debian is using the stock kernel:
>> Linux eyes 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13)
>> x86_64 GNU/Linux
>
> I read some things online and thought it might be related to VirtIO,
> and changed that to VirtIO single with no difference.
>
> I've reverted back to the old kernel and am going to let this run.
> 4.9.0-19-amd64 #1 SMP Debian 4.9.320-2 (2022-06-30) x86_64 GNU/Linux
>
> Complicating this, the box is my Observium install and I don't have
> another device watching it, so when it locks up, it takes my
> monitoring offline :-D
>
> On the working Ubuntu boxes I'm running:
>> 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64
>> x86_64 x86_64 GNU/Linux
>
> Below is the log where this locks up; there's no more output after
> the last line (I have verbose enabled).
>
>> 1673846104.535376: debug: received EOF
>> 1673846104.635560: debug: received EOF
>> 1673846104.735735: debug: received EOF
>> 1673846104.835868: debug: received EOF
>> 1673846104.936067: debug: read data, count: 104, data:
>> {"execute":"guest-sync-delimited","arguments":{"id":371290701}}
>> {"arguments":{},"execute":"guest-ping"}
>>
>> 1673846104.936136: debug: process_event: called
>> 1673846104.936144: debug: processing command
>> 1673846104.936216: debug: sending data, count: 23
>> 1673846104.936257: debug: process_event: called
>> 1673846104.936272: debug: processing command
>> 1673846104.936350: debug: sending data, count: 15
>> 1673846104.936833: debug: received EOF
>> 1673846105.37003: debug: received EOF
>> 1673846105.137190: debug: received EOF
>> 1673846105.237344: debug: received EOF
>> 1673846105.337525: debug: received EOF
>> 1673846105.437693: debug: received EOF
>> 1673846105.537907: debug: received EOF
>> 1673846105.638096: debug: received EOF
>> 1673846105.738307: debug: received EOF
>> 1673846105.838495: debug: received EOF
>> 1673846105.938652: debug: received EOF
>> 1673846106.38813: debug: received EOF
>> 1673846106.139011: debug: received EOF
>> 1673846106.239210: debug: received EOF
>> 1673846106.339403: debug: received EOF
>> 1673846106.439583: debug: received EOF
>> 1673846106.539782: debug: received EOF
>> 1673846106.639990: debug: received EOF
>> 1673846106.740190: debug: received EOF
>> 1673846106.840388: debug: read data, count: 115, data:
>> {"arguments":{"id":371290702},"execute":"guest-sync-delimited"}
>> {"execute":"guest-fsfreeze-freeze","arguments":{}}
>>
>> 1673846106.840450: debug: process_event: called
>> 1673846106.840465: debug: processing command
>> 1673846106.840497: debug: sending data, count: 23
>> 1673846106.840545: debug: process_event: called
>> 1673846106.840563: debug: processing command
>> 1673846106.841114: debug: disabling command: guest-get-time
>> 1673846106.841131: debug: disabling command: guest-set-time
>> 1673846106.841138: debug: disabling command: guest-shutdown
>> 1673846106.841145: debug: disabling command: guest-file-open
>> 1673846106.841151: debug: disabling command: guest-file-close
>> 1673846106.841157: debug: disabling command: guest-file-read
>> 1673846106.841164: debug: disabling command: guest-file-write
>> 1673846106.841171: debug: disabling command: guest-file-seek
>> 1673846106.841179: debug: disabling command: guest-file-flush
>> 1673846106.841187: debug: disabling command: guest-fsfreeze-freeze
>> 1673846106.841194: debug: disabling command: guest-fsfreeze-freeze-list
>> 1673846106.841202: debug: disabling command: guest-fstrim
>> 1673846106.841209: debug: disabling command: guest-suspend-disk
>> 1673846106.841217: debug: disabling command: guest-suspend-ram
>> 1673846106.841225: debug: disabling command: guest-suspend-hybrid
>> 1673846106.841232: debug: disabling command: guest-network-get-interfaces
>> 1673846106.841239: debug: disabling command: guest-get-vcpus
>> 1673846106.841245: debug: disabling command: guest-set-vcpus
>> 1673846106.841251: debug: disabling command: guest-get-disks
>> 1673846106.841257: debug: disabling command: guest-get-fsinfo
>> 1673846106.841265: debug: disabling command: guest-set-user-password
>> 1673846106.841272: debug: disabling command: guest-get-memory-blocks
>> 1673846106.841278: debug: disabling command: guest-set-memory-blocks
>> 1673846106.841286: debug: disabling command: guest-get-memory-block-info
>> 1673846106.841294: debug: disabling command: guest-exec-status
>> 1673846106.841303: debug: disabling command: guest-exec
>> 1673846106.841311: debug: disabling command: guest-get-host-name
>> 1673846106.841319: debug: disabling command: guest-get-users
>> 1673846106.841326: debug: disabling command: guest-get-timezone
>> 1673846106.841334: debug: disabling command: guest-get-osinfo
>> 1673846106.841343: debug: disabling command: guest-get-devices
>> 1673846106.841350: debug: disabling command: guest-ssh-get-authorized-keys
>> 1673846106.841356: debug: disabling command: guest-ssh-add-authorized-keys
>> 1673846106.841363: debug: disabling command: guest-ssh-remove-authorized-keys
>> 1673846106.841371: warning: disabling logging due to filesystem freeze
>
>
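A note on reading the timestamps in this log: entries such as 1673846105.37003 sit between 1673846104.936833 and 1673846105.137190, so the part after the dot looks like a microsecond count that is not zero-padded (i.e. .037003), not a decimal fraction. That is an inference from the roughly 100 ms spacing of the "received EOF" lines, not something I have checked in the agent source. A small sketch for computing gaps under that assumption (the function names are made up for illustration):

```python
import re
from decimal import Decimal

# Matches an agent log line, with or without the mail quoting prefix, e.g.
# ">> 1673846105.37003: debug: received EOF"
LINE_RE = re.compile(r"^(?:>+\s*)?(\d+)\.(\d+): (\w+): (.*)$")


def parse_line(line):
    """Return (timestamp, level, message), or None for non-matching lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    seconds, micros, level, message = match.groups()
    # Assumption: the second field is microseconds without zero padding,
    # so "37003" means 0.037003 s rather than 0.37003 s.
    timestamp = Decimal(seconds) + Decimal(micros) / Decimal(1_000_000)
    return timestamp, level, message


def gaps_ms(lines):
    """Gaps in milliseconds between consecutive parsable log lines."""
    stamps = [p[0] for p in map(parse_line, lines) if p is not None]
    return [float((b - a) * 1000) for a, b in zip(stamps, stamps[1:])]
```

Read this way, the polling interval stays at about 100 ms right up to the freeze request, so nothing in the excerpt suggests the agent was already stalling before the freeze.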
> Other than disabling the agent, is there any reason this is happening?
> I can't think that Debian 11 is shipping with a broken kernel, and
> 'qm guest cmd 152 fsfreeze-freeze' and 'qm guest cmd 152
> fsfreeze-thaw' work fine from the host. Could this be something with
> the VirtIO pipe/IPC?
>
> Anyone else seeing this or have any ideas?
>
Eneko Lacunza
Technical Director
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/