[pve-devel] [PATCH qemu-server 07/11] agent: implement fsfreeze helper to better handle lost commands

Mon May 5 16:01:11 CEST 2025

On 05.05.25 at 15:57, Mira Limbeck wrote:
> On 5/5/25 14:57, Fiona Ebner wrote:
>> diff --git a/PVE/QemuServer/Agent.pm b/PVE/QemuServer/Agent.pm
>> index 41e615aa..ef36a6a8 100644
>> --- a/PVE/QemuServer/Agent.pm
>> +++ b/PVE/QemuServer/Agent.pm
>> @@ -119,4 +119,55 @@ sub qemu_exec_status {
>>      return $res;
>>  }
>>  
>> +# It can happen that a guest agent command is read, but then the guest agent never sends an answer,
>> +# because the service in the guest is stopped/killed. For example, if a guest reboot happens before
>> +# the command can be successfully executed. This is usually not problematic, but the fsfreeze-freeze
>> +# command has a timeout of 1 hour, so the guest agent socket would be blocked for that amount of
>> +# time, waiting on a command that is not being executed anymore.
>> +#
>> +# Use a lower timeout for the fsfreeze-freeze command, and issue an fsfreeze-status command
>> +# afterwards, which will return immediately if the fsfreeze-freeze command already finished, and
>> +# which will be queued if not. This is used as a proxy to determine whether the fsfreeze-freeze
>> +# command is still running and to check whether it was successful. Like this, the time the socket is
>> +# blocked after a "lost command" is at most 10 minutes.
> With the changed logic it's not so clear at first how long it would wait
> in the worst case. It's still the same 60 minutes as before, just spread
> over the initial fsfreeze command, and then up to 5 iterations of
> fsfreeze-status, each with a 10 minute timeout.
> 
> Maybe that could be documented in the comment and/or the commit message?

Sure, will add that in v2. The patch intentionally does not change the
time fsfreeze-freeze is allowed to take if it actually runs for that long.
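
For illustration, the worst case discussed above can be sketched as
follows. This is a simplified model in Python rather than the Perl of the
patch, and it is not the patch's actual code: the names
freeze_with_status_polling, make_fake_agent and CommandTimeout are
hypothetical, and the 10 minute timeouts and 5 status iterations are taken
from the numbers mentioned in this thread.

```python
# Simplified model of "short freeze timeout + status polling".
# All names here are hypothetical; only the numbers come from the thread.
FREEZE_TIMEOUT_MIN = 10   # timeout for the initial fsfreeze-freeze command
STATUS_ITERATIONS = 5     # follow-up fsfreeze-status attempts
STATUS_TIMEOUT_MIN = 10   # timeout for each fsfreeze-status command


class CommandTimeout(Exception):
    """Raised by the (fake) agent when a command gets no answer in time."""


def worst_case_minutes():
    """Total wait if the freeze and every status command time out."""
    return FREEZE_TIMEOUT_MIN + STATUS_ITERATIONS * STATUS_TIMEOUT_MIN


def freeze_with_status_polling(send_command):
    """Return (outcome, minutes spent waiting on timed-out commands).

    send_command(name, timeout_min) returns an answer or raises
    CommandTimeout.
    """
    waited = 0
    try:
        send_command("fsfreeze-freeze", FREEZE_TIMEOUT_MIN)
        return "frozen", waited
    except CommandTimeout:
        waited += FREEZE_TIMEOUT_MIN

    # The status command is queued behind a still-running freeze, so an
    # answer means the freeze finished; a timeout means it is still
    # running or the command was lost.
    for _ in range(STATUS_ITERATIONS):
        try:
            status = send_command("fsfreeze-status", STATUS_TIMEOUT_MIN)
        except CommandTimeout:
            waited += STATUS_TIMEOUT_MIN
            continue
        return status, waited
    return "lost", waited


def make_fake_agent(timeouts_before_answer, answer="frozen"):
    """Fake agent: the first N commands time out, later ones answer."""
    remaining = [timeouts_before_answer]

    def send_command(name, timeout_min):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise CommandTimeout(name)
        return answer

    return send_command
```

In this model, a freeze that really takes close to an hour is still
allowed to finish, because the total budget stays at 10 + 5 * 10 = 60
minutes; only the length of each individual blocking wait shrinks to 10
minutes.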