[pve-devel] pvedaemon hanging because of qga retry

Alexandre DERUMIER aderumier at odiso.com
Sun May 20 03:22:51 CEST 2018


I have noticed something when the agent daemon is down:

#qm agent 124 ping
VM 124 qmp command 'guest-ping' failed - got timeout
#qm agent 124 ping
VM 124 qmp command 'guest-ping' failed - got timeout
#qm agent 124 ping
VM 124 qmp command 'guest-ping' failed - got timeout
#qm agent 124 ping
VM 124 qmp command 'guest-ping' failed - unable to connect to VM 124 qga socket - timeout after 11 retries


It seems that after 3 requests, we can't connect to the socket anymore.
(I'm seeing the same thing with socat directly on the qga socket.)


What I would like, to avoid big timeouts (mainly for fsfreeze, which is the biggest at 1 hour),
is to first send a guest-ping, or maybe better a guest-info, with a short timeout.
If it's successful, then send the other query.



Something like this, for example in vzdump:


      if ($agent_running){
            eval { PVE::QemuServer::vm_mon_cmd($vmid, "guest-fsfreeze-freeze"); };
            if (my $err = $@) {
                $self->logerr($err);
            }
        }


--->

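      if ($agent_running){
            my $res = eval { PVE::QemuServer::vm_mon_cmd($vmid, "guest-info"); };
            if (my $err = $@) {
                $self->logerr($err);
            } elsif (grep { $_->{name} eq 'guest-fsfreeze-freeze' } @{$res->{supported_commands}}) {
                    # guest-info lists supported_commands as an array of { name, enabled, ... } entries
                    eval { PVE::QemuServer::vm_mon_cmd($vmid, "guest-fsfreeze-freeze"); };
                    if (my $err = $@) {
                          $self->logerr($err);
                    }
            } else {
                $self->logerr("guest-fsfreeze-freeze is not supported by the guest agent");
            }
        }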


This way, I think we could test whether the command exists with guest-info, using a short timeout,
and afterwards send the command with the bigger timeout.
(+ the benefit of logging an error if the command doesn't exist)

Maybe create a specific sub to do this for any qga command.
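
A rough sketch of what such a sub could look like, not tested against a real VM.
The name qga_cmd_checked, the probe_timeout option and its default of 3 seconds are
made up for illustration. The monitor call is passed in as a code ref standing in
for PVE::QemuServer::vm_mon_cmd, and passing "timeout" through to it is an
assumption about its calling convention.

```perl
use strict;
use warnings;

# Hypothetical helper: probe the agent with a short-timeout guest-info,
# then run the real command with its own (possibly long) timeout.
# $mon_cmd is a code ref called as $mon_cmd->($vmid, $command, %params);
# this calling convention is an assumption, not the confirmed PVE API.
sub qga_cmd_checked {
    my ($mon_cmd, $vmid, $cmd, %opts) = @_;

    my $probe_timeout = $opts{probe_timeout} // 3;

    # Dies quickly (error propagates to the caller) if the agent does not answer.
    my $info = $mon_cmd->($vmid, 'guest-info', timeout => $probe_timeout);

    # guest-info returns supported_commands as a list of { name => ..., ... } entries.
    my $list = (ref($info) eq 'HASH' && $info->{supported_commands}) || [];
    my $supported = grep { $_->{name} eq $cmd } @$list;
    die "VM $vmid: guest agent does not support '$cmd'\n" if !$supported;

    # Only now send the real command, with the caller's timeout if one was given.
    my %params = defined $opts{timeout} ? (timeout => $opts{timeout}) : ();
    return $mon_cmd->($vmid, $cmd, %params);
}
```

The caller would still wrap it in eval and log $@, as in the vzdump snippet above;
the difference is that a dead agent is detected by the short probe before the
long fsfreeze timeout is ever used.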


What do you think about this?



----- Original message -----
From: "dietmar" <dietmar at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday 18 May 2018 19:03:19
Subject: Re: [pve-devel] pvedaemon hanging because of qga retry

> >>If you simply skip commands like 'guest-fsfreeze-thaw' 
> >>your VM will get totally unusable (frozen). So I am not 
> >>sure what you want to suggest? 
> 
> I'm not sure, but don't we have 2 timeout here ? 
> 
> 1 for connect , and 1 for command execution ? 

what for? 

> I would like to be able to time out fast on connect, since if the qga agent is not 
> running, it can't connect. 
> And if qga is running, keep the long execution timeout, as it seems to be 
> needed by fsfreeze. 

The problem is that there is no way to decide whether the qga agent is running or not. 
You will simply run into the 'short' timeout as soon as there is some load on the 
server. 
AFAIK many users complained about that. 



