[pve-devel] [PATCH pve-manager] fix #3369: auto-start vms after failed pbs backup

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Apr 8 08:41:04 CEST 2021


On April 7, 2021 4:23 pm, Dylan Whyte wrote:
> Fixes an issue in which a VM fails to automatically restart after a
> failed stop-mode backup to pbs.
> 
> Signed-off-by: Dylan Whyte <d.whyte at proxmox.com>
> ---
>  PVE/VZDump.pm | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> Notes:
> 1. The 1-second delay was needed because the check for whether the VM is
> running still returned true while this code was executing (although the VM
> was just about to stop).
> 
> 2. The previously used vm_status call just checks whether a PID exists and
> returns true if so. It therefore also returns true when the VM is in the
> "prelaunch" state, so PVE::QemuServer::vmstatus is used instead to get the
> exact state and handle the situation accordingly. Otherwise, the VM gets
> stuck in the prelaunch state from time to time.
> 
> 
> diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
> index fb4c8bad..1bda1f15 100644
> --- a/PVE/VZDump.pm
> +++ b/PVE/VZDump.pm
> @@ -23,6 +23,7 @@ use PVE::VZDump::Common;
>  use PVE::VZDump::Plugin;
>  use PVE::Tools qw(extract_param split_list);
>  use PVE::API2Tools;
> +use PVE::QemuServer;
>  
>  my @posix_filesystems = qw(ext3 ext4 nfs nfs4 reiserfs xfs);
>  
> @@ -1039,10 +1040,17 @@ sub exec_backup_task {
>  		    debugmsg ('info', "resume vm", $logfd);
>  		    $plugin->resume_vm ($task, $vmid);
>  		} else {
> -		    my $running = $plugin->vm_status($vmid);
> -		    if (!$running) {
> +		    sleep(1);

I wonder where this second comes from? some kind of timeout in PBS code?

> +		    my $vmstatus = PVE::QemuServer::vmstatus($vmid, 1);

do we even know that this guest is a VM at this point?

> +		    my $stat = $vmstatus->{$vmid};
> +		    my $status = $stat->{qmpstatus};
> +
> +		    if ($status eq "stopped") {
> +	    		$plugin->start_vm ($task, $vmid);
> +    			debugmsg ('info', "restarting vm", $logfd);
> +		    } elsif ($status eq "prelaunch") {
> +			$plugin->resume_vm ($task, $vmid);

this can occur if

- the VM was running at the start of the backup and stop mode is used
- a problem occurred while the VM was in the prelaunch state

normally, the qemu-server VZDump plugin handles resuming. but there are 
two 'die' statements in archive_pbs that can trigger before resuming 
happens, and restoring the power state does nothing if the VM is already 
running. so either of those two should be fixed to handle the prelaunch 
issue.

the prelaunch issue also seems to affect VMA, although it might be 
harder to reliably trigger an error during the initial backup start 
window.
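
to make the cases discussed above concrete, here is a minimal, self-contained
sketch of the state dispatch. restore_action is a hypothetical helper that does
not exist in PVE::VZDump or qemu-server; it only assumes that the caller has
already obtained a qmpstatus string ("stopped", "prelaunch", "running", ...) as
in the patch, and it does not call any PVE module itself:

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not PVE code):
# maps the qmpstatus of a guest after a failed stop-mode backup to the
# action needed to bring the guest back up.
sub restore_action {
    my ($qmpstatus) = @_;

    # No status available (e.g. the guest is not a QEMU VM): do nothing
    # here and leave the decision to the caller.
    return 'none' if !defined($qmpstatus);

    # The QEMU process is gone, so the VM has to be started again.
    return 'start' if $qmpstatus eq 'stopped';

    # The QEMU process exists but the guest was never resumed.
    return 'resume' if $qmpstatus eq 'prelaunch';

    # Any other state (running, paused, ...): nothing to restore.
    return 'none';
}

for my $status (qw(stopped prelaunch running)) {
    printf "%-9s -> %s\n", $status, restore_action($status);
}
```

keeping the decision in one small function like this would let the same check
be reused by whichever place ends up handling the prelaunch case, but where
that logic should live is exactly the open question above.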

>  			debugmsg ('info', "restarting vm", $logfd);
> -			$plugin->start_vm ($task, $vmid);
>  		    }
>  		}
>  		$self->run_hook_script ('post-restart', $task, $logfd);
> -- 
> 2.20.1
> 
> 
> 




