[pve-devel] [PATCH qemu-server v2] Catch qmp socket connections errors, so we can output a more specific error message

Fabian Grünbichler f.gruenbichler at proxmox.com
Mon Jul 31 15:00:35 CEST 2017


On Thu, Jul 27, 2017 at 11:25:41AM +0200, Emmanuel Kasper wrote:
> It can happen that the qmp connection gets lost while mirroring a disk.
> In that case the current block job gets cancelled, but the real cause of the failure
> is lost, because we die() at a later step with the generic message
> "$job: mirroring has been cancelled\n"

I am not quite sure I can follow, see below.

> 
> example:
> ...
> drive-scsi0: transferred: 5524946944 bytes remaining: 918355968 bytes total: 6443302912 bytes progression: 85.75 % busy: 1 ready: 0
> drive-scsi0: Cancelling block job
> drive-scsi0: Done.
> 2017-07-26 15:39:56 ERROR: online migrate failure - mirroring error: drive-scsi0: mirroring has been cancelled
> 2017-07-26 15:39:56 aborting phase 2 - cleanup resources
> 2017-07-26 15:39:56 migrate_cancel
> ...

But this must come from the die in line 6054 (caught by the eval in
line 6030), not from a die in line 6036? That would mean query-block-jobs
returned an empty array (or maybe undef?).
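
To illustrate what I mean, here is a rough stand-alone sketch, not the
actual QemuServer.pm code. check_jobs() is a made-up helper, and I am
assuming the per-job check works roughly like this: a job that is missing
from the query-block-jobs reply is treated as cancelled. With that
assumption, an empty (or undef) reply ends in the generic message:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-job check in the monitor loop.
# This is a simplified model, not the real qemu-server code.
sub check_jobs {
    my ($stats, $jobs) = @_;

    # Index the reported mirror jobs by device name.
    my $running = {};
    foreach my $stat (@{ $stats // [] }) {
        next if ($stat->{type} // '') ne 'mirror';
        $running->{$stat->{device}} = $stat;
    }

    # A job we are waiting for but that is not reported counts as cancelled.
    foreach my $job (sort keys %$jobs) {
        die "$job: mirroring has been cancelled\n"
            if !defined($running->{$job});
    }

    return scalar(keys %$running);
}

my $jobs = { 'drive-scsi0' => {} };

# Empty reply: the job is missing, so the generic message is raised.
eval { check_jobs([], $jobs) };
my $err_empty = $@;
print "empty reply: $err_empty";

# Undefined reply: handled like an empty one in this sketch.
eval { check_jobs(undef, $jobs) };
my $err_undef = $@;
print "undef reply: $err_undef";

# Normal reply: the job is still listed, nothing dies.
my $count = check_jobs([{ type => 'mirror', device => 'drive-scsi0' }], $jobs);
print "running mirror jobs: $count\n";
```

In this model no error from the monitor command itself is involved, which
is why the example log looks to me like an empty reply rather than a
failed call.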

> 
> after patch applied:
> 2017-07-27 09:43:37 ERROR: online migrate failure - mirroring error: lost connection to qemu machine protocol: VM 600 not running
> 2017-07-27 09:43:37 aborting phase 2 - cleanup resources

But this would mean that vm_qmp_command (called by vm_mon_cmd) died in
line 4798 because check_running returned false?

I'd rather fix check_running returning false then, because obviously the
VM IS running, isn't it? ;)
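
For reference, a small stand-alone sketch of how the wrapped message gets
built. fake_vm_mon_cmd() and query_jobs() are made-up names, and the stub
simply assumes that the monitor command dies with "VM <id> not running"
when the running check fails. The eval/die pair is the one from the patch:

```perl
use strict;
use warnings;

# Hypothetical stub standing in for vm_mon_cmd. It always dies the way
# a failed running check is assumed to, so the wrapper can be observed.
sub fake_vm_mon_cmd {
    my ($vmid, $cmd) = @_;
    die "VM $vmid not running\n";
}

# Same eval/die pattern as in the patch: catch the inner error and
# re-raise it with a more specific prefix.
sub query_jobs {
    my ($vmid) = @_;

    my $stats = eval { fake_vm_mon_cmd($vmid, "query-block-jobs"); };
    die "lost connection to qemu machine protocol socket: $@\n" if $@;

    return $stats;
}

eval { query_jobs(600) };
my $msg = $@;
print $msg;
```

The prefix is added by the wrapper, but the "not running" part can only
come from the inner die, so that is the place to look at.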

> ---
> changes since v1:
>  * declare and assign my $stats directly. No need to have three lines here
>  when one is clear enough
>  PVE/QemuServer.pm | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 1f34101..3086375 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -6033,7 +6033,8 @@ sub qemu_drive_mirror_monitor {
>  	while (1) {
>  	    die "storage migration timed out\n" if $err_complete > 300;
>  
> -	    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
> +	    my $stats = eval { vm_mon_cmd($vmid, "query-block-jobs"); };
> +	    die "lost connection to qemu machine protocol socket: $@\n" if $@;
>  
>  	    my $running_mirror_jobs = {};
>  	    foreach my $stat (@$stats) {
> -- 
> 2.11.0
> 
> 



