[pve-devel] [PATCH qemu-server 3/4] move finish_tunnel to after the VM is stopped

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Mar 11 15:22:00 CET 2020


On March 11, 2020 2:26 pm, Fabian Grünbichler wrote:
> On March 11, 2020 11:44 am, Mira Limbeck wrote:
>> The NBD socket on the client side is kept open until the VM stops. In
>> the case of TCP this was not a problem, because we did not rely on it
>> closing. But with unix socket forwarding via SSH the SSH connection
>> can't close if the socket is still open.
>> 
>> If it does not close by itself, it will be terminated after 30 seconds.
> 
> in which combinations did you test this? it prints pretty ugly warnings 
> here for a VM with a single, local disk:

okay, so this seems to be caused by some changes related to 4.2, which 
just get exposed more easily by this patch set. anyway, the following PoC 
seems to get rid of them, and actually seems like a good idea in general 
(we want to cancel the ready block jobs, not switch over to the 
target/NBD disks!), but some more error handling might be required 
to get it right in all cases (and the parameter needs a better name ;)).

we could check all call sites and maybe change $skipcomplete to be an 
enum?

0 => wait until ready, complete
1 => wait until ready, return
2 => wait until ready, cancel

with a possible re-ordering of the values, in case migration is 
currently the only caller setting skipcomplete ;)
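as a rough sketch, the enum could look something like this (just an 
illustration, the constant and sub names are made up, not actual 
qemu-server code):

```perl
# hypothetical sketch of the proposed enum - names are not real code
use strict;
use warnings;

# possible values for the parameter replacing the $skipcomplete flag
use constant {
    BLOCKJOB_COMPLETE => 0,  # wait until ready, then block-job-complete
    BLOCKJOB_RETURN   => 1,  # wait until ready, then return to the caller
    BLOCKJOB_CANCEL   => 2,  # wait until ready, then block-job-cancel
};

# once all mirror jobs report ready, pick the QMP command to issue
# (undef means: don't touch the jobs, the caller handles them later)
sub ready_action {
    my ($mode) = @_;
    return undef if $mode == BLOCKJOB_RETURN;
    return $mode == BLOCKJOB_CANCEL ? 'block-job-cancel'
                                    : 'block-job-complete';
}
```

phase3_cleanup would then pass BLOCKJOB_CANCEL instead of an extra 
boolean, which keeps the call sites self-documenting.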

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index a52746c..f42dee1 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -990,7 +990,7 @@ sub phase3_cleanup {
 
     if ($self->{storage_migration}) {
 	# finish block-job
-	eval { PVE::QemuServer::qemu_drive_mirror_monitor($vmid, undef, $self->{storage_migration_jobs}); };
+	eval { PVE::QemuServer::qemu_drive_mirror_monitor($vmid, undef, $self->{storage_migration_jobs}, 0, 0, 1); };
 
 	if (my $err = $@) {
 	    eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $self->{storage_migration_jobs}) };
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 3d3505c..3c93a0a 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6356,7 +6356,7 @@ sub qemu_drive_mirror {
 }
 
 sub qemu_drive_mirror_monitor {
-    my ($vmid, $vmiddst, $jobs, $skipcomplete, $qga) = @_;
+    my ($vmid, $vmiddst, $jobs, $skipcomplete, $qga, $cancelcomplete) = @_;
 
     eval {
 	my $err_complete = 0;
@@ -6426,12 +6426,13 @@ sub qemu_drive_mirror_monitor {
 
 		    last;
 		} else {
+		    $cancelcomplete //= 0;
 
 		    foreach my $job (keys %$jobs) {
 			# try to switch the disk if source and destination are on the same guest
 			print "$job: Completing block job...\n";
 
-			eval { mon_cmd($vmid, "block-job-complete", device => $job) };
+			eval { mon_cmd($vmid, $cancelcomplete ? "block-job-cancel" : "block-job-complete", device => $job) };
 			if ($@ =~ m/cannot be completed/) {
 			    print "$job: Block job cannot be completed, try again.\n";
 			    $err_complete++;

> 
> new to old (TCP, but with stop before tunnel close):
> 
> 2020-03-11 12:52:21 migration status: completed
> drive-scsi0: transferred: 17179869184 bytes remaining: 0 bytes total: 17179869184 bytes progression: 100.00 % busy: 0 ready: 1
> all mirroring jobs are ready
> drive-scsi0: Completing block job...
> drive-scsi0: Completed successfully.
> drive-scsi0 : finished
> VM quit/powerdown failed - terminating now with SIGTERM
> VM still running - terminating now with SIGKILL
> 2020-03-11 12:52:38 migration finished successfully (duration 00:00:46)
> 
> I haven't checked/verified, but I think the problem is that the source 
> VM still has the NBD connection open, and when you call 'quit', it will 
> attempt to flush the NBD connection and block on that (I ran into a 
> similar problem with my websocket tunnel).
> 
> new to new (UNIX, stop before tunnel close):
> 
> 2020-03-11 12:51:25 migration status: completed
> drive-scsi0: transferred: 17179869184 bytes remaining: 0 bytes total: 17179869184 bytes progression: 100.00 % busy: 0 ready: 1
> all mirroring jobs are ready
> drive-scsi0: Completing block job...
> drive-scsi0: Completed successfully.
> drive-scsi0 : finished
> channel 4: open failed: connect failed: open failed
> channel 3: open failed: connect failed: open failed
> channel 3: open failed: connect failed: open failed
> channel 3: open failed: connect failed: open failed
> channel 3: open failed: connect failed: open failed
> 2020-03-11 12:51:43 migration finished successfully (duration 00:01:22)
> 
> if I interpret the strace correctly, here's what happens:
> 
> 1. nbd connection gets closed from remote end (not directly visible in 
> strace, since that happens with another ssh call)
> 2. local VM connects again to the UNIX socket (it should be paused at 
> this moment?)
> 3. ssh accept()s connection and attempts to forward it
> 4. other end does not connect
> 5. error is returned to source node
> 
> fd 3 is the SSH control channel, fds 4 and 5 are the listening UNIX 
> sockets on the source node (NBD and memory), fds 6, 7 and 8 are STDIN, 
> STDOUT and STDERR, and fds 9 and 10 are the accept()ed client 
> connections for the UNIX sockets (NBD and memory). note the shutdown 
> and close for fds 10 and 9, followed by another accept on fd 4 that 
> fails to connect, followed by the resume command via the tunnel and 
> its reply.
> 
> select(11, [3 4 5 6 9 10], [3], NULL, {tv_sec=300, tv_usec=0}) = 2 (in [10], out [3], left {tv_sec=299, tv_usec=999998})
> read(10, "ffer\", \"size\" : 21 }, { \"name\" :"..., 16384) = 3100
> write(3, "\0\0@\20\337\271Q\240\6DI\267]u\301\330\345\2632\341j\240\223\341Z\202\350r\220\261Q\304"..., 16412) = 16412
> getpid()                                = 1540056
> select(11, [3 4 5 6 9 10], [3], NULL, {tv_sec=300, tv_usec=0}) = 2 (in [10], out [3], left {tv_sec=299, tv_usec=999998})
> read(10, "", 16384)                     = 0
> shutdown(10, SHUT_RD)                   = 0
> write(3, "\0\0\f0]\335J\351\327A\343\215\f\244x\35\227\362!\253u\225\222xSQ\312%f\202G\270"..., 3132) = 3132
> getpid()                                = 1540056
> select(11, [3 4 5 6 9], [3], NULL, {tv_sec=300, tv_usec=0}) = 1 (out [3], left {tv_sec=299, tv_usec=999998})
> write(3, "\0\0\0\20O\32\347\372\227'\177\6\340\353\314\317X\10=\245\323\266nC]\341\272\376", 28) = 28
> select(11, [3 4 5 6 9], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [3], left {tv_sec=299, tv_usec=999404})
> read(3, "\0\0\0\20\346Ra\334\n\354\252\304\230\340\3\270\177\260;\275\373\353\221\"\7\234\270\221", 8192) = 28
> select(11, [3 4 5 6 9], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [3], left {tv_sec=299, tv_usec=997529})
> read(3, "\0\0\0\20\370u<I\215\222`U\235F\342\271SZbH\364\245\177\351\252\334e\333\0\0\0\20"..., 8192) = 56
> shutdown(10, SHUT_WR)                   = 0
> getpid()                                = 1540056
> close(10)                               = 0
> select(11, [3 4 5 6 9], [3], NULL, {tv_sec=300, tv_usec=0}) = 1 (out [3], left {tv_sec=299, tv_usec=999997})
> write(3, "\0\0\0\20\37\213\3445(<d\vc\241\350Y\206Zn\220\261l\320\246\362\256\fn", 28) = 28
> select(11, [3 4 5 6 9], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [9], left {tv_sec=299, tv_usec=963873})
> read(9, "%`\225\23\0\0\0\3\0\0\177&\203)_\0\0\0\0\0\0\0\0\0\0\0\0\0", 16384) = 28
> getpid()                                = 1540056
> select(11, [3 4 5 6 9], [3], NULL, {tv_sec=300, tv_usec=0}) = 1 (out [3], left {tv_sec=299, tv_usec=999998})
> write(3, "\0\0\0000\225\275c\363\257\336\356E\310h6\335\307\363&\257\272\2377\5\35&O\264\332]\317\343"..., 60) = 60
> select(11, [3 4 5 6 9], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [3], left {tv_sec=299, tv_usec=999731})
> read(3, "\0\0\0 \300\35\241Vbyzb\274\317y\1\237\f\vam\314\232\233\360\17x\364\354\331TO"..., 8192) = 44
> select(11, [3 4 5 6 9], [9], NULL, {tv_sec=300, tv_usec=0}) = 1 (out [9], left {tv_sec=299, tv_usec=999998})
> write(9, "gDf\230\0\0\0\0\0\0\177&\203)_\0", 16) = 16
> select(11, [3 4 5 6 9], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [9], left {tv_sec=299, tv_usec=999508})
> read(9, "%`\225\23\0\0\0\3\0\0\177&\203)_\0\0\0\0\0\0\0\0\0\0\0\0\0", 16384) = 28
> getpid()                                = 1540056
> select(11, [3 4 5 6 9], [3], NULL, {tv_sec=300, tv_usec=0}) = 1 (out [3], left {tv_sec=299, tv_usec=999998})
> write(3, "\0\0\0000\316\366\232\227\225\374\336I\226l\257P\317\257D\230\0\\\266\201\232+\201\20u\0215V"..., 60) = 60
> select(11, [3 4 5 6 9], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [3], left {tv_sec=299, tv_usec=999855})
> read(3, "\0\0\0 \204J\273p\231\17\331\252\312\366^\207\243\252\246\236\304\300N\376\1\314\265\27\231\35iH"..., 8192) = 44
> select(11, [3 4 5 6 9], [9], NULL, {tv_sec=300, tv_usec=0}) = 1 (out [9], left {tv_sec=299, tv_usec=999998})
> write(9, "gDf\230\0\0\0\0\0\0\177&\203)_\0", 16) = 16
> select(11, [3 4 5 6 9], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [3], left {tv_sec=298, tv_usec=495094})
> read(3, "\0\0\0\20\365\264\371\351\311.\367\242,\357\241\265\376\376\227\rj+\312\222\223\344\316\252", 8192) = 28
> shutdown(9, SHUT_WR)                    = 0
> select(11, [3 4 5 6 9], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [9], left {tv_sec=299, tv_usec=999963})
> read(9, "", 16384)                      = 0
> shutdown(9, SHUT_RD)                    = 0
> getpid()                                = 1540056
> getpid()                                = 1540056
> select(11, [3 4 5 6], [3], NULL, {tv_sec=300, tv_usec=0}) = 2 (in [4], out [3], left {tv_sec=299, tv_usec=999998})
> accept(4, {sa_family=AF_UNIX}, [128->2]) = 10
> fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
> fcntl(10, F_GETFL)                      = 0x2 (flags O_RDWR)
> fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
> fcntl(10, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
> getsockname(10, {sa_family=AF_UNIX, sun_path="/run/qemu-server/12341234_nbd.migrate"}, [128->40]) = 0
> getsockname(10, {sa_family=AF_UNIX, sun_path="/run/qemu-server/12341234_nbd.migrate"}, [128->40]) = 0
> getpeername(10, {sa_family=AF_UNIX}, [128->2]) = 0
> getpeername(10, {sa_family=AF_UNIX}, [128->2]) = 0
> getpid()                                = 1540056
> write(3, "\0\0\0\20\371\31\34\304\377\210\22\213\356~\323h\367\217\3250ZK\354\31\21@o\257\0\0\0\20"..., 180) = 180
> select(11, [3 4 5 6], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [3], left {tv_sec=299, tv_usec=999600})
> read(3, "\0\0\0000\226:m8\370!\251\32\224\274\v\375\267\221-\226`\277\271\206\350\313\324\312\252?\260\364"..., 8192) = 88
> write(2, "channel 4: open failed: connect "..., 53) = 53
> close(9)                                = 0
> munmap(0x7f06cdafc000, 1101824)         = 0
> close(10)                               = 0
> select(11, [3 4 5 6], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [6], left {tv_sec=299, tv_usec=991723})
> read(6, "resume 12341234\n", 16384)     = 16
> getpid()                                = 1540056
> select(11, [3 4 5 6], [3], NULL, {tv_sec=300, tv_usec=0}) = 1 (out [3], left {tv_sec=299, tv_usec=999997})
> write(3, "\0\0\0 \f\221\305P\266\317j\2454\377#\266\32$\212\234/\244\267a\366\262\336\366\250\221\354B"..., 44) = 44
> select(11, [3 4 5 6], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [3], left {tv_sec=299, tv_usec=990758})
> read(3, "\0\0\0 D\7\336\260 \177C#\320\230p\327\235a\244\302z\31\354i\270\314\357e{\304~S"..., 8192) = 44
> select(11, [3 4 5 6], [7], NULL, {tv_sec=300, tv_usec=0}) = 1 (out [7], left {tv_sec=299, tv_usec=999997})
> write(7, "OK\n", 3)                     = 3
> select(11, [3 4 5 6], [], NULL, {tv_sec=300, tv_usec=0}) = 1 (in [4], left {tv_sec=299, tv_usec=996422})
> 
> old to new works fine ;)
> 
>> 
>> Signed-off-by: Mira Limbeck <m.limbeck at proxmox.com>
>> ---
>>  PVE/QemuMigrate.pm | 18 +++++++++---------
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>> 
>> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
>> index 44e4c57..e569a2c 100644
>> --- a/PVE/QemuMigrate.pm
>> +++ b/PVE/QemuMigrate.pm
>> @@ -1035,15 +1035,6 @@ sub phase3_cleanup {
>>  	}
>>      }
>>  
>> -    # close tunnel on successful migration, on error phase2_cleanup closed it
>> -    if ($tunnel) {
>> -	eval { finish_tunnel($self, $tunnel);  };
>> -	if (my $err = $@) {
>> -	    $self->log('err', $err);
>> -	    $self->{errors} = 1;
>> -	}
>> -    }
>> -
>>      eval {
>>  	my $timer = 0;
>>  	if (PVE::QemuServer::vga_conf_has_spice($conf->{vga}) && $self->{running}) {
>> @@ -1065,6 +1056,15 @@ sub phase3_cleanup {
>>  	$self->{errors} = 1;
>>      }
>>  
>> +    # close tunnel on successful migration, on error phase2_cleanup closed it
>> +    if ($tunnel) {
>> +	eval { finish_tunnel($self, $tunnel);  };
>> +	if (my $err = $@) {
>> +	    $self->log('err', $err);
>> +	    $self->{errors} = 1;
>> +	}
>> +    }
>> +
>>      # always deactivate volumes - avoid lvm LVs to be active on several nodes
>>      eval {
>>  	my $vollist = PVE::QemuServer::get_vm_volumes($conf);
>> -- 
>> 2.20.1
>> 
>> 
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at pve.proxmox.com
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>> 
>> 
> 



