[pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
Alexandre DERUMIER
aderumier at odiso.com
Thu Dec 1 15:43:25 CET 2016
Any news on local disk live migration?
I have time to work on it this month if cleanup is needed, but I'll be very busy at the beginning of next year.
----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, November 16, 2016 12:10:05
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
Wolfgang, any comments?
----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, November 10, 2016 15:47:47
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>Well, yes, but what happens when the connection fails or gets interrupted?
>>Both a vanishing connection (timeout) and a connection that gets killed
>>for some reason (e.g. tcpkill) need to be handled properly in the
>>qemu_drive_mirror_monitor() function.
A running mirror job aborts automatically if the network fails (or on any read/write error on the source or destination).
This check was more for the initial connection:
drive-mirror hangs while trying to establish the connection if the target is not responding.
(And I'm not sure there is a timeout; I have waited several minutes without a response.)
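
For what it's worth, the pre-flight check boils down to something like this
(untested sketch; nbd_target_reachable is a hypothetical helper, not an
existing PVE::Network function). The Timeout only covers connect(), so it
catches a dead target before drive-mirror blocks, but it cannot rule out the
connection dying later:

use IO::Socket::INET;

# hypothetical helper: verify that a TCP connection to the NBD server
# can be established within $timeout seconds
sub nbd_target_reachable {
    my ($server, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $server,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout // 2,
    );
    return 0 if !$sock;
    close($sock);
    return 1;
}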
----- Original Message -----
From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, November 10, 2016 14:43:15
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderumier at odiso.com> wrote:
>
>
> > + die "can't connect remote nbd server $server:$port" if !PVE::Network::tcp_ping($server, $port, 2);
> >>I'm not all too happy about this check here, as it is a TOCTOU race.
> >>(I'm not happy about some of the other uses of it either, but some are
> >>only for status queries, in other words to display information, where it's fine.)
>
> >>However, in this case, if broken/missing connections still cannot be
> >>caught (like in my previous tests), then this only hides 99.99% of the
> >>cases while still wrongly deleting disks in the other 0.01%, which is
> >>unacceptable behavior.
>
> So, do you want me to remove the check?
Well, yes, but what happens when the connection fails or gets interrupted?
Both a vanishing connection (timeout) and a connection that gets killed
for some reason (e.g. tcpkill) need to be handled properly in the
qemu_drive_mirror_monitor() function.
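
Roughly, the monitor loop needs to distinguish "job gone because we
completed it" from "job gone because it died", and only treat the former
as success. An untested sketch, based on the loop in this patch:

foreach my $job (keys %$jobs) {
    if (!defined($running_mirror_jobs->{$job})) {
        if (defined($jobs->{$job}->{complete})) {
            # we issued block-job-complete earlier: clean exit
            print "$job : finished\n";
            delete $jobs->{$job};
        } else {
            # the job vanished without us completing it (connection
            # died, I/O error, ...): abort, so the surrounding eval
            # cancels the remaining jobs instead of deleting disks
            die "$job: mirror job vanished unexpectedly\n";
        }
        next;
    }
    # ... progress reporting as in the patch ...
}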
>
>
> >>$jobs is still empty at this point. The assignment below should be moved
> >>up.
>
> > + die "mirroring error: $err";
> > + }
> > +
> > + $jobs->{"drive-$drive"} = {};
> >>This one ^
>
> Ok.
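
For clarity, the reordering meant here, as an untested sketch on top of
this patch:

# register the job first, so the error path's qemu_blockjobs_cancel()
# call actually sees it
$jobs->{"drive-$drive"} = {};

eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); };
if (my $err = $@) {
    eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
    die "mirroring error: $err";
}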
>
> Thanks for the review!
>
>
>
> ----- Mail original -----
> De: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
> À: "aderumier" <aderumier at odiso.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Envoyé: Jeudi 10 Novembre 2016 13:21:00
> Objet: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>
> On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote:
> > we can now run multiple drive-mirror jobs in parallel.
> >
> > block-job-complete can be skipped if we want to add more mirror jobs later.
> >
> > also add support for NBD URIs to qemu_drive_mirror
> >
> > Signed-off-by: Alexandre Derumier <aderumier at odiso.com>
> > ---
> > PVE/QemuServer.pm | 171 +++++++++++++++++++++++++++++++++++++++---------------
> > 1 file changed, 123 insertions(+), 48 deletions(-)
> >
> > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> > index 54edd96..e989670 100644
> > --- a/PVE/QemuServer.pm
> > +++ b/PVE/QemuServer.pm
> > @@ -5824,91 +5824,165 @@ sub qemu_img_format {
> > }
> >
> > sub qemu_drive_mirror {
> > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_;
> > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $skipcomplete) = @_;
> >
> > - my $storecfg = PVE::Storage::config();
> > - my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid);
> > + $jobs = {} if !$jobs;
> > +
> > + my $qemu_target;
> > + my $format;
> >
> > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
> > + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) {
> > + $qemu_target = $dst_volid;
> > + my $server = $1;
> > + my $port = $2;
> > + $format = "nbd";
> > + die "can't connect remote nbd server $server:$port" if !PVE::Network::tcp_ping($server, $port, 2);
>
> I'm not all too happy about this check here, as it is a TOCTOU race.
> (I'm not happy about some of the other uses of it either, but some are
> only for status queries, in other words to display information, where it's fine.)
>
> However, in this case, if broken/missing connections still cannot be
> caught (like in my previous tests), then this only hides 99.99% of the
> cases while still wrongly deleting disks in the other 0.01%, which is
> unacceptable behavior.
>
> > + } else {
> > +
> > + my $storecfg = PVE::Storage::config();
> > + my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid);
> > +
> > + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
> >
> > - my $format = qemu_img_format($dst_scfg, $dst_volname);
> > + $format = qemu_img_format($dst_scfg, $dst_volname);
> >
> > - my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
> > + my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
> >
> > - my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
> > + $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
> > + }
> >
> > my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", sync => "full", target => $qemu_target };
> > $opts->{format} = $format if $format;
> >
> > - print "drive mirror is starting (scanning bitmap) : this step can take some minutes/hours, depend of disk size and storage speed\n";
> > + print "drive mirror is starting for drive-$drive\n";
> >
> > - my $finish_job = sub {
> > - while (1) {
> > - my $stats = vm_mon_cmd($vmid, "query-block-jobs");
> > - my $stat = @$stats[0];
> > - last if !$stat;
> > - sleep 1;
> > - }
> > - };
> > + eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already run for this device,it's throw an error
> > + if (my $err = $@) {
> > + eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
>
> $jobs is still empty at this point. The assignment below should be moved
> up.
>
> > + die "mirroring error: $err";
> > + }
> > +
> > + $jobs->{"drive-$drive"} = {};
> This one ^
>
> > +
> > + qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $skipcomplete);
> > +}
> > +
> > +sub qemu_drive_mirror_monitor {
> > + my ($vmid, $vmiddst, $jobs, $skipcomplete) = @_;
> >
> > eval {
> > - vm_mon_cmd($vmid, "drive-mirror", %$opts);
> > +
> > + my $err_complete = 0;
> > +
> > while (1) {
> > + die "storage migration timed out\n" if $err_complete > 300;
> > +
> > my $stats = vm_mon_cmd($vmid, "query-block-jobs");
> > - my $stat = @$stats[0];
> > - die "mirroring job seem to have die. Maybe do you have bad sectors?" if !$stat;
> > - die "error job is not mirroring" if $stat->{type} ne "mirror";
> >
> > - my $busy = $stat->{busy};
> > - my $ready = $stat->{ready};
> > + my $running_mirror_jobs = {};
> > + foreach my $stat (@$stats) {
> > + next if $stat->{type} ne 'mirror';
> > + $running_mirror_jobs->{$stat->{device}} = $stat;
> > + }
> >
> > - if (my $total = $stat->{len}) {
> > - my $transferred = $stat->{offset} || 0;
> > - my $remaining = $total - $transferred;
> > - my $percent = sprintf "%.2f", ($transferred * 100 / $total);
> > + my $readycounter = 0;
> >
> > - print "transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
> > - }
> > + foreach my $job (keys %$jobs) {
> > +
> > + if(defined($jobs->{$job}->{complete}) && !defined($running_mirror_jobs->{$job})) {
> > + print "$job : finished\n";
> > + delete $jobs->{$job};
> > + next;
> > + }
> > +
> > + die "$job: mirroring has been cancelled. Maybe do you have bad sectors?" if !defined($running_mirror_jobs->{$job});
> >
> > + my $busy = $running_mirror_jobs->{$job}->{busy};
> > + my $ready = $running_mirror_jobs->{$job}->{ready};
> > + if (my $total = $running_mirror_jobs->{$job}->{len}) {
> > + my $transferred = $running_mirror_jobs->{$job}->{offset} || 0;
> > + my $remaining = $total - $transferred;
> > + my $percent = sprintf "%.2f", ($transferred * 100 / $total);
> >
> > - if ($stat->{ready} eq 'true') {
> > + print "$job: transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
> > + }
> >
> > - last if $vmiddst != $vmid;
> > + $readycounter++ if $running_mirror_jobs->{$job}->{ready} eq 'true';
> > + }
> >
> > - # try to switch the disk if source and destination are on the same guest
> > - eval { vm_mon_cmd($vmid, "block-job-complete", device => "drive-$drive") };
> > - if (!$@) {
> > - &$finish_job();
> > + last if scalar(keys %$jobs) == 0;
> > +
> > + if ($readycounter == scalar(keys %$jobs)) {
> > + print "all mirroring jobs are ready \n";
> > + last if $skipcomplete; #do the complete later
> > +
> > + if ($vmiddst && $vmiddst != $vmid) {
> > + # if we clone a disk for a new target vm, we don't switch the disk
> > + PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs);
> > last;
> > + } else {
> > +
> > + foreach my $job (keys %$jobs) {
> > + # try to switch the disk if source and destination are on the same guest
> > + print "$job : Try to complete block job\n";
> > +
> > + eval { vm_mon_cmd($vmid, "block-job-complete", device => $job) };
> > + if ($@ =~ m/cannot be completed/) {
> > + print "$job : block job cannot be complete. Try again \n";
> > + $err_complete++;
> > + }else {
> > + print "$job : complete ok : flushing pending writes\n";
> > + $jobs->{$job}->{complete} = 1;
> > + }
> > + }
> > }
> > - die $@ if $@ !~ m/cannot be completed/;
> > }
> > sleep 1;
> > }
> > -
> > -
> > };
> > my $err = $@;
> >
> > - my $cancel_job = sub {
> > - vm_mon_cmd($vmid, "block-job-cancel", device => "drive-$drive");
> > - &$finish_job();
> > - };
> > -
> > if ($err) {
> > - eval { &$cancel_job(); };
> > + eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
> > die "mirroring error: $err";
> > }
> >
> > - if ($vmiddst != $vmid) {
> > - # if we clone a disk for a new target vm, we don't switch the disk
> > - &$cancel_job(); # so we call block-job-cancel
> > +}
> > +
> > +sub qemu_blockjobs_cancel {
> > + my ($vmid, $jobs) = @_;
> > +
> > + foreach my $job (keys %$jobs) {
> > + print "$job: try to cancel block job\n";
> > + eval { vm_mon_cmd($vmid, "block-job-cancel", device => $job); };
> > + $jobs->{$job}->{cancel} = 1;
> > + }
> > +
> > + while (1) {
> > + my $stats = vm_mon_cmd($vmid, "query-block-jobs");
> > +
> > + my $running_jobs = {};
> > + foreach my $stat (@$stats) {
> > + $running_jobs->{$stat->{device}} = $stat;
> > + }
> > +
> > + foreach my $job (keys %$jobs) {
> > +
> > + if(defined($jobs->{$job}->{cancel}) && !defined($running_jobs->{$job})) {
> > + print "$job : finished\n";
> > + delete $jobs->{$job};
> > + }
> > + }
> > +
> > + last if scalar(keys %$jobs) == 0;
> > +
> > + sleep 1;
> > }
> > }
> >
> > sub clone_disk {
> > my ($storecfg, $vmid, $running, $drivename, $drive, $snapname,
> > - $newvmid, $storage, $format, $full, $newvollist) = @_;
> > + $newvmid, $storage, $format, $full, $newvollist, $jobs, $skipcomplete) = @_;
> >
> > my $newvolid;
> >
> > @@ -5917,6 +5991,7 @@ sub clone_disk {
> > $newvolid = PVE::Storage::vdisk_clone($storecfg, $drive->{file}, $newvmid, $snapname);
> > push @$newvollist, $newvolid;
> > } else {
> > +
> > my ($storeid, $volname) = PVE::Storage::parse_volume_id($drive->{file});
> > $storeid = $storage if $storage;
> >
> > @@ -5949,7 +6024,7 @@ sub clone_disk {
> > if $drive->{iothread};
> > }
> >
> > - qemu_drive_mirror($vmid, $drivename, $newvolid, $newvmid, $sparseinit);
> > + qemu_drive_mirror($vmid, $drivename, $newvolid, $newvmid, $sparseinit, $jobs, $skipcomplete);
> > }
> > }
> >
> > --
> > 2.1.4
>
>
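
For reference, the call pattern these new parameters enable - several
mirrors sharing one $jobs hash, with the disk switch deferred via
$skipcomplete - would look roughly like this (untested sketch; the drive
names and $target_volids are hypothetical):

my $jobs = {};

# start one mirror per local disk, skipping block-job-complete for now
foreach my $ds (qw(scsi0 virtio1)) {
    qemu_drive_mirror($vmid, $ds, $target_volids->{$ds}, $vmid, 0, $jobs, 1);
}

# later, once everything else is ready, complete all jobs in one go
qemu_drive_mirror_monitor($vmid, $vmid, $jobs, 0);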
_______________________________________________
pve-devel mailing list
pve-devel at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel