[pve-devel] [Patch V2 guest-common] fix #1694: Replication risks permanently losing sync in high loads due to timeout bug
Wolfgang Link
w.link at proxmox.com
Thu Apr 12 11:33:52 CEST 2018
> Dietmar Maurer <dietmar at proxmox.com> wrote on 12 April 2018 at 11:06:
>
>
> > diff --git a/PVE/Replication.pm b/PVE/Replication.pm
> > index 9bc4e61..d8ccfaf 100644
> > --- a/PVE/Replication.pm
> > +++ b/PVE/Replication.pm
> > @@ -136,8 +136,18 @@ sub prepare {
> > $last_snapshots->{$volid}->{$snap} = 1;
> > } elsif ($snap =~ m/^\Q$prefix\E/) {
> > $logfunc->("delete stale replication snapshot '$snap' on $volid");
> > - PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snap);
> > - $cleaned_replicated_volumes->{$volid} = 1;
> > +
> > + eval {
> > + PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snap);
> > + $cleaned_replicated_volumes->{$volid} = 1;
> > + };
> > +
> > + # If deleting the snapshot fails, we cannot be sure whether the failure
> > + # was a real error or just a timeout. On a timeout, the delete has most
> > + # likely succeeded anyway. If it really failed, the stale snapshot will
> > + # be removed again on the next run.
> > + warn $@ if $@;
> > +
> > + $logfunc->("delete stale replication snapshot error: $@") if $@;
>
> why do we need this in prepare?
Because we have the same problem here.
If the ZFS pool is under load, the snapshot delete can run into a timeout as well.
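To make the reasoning explicit, here is a minimal sketch of the pattern the patch uses
(not the exact code from the patch; the wrapper sub is made up for illustration, and it
assumes the storage layer die()s when the underlying delete exceeds its timeout):

    use PVE::Storage;

    # Sketch: guard the snapshot delete so a timeout-induced die does not
    # abort prepare() and thereby break the replication state.
    my $delete_stale_snapshot = sub {
        my ($storecfg, $volid, $snap, $logfunc) = @_;

        eval {
            PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snap);
        };
        if (my $err = $@) {
            # Only log - the delete most likely went through despite the
            # timeout; if not, the stale snapshot is found and removed
            # again on the next replication run.
            $logfunc->("delete stale replication snapshot error: $err");
            return 0;
        }
        return 1;
    };

So a slow 'zfs destroy' on a loaded pool only produces a logged warning instead of
failing the whole run.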