[pve-devel] [PATCH v2 container] fix #3424: vzdump: cleanup: wait for active replication
Fabian Ebner
f.ebner at proxmox.com
Wed Jan 26 10:08:06 CET 2022
On 14.01.22 at 14:08, Fabian Ebner wrote:
> As replication and backup can happen at the same time, the vzdump
> snapshot might be actively used by replication when backup tries
> to clean up, resulting in a snapshot that is not (or only partially)
> removed and a container left locked (snapshot-delete).
>
> Wait up to 10 minutes for any ongoing replication. If replication
> doesn't finish in time, the fact that there is no attempt to remove
> the snapshot means that there's no risk for the container to end up in
> a locked state. And the beginning of the next backup will force remove
> the left-over snapshot, which will very likely succeed even at the
> storage layer, because the replication really should be done by then
> (subsequent replications shouldn't matter as they don't need to
> re-transfer the vzdump snapshot).
>
> Suggested-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
> ---
This might not be the best approach, as it doesn't cover the same edge
case for manual snapshot removal:
https://bugzilla.proxmox.com/show_bug.cgi?id=3424#c1
>
> Changes from v1:
> * Check if replication is configured first.
> * Use "active replication" in log message.
>
> VM backups are not affected by this, because they don't use
> storage/config snapshots, but use pve-qemu's block layer.
>
> Decided to go for this approach rather than replication waiting on
> backup, because "full backup can take much longer than replication
> usually does", and even if we time out, we can just skip the removal
> for now and have the next backup do it.
>
> src/PVE/VZDump/LXC.pm | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/src/PVE/VZDump/LXC.pm b/src/PVE/VZDump/LXC.pm
> index b7f7463..5bac089 100644
> --- a/src/PVE/VZDump/LXC.pm
> +++ b/src/PVE/VZDump/LXC.pm
> @@ -8,9 +8,11 @@ use File::Path;
> use POSIX qw(strftime);
>
> use PVE::Cluster qw(cfs_read_file);
> +use PVE::GuestHelpers;
> use PVE::INotify;
> use PVE::LXC::Config;
> use PVE::LXC;
> +use PVE::ReplicationConfig;
> use PVE::Storage;
> use PVE::Tools;
> use PVE::VZDump;
> @@ -476,8 +478,21 @@ sub cleanup {
>      }
>
>      if ($task->{cleanup}->{remove_snapshot}) {
> -        $self->loginfo("cleanup temporary 'vzdump' snapshot");
> -        PVE::LXC::Config->snapshot_delete($vmid, 'vzdump', 0);
> +        my $do_remove = sub {
> +            $self->loginfo("cleanup temporary 'vzdump' snapshot");
> +            PVE::LXC::Config->snapshot_delete($vmid, 'vzdump', 0);
> +        };
> +
> +        my $repl_conf = PVE::ReplicationConfig->new();
> +        eval {
> +            if ($repl_conf->check_for_existing_jobs($vmid, 1)) {
> +                $self->loginfo("checking/waiting for active replication..");
> +                PVE::GuestHelpers::guest_migration_lock($vmid, 600, $do_remove);
> +            } else {
> +                $do_remove->();
> +            }
> +        };
> +        die "snapshot 'vzdump' was not (fully) removed - $@" if $@;
>      }
>  }
>
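For readers without the PVE code at hand, the wait-then-skip behaviour
described in the commit message can be illustrated with a small
standalone sketch. It is not the PVE implementation: the helper names
run_with_lock_timeout and cleanup_snapshot are made up for this example,
and a plain flock() on a file stands in for the guest migration lock
that the patch takes via PVE::GuestHelpers::guest_migration_lock. The
idea is only that the removal runs under the lock when replication is
configured, and is never attempted if the lock cannot be obtained in
time.

```perl
use strict;
use warnings;

use Fcntl qw(:flock);
use Time::HiRes qw(sleep time);

# Hypothetical helper (not a PVE function): run $code while holding an
# exclusive flock on $path, dying if the lock is not obtained within
# $timeout seconds.
sub run_with_lock_timeout {
    my ($path, $timeout, $code) = @_;

    open(my $fh, '>>', $path) or die "cannot open lock file '$path' - $!\n";

    my $deadline = time() + $timeout;
    # Poll with a non-blocking flock until the lock is free or time is up.
    while (!flock($fh, LOCK_EX | LOCK_NB)) {
        if (time() >= $deadline) {
            close($fh);
            die "got timeout waiting for lock\n";
        }
        sleep(0.1);
    }

    # Run the callback under the lock; always release before re-raising.
    my $res = eval { $code->() };
    my $err = $@;

    flock($fh, LOCK_UN);
    close($fh);

    die $err if $err;
    return $res;
}

# Hypothetical stand-in for the cleanup step: only wait for the lock if
# replication is configured, otherwise remove the snapshot directly.
sub cleanup_snapshot {
    my ($lock_path, $has_replication, $timeout, $remove) = @_;

    if ($has_replication) {
        return run_with_lock_timeout($lock_path, $timeout, $remove);
    }
    return $remove->();
}
```

A caller would wrap cleanup_snapshot in an eval and report the error on
timeout. Since $remove is never invoked in that case, nothing is left
half-deleted, and a later run can retry the removal.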