[pve-devel] [PATCH container] fix #3424: vzdump: cleanup: wait for active replication
Fabian Ebner
f.ebner at proxmox.com
Fri Jan 14 13:39:36 CET 2022
On 14.01.22 at 13:21, Thomas Lamprecht wrote:
> On 14.01.22 12:55, Fabian Ebner wrote:
>> As replication and backup can happen at the same time, the vzdump
>> snapshot might be actively used by replication when backup tries
>> to clean it up, resulting in a snapshot that is not (or only
>> partially) removed and a container locked with 'snapshot-delete'.
>>
>> Wait up to 10 minutes for any ongoing replication. If replication
>> doesn't finish in time, no attempt is made to remove the snapshot,
>> so there is no risk of the container ending up in a locked state.
>> The next backup will then force-remove the left-over snapshot when
>> it starts, which will very likely succeed even at the storage layer,
>> because the replication really should be done by then (subsequent
>> replications shouldn't matter, as they don't need to re-transfer
>> the vzdump snapshot).
>>
>> Suggested-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
>> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
>> ---
>>
>> VM backups are not affected by this, because they don't use
>> storage/config snapshots, but use pve-qemu's block layer.
>>
>> I decided on this approach rather than having replication wait for
>> the backup, because "full backup can take much longer than replication
>> usually does", and even if we time out, we can just skip the removal
>> for now and let the next backup do it.
>>
>> src/PVE/VZDump/LXC.pm | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/PVE/VZDump/LXC.pm b/src/PVE/VZDump/LXC.pm
>> index b7f7463..10edae9 100644
>> --- a/src/PVE/VZDump/LXC.pm
>> +++ b/src/PVE/VZDump/LXC.pm
>> @@ -8,6 +8,7 @@ use File::Path;
>> use POSIX qw(strftime);
>>
>> use PVE::Cluster qw(cfs_read_file);
>> +use PVE::GuestHelpers;
>> use PVE::INotify;
>> use PVE::LXC::Config;
>> use PVE::LXC;
>> @@ -476,8 +477,14 @@ sub cleanup {
>> }
>>
>> if ($task->{cleanup}->{remove_snapshot}) {
>> - $self->loginfo("cleanup temporary 'vzdump' snapshot");
>> - PVE::LXC::Config->snapshot_delete($vmid, 'vzdump', 0);
>> + $self->loginfo("checking/waiting for replication..");
>
> Do we know whether replication is set up at this stage? I'd like to
> avoid logging that if it isn't, so as not to confuse users.
>
No, but I can add a check for it in v2.
>> + eval {
>> + PVE::GuestHelpers::guest_migration_lock($vmid, 600, sub {
>> + $self->loginfo("cleanup temporary 'vzdump' snapshot");
>> + PVE::LXC::Config->snapshot_delete($vmid, 'vzdump', 0);
>> + });
>> + };
>> + die "snapshot 'vzdump' was not (fully) removed - $@" if $@;
>> }
>> }
>>
>
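A minimal, self-contained sketch of the wait-then-delete logic described in the commit message. It assumes that guest_migration_lock() behaves like an exclusive file lock with a timeout that replication also holds while it runs. The helpers with_lock_timeout() and cleanup_vzdump_snapshot() are hypothetical and not part of PVE::GuestHelpers. If the lock cannot be acquired in time, the code dies before the deletion callback runs, so no removal is attempted and the container cannot end up locked.

```perl
use strict;
use warnings;

use Fcntl qw(:flock);
use Time::HiRes qw(sleep time);

# Hypothetical helper: run $code while holding an exclusive flock on $path,
# waiting at most $timeout seconds for the lock. Dies on timeout without
# ever calling $code.
sub with_lock_timeout {
    my ($path, $timeout, $code) = @_;

    open(my $fh, '>>', $path) or die "cannot open lock file '$path' - $!\n";
    my $deadline = time() + $timeout;

    # Poll with a non-blocking flock until the holder (e.g. a running
    # replication) releases the lock or the deadline passes.
    until (flock($fh, LOCK_EX | LOCK_NB)) {
        if (time() >= $deadline) {
            close($fh);
            die "got timeout waiting for lock '$path'\n";
        }
        sleep(0.1);
    }

    # Run the callback under the lock, always releasing it afterwards.
    my $res = eval { $code->() };
    my $err = $@;
    flock($fh, LOCK_UN);
    close($fh);
    die $err if $err;
    return $res;
}

# Hypothetical cleanup step mirroring the patch: only delete the snapshot
# once the lock is held; otherwise report that it was left behind.
sub cleanup_vzdump_snapshot {
    my ($lockfile, $timeout, $delete_snapshot) = @_;
    eval { with_lock_timeout($lockfile, $timeout, $delete_snapshot) };
    die "snapshot 'vzdump' was not (fully) removed - $@" if $@;
    return;
}
```

In the patch itself, the callback would be the call to PVE::LXC::Config->snapshot_delete($vmid, 'vzdump', 0) and the timeout 600 seconds. Polling with LOCK_NB is just one simple way to bound the wait; the real helper may implement its timeout differently.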