[pve-devel] [PATCH pve-guest-common 1/1] partially fix #2530: snapshots: add pre/post/failed-snapshot hooks
Fabian Grünbichler
f.gruenbichler at proxmox.com
Wed Dec 21 11:44:34 CET 2022
this is v2, right? ;)
On December 12, 2022 2:43 pm, Stefan Hanreich wrote:
> This commit adds hooks to the snapshotting process, which can be used
> to run additional setup scripts to prepare the VM for snapshotting.
>
> Examples for use cases include:
> * forcing processes to flush their writes
> * blocking processes from writing
> * altering the configuration of the VM to make snapshotting possible
>
> The prepare step has been split into two parts, so the configuration
> can be locked a bit earlier during the snapshotting process. Doing it
> this way ensures that the configuration is already locked during the
> pre-snapshot hook. Because of this split, the VM config gets written
> in two stages now, rather than one.
>
> In case of failure during the preparation step - after the lock is
> written - error handling has been added so the lock gets released
> properly. The failed-snapshot hook runs when the snapshot fails, if
> the pre-snapshot hook ran already. This enables users to revert any
> changes done during the pre-snapshot hookscript.
see below
> The preparation step assumes that the hook does not convert the
> current VM into a template, which is why the basic checks are not
> re-run after the pre-snapshot hook. The storage check runs after the
> pre-snapshot hook, because the hook might get used to setup the
> storage for snapshotting. If the hook would run after the storage
> checks, this becomes impossible.
>
> cfs_update() gets called after every invocation of a hookscript, since
> it is impossible to know which changes get made by the hookscript.
> Doing this ensures that we see the updated state of the CFS after the
> hookscript got invoked.
>
> Signed-off-by: Stefan Hanreich <s.hanreich at proxmox.com>
> ---
> src/PVE/AbstractConfig.pm | 49 ++++++++++++++++++++++++++++++++++-----
> 1 file changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
> index a0c0bc6..3bff600 100644
> --- a/src/PVE/AbstractConfig.pm
> +++ b/src/PVE/AbstractConfig.pm
> @@ -710,8 +710,7 @@ sub __snapshot_prepare {
>
> my $snap;
>
> - my $updatefn = sub {
> -
> + my $run_checks = sub {
> my $conf = $class->load_config($vmid);
>
> die "you can't take a snapshot if it's a template\n"
> @@ -721,15 +720,21 @@ sub __snapshot_prepare {
>
> $conf->{lock} = 'snapshot';
>
> - my $snapshots = $conf->{snapshots};
> -
> die "snapshot name '$snapname' already used\n"
> - if defined($snapshots->{$snapname});
> + if defined($conf->{snapshots}->{$snapname});
> +
> + $class->write_config($vmid, $conf);
> + };
>
> + my $updatefn = sub {
> + my $conf = $class->load_config($vmid);
> my $storecfg = PVE::Storage::config();
> +
> die "snapshot feature is not available\n"
> if !$class->has_feature('snapshot', $conf, $storecfg, undef, undef, $snapname eq 'vzdump');
>
> + my $snapshots = $conf->{snapshots};
> +
> for my $snap (sort keys %$snapshots) {
> my $parent_name = $snapshots->{$snap}->{parent} // '';
> if ($snapname eq $parent_name) {
> @@ -753,7 +758,32 @@ sub __snapshot_prepare {
> $class->write_config($vmid, $conf);
> };
>
> - $class->lock_config($vmid, $updatefn);
> + $class->lock_config($vmid, $run_checks);
> +
> + eval {
> + my $conf = $class->load_config($vmid);
> + PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot", 1);
> + };
> + my $err = $@;
> +
> + PVE::Cluster::cfs_update();
> +
> + if ($err) {
> + $class->remove_lock($vmid, 'snapshot');
> + die $err;
> + }
> +
I wonder if we don't also want to call the 'failed-snapshot' phase when just the
pre-snapshot invocation failed? might be possible to combine the error handling
then, although I am not sure it makes it more readable if combined..
> +
> + if (my $err = $@) {
> + my $conf = $class->load_config($vmid);
> + PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
this exec_hookscript needs to be inside an eval {}, with warn in case it fails..
also, this call here happens when preparing for making the snapshot, after
possibly saving the VM state, but before taking the volume snapshots..
> + PVE::Cluster::cfs_update();
> +
> + $class->remove_lock($vmid, 'snapshot');
> + die $err;
> + }
>
> return $snap;
> }
> @@ -837,11 +867,18 @@ sub snapshot_create {
>
> if ($err) {
> warn "snapshot create failed: starting cleanup\n";
> +
> + PVE::GuestHelpers::exec_hookscript($conf, $vmid, "failed-snapshot");
eval + warn as well
this call here happens when the volume snapshots might or might not have been
created already (depending on what exactly the error cause is).
> + PVE::Cluster::cfs_update();
> +
> eval { $class->snapshot_delete($vmid, $snapname, 1, $drivehash); };
> warn "$@" if $@;
> die "$err\n";
> }
>
> + PVE::GuestHelpers::exec_hookscript($conf, $vmid, "post-snapshot");
and here we have a similar issue (no eval), what should happen if post-snapshot
fails?
A die immediately (very likely wrong, current)
B eval + warn but proceed with commit (possibly leaving leftover hook changes around)
C eval + warn, call failed-snapshot but proceed with commit (gives the
hookscript a chance to cleanup, but how does it differentiate between the
different failed-snapshot call sites?)
D eval + delete snapshot (seems suboptimal)
E eval + call failed-snapshot + delete snapshot (same, and also the issue of the
hookscript being able to know what's going on again)
B and C seem most sensible to me, but C adds to the issue of "missing
failed-snapshot context", depending on what the hookscript is doing..
one way to pass information is via the environment, we do that for the migration
case already (setting PVE_MIGRATED_FROM, so that the pre-start/post-start
hookscript can know the start happens in a migration context, and where to
(possibly) find the guest config..
for example, we could set PVE_SNAPSHOT_PHASE here, and have prepare/commit/post
as sub-phases, or even pass a list of volumes already snapshotted (or created,
in case of vmstate), or ..
obviously setting the environment is only allowed in a forked worker context,
else it would affect the next API endpoint handled by the pveproxy/pvedaemon/..
process, so it might be worth double-checking and cleaning up to avoid
side-effects with replication/migration/.. if we go down that route..
> + PVE::Cluster::cfs_update();
> +
> $class->__snapshot_commit($vmid, $snapname);
> }
>
> --
> 2.30.2
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
>
More information about the pve-devel
mailing list